SPSS-like linear regression output with standardized coefficients

Fits a linear model using stats::lm() and prints SPSS-style output, including unstandardized coefficients, standard errors, t values, p values, and standardized coefficients (beta). By default, standardized coefficients are left blank for the intercept and for dummy-coded and dichotomous terms; see std for the alternatives.
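To make the beta column concrete, the following sketch computes the two standardizations offered by the std argument using only stats::lm() on simulated data. It does not call jlm() and is not its internal code. The regular beta is b * sd(x) / sd(y), which equals the slope from a regression on z-scored variables. For the Gelman-style column, the continuous input is centred and divided by two standard deviations while the 0/1 predictor is left alone. Two things are assumptions of this sketch: the simulated data, and keeping the outcome on its raw scale for the Gelman column. The suppression of dummy and dichotomous betas under std = "regular" is not reproduced.

```r
# Sketch only: base R on simulated data, not jlm()'s internal code.
set.seed(1)
n  <- 200
x1 <- rnorm(n, mean = 50, sd = 10)      # continuous predictor
g  <- rbinom(n, size = 1, prob = 0.3)   # dichotomous 0/1 predictor
y  <- 3 + 0.4 * x1 + 2 * g + rnorm(n, sd = 5)
d  <- data.frame(y, x1, g)

fit <- lm(y ~ x1 + g, data = d)
b   <- coef(fit)[c("x1", "g")]          # unstandardized slopes

# Regular beta: b * sd(x) / sd(y). For g, sd(g) depends on how common
# the 1s are, which is why a 0/1 beta is not comparable to x1's beta.
beta <- b * c(sd(d$x1), sd(d$g)) / sd(d$y)

# The same betas arise as slopes from a fit on z-scored variables.
dz     <- as.data.frame(scale(d))
beta_z <- coef(lm(y ~ x1 + g, data = dz))[c("x1", "g")]

# Gelman (2008)-style: centre x1 and divide by 2 SD; leave g as 0/1.
# Assumption: the outcome y stays on its raw scale.
d$x1_2sd <- (d$x1 - mean(d$x1)) / (2 * sd(d$x1))
gelman   <- coef(lm(y ~ x1_2sd + g, data = d))[c("x1_2sd", "g")]

print(round(cbind(b = b, beta = beta, gelman = unname(gelman)), 3))
```

Under the Gelman scaling, the x1 coefficient is simply b multiplied by 2 * sd(x1). The g coefficient is unchanged because centring x1 only shifts the intercept.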

Usage

jlm(
  formula,
  data,
  subset = NULL,
  variable.id = NULL,
  numeric = NULL,
  categorical = NULL,
  count = NULL,
  ci = NULL,
  std = "regular",
  diagnostics = NULL,
  ref.categories = NULL,
  full = FALSE,
  case.processing.detail = NULL,
  digits = NULL,
  ...,
  value.id = NULL
)

Arguments

formula: A model formula, e.g. y ~ x1 + x2.
data: A data frame containing the variables referenced in formula.
subset: An optional unquoted logical expression (e.g. Group == 1) that subsets cases for this call only. Applied after jcomplete() and jsubset(). Does not affect other function calls.
variable.id: Character or NULL. Variable label display mode: one of "both", "names", "labels", "legend", or "legend.bottom". "names" shows variable names only; "both" shows "name: label"; "labels" replaces each coefficient's variable name with its label in the Coefficients table (factor level decoration is preserved) and works best with short labels; "legend" prints a label legend between the Coefficients table and the R-squared/fit block; "legend.bottom" prints it at the very end. NULL (default) defers to joutput()'s variable.id setting. Not a logical.
numeric: Optional character vector of variable names that should be treated as continuous (numeric) even if they have value labels. For example, numeric = "Age" or numeric = c("Age", "Education").
categorical: Optional character vector of variable names that should be treated as categorical even if they lack value labels. For example, categorical = "Program" or categorical = c("Program", "Region"). The first sorted unique value becomes the reference category. Use jdummy() for control over the reference category.
count: Optional character vector of variable names to treat as counts for this call (the per-call counterpart of jcount()). On the dependent variable, it turns the count-regression caveat into a definite statement rather than a hedge, and the caveat is given even when the variable falls outside the structural 0-6 band. On an independent variable, it behaves like numeric: a count predictor enters the model as numeric. A variable cannot be listed in both count and categorical.
ci: Logical or NULL. If TRUE, appends a 95% confidence interval for each unstandardized coefficient (b) at the right of the coefficient table. If NULL (default), defers to joutput()'s regression.ci setting (off at minimal and standard, on at full). Computed as the closed form b +/- t(.975, residual df) * SE.
std: Character. Controls the standardized-coefficient column. One of: "regular" (default), standardized betas with the betas of dummy and dichotomous predictors suppressed, because a fully standardized beta on a 0/1 indicator is scaled by the indicator's prevalence and is not comparable to the betas of continuous predictors; "all", the same standardized betas with nothing suppressed; "gelman", Gelman (2008) scaling, in which continuous predictors are divided by two standard deviations and binary predictors keep their raw 0/1 contrast (shown for all predictors, under the heading "Gelman beta"); or "none", which omits the column. Regardless of this display choice, the returned object always carries both the full regular betas (beta) and the full Gelman betas (beta_gelman).
diagnostics: Logical, character vector, or NULL. If TRUE, prints a VIF table and diagnostic plots. If a character vector, selects which diagnostics to show from "vif", "residuals", "qq", "scale", "cooks", and "leverage". If NULL (default), defers to the joutput() session setting.
ref.categories: Logical or NULL. Per-call override for showing the reference-categories block (the baseline level dropped from each set of dummy variables). NULL (default) defers to joutput()'s ref.categories setting. Applies to jlm() and jlogistic() only, since they are the functions that produce dummy-coded coefficient tables.
full: Logical. If TRUE, turns on the coefficient confidence intervals (ci) and diagnostics. An explicit ci = FALSE or diagnostics = FALSE still takes precedence.
case.processing.detail: Per-call override of the Case Processing Summary detail tier: one of "none", "totals", or "per_code". NULL (default) uses the active joutput() level default.
digits: Integer or NULL. Number of decimal places for continuous statistics in the output tables (range 0-7; digits = 0 prints whole numbers with no trailing decimal point). Does not affect p-values, percentages, or integer quantities (counts, N, degrees of freedom), which keep their own fixed conventions. NULL (default) defers to joutput()'s digits setting (default 3).
...: Reserved for argument-name checking. Passing which, plots, or show produces an error that suggests using diagnostics instead.
value.id: Character or NULL. Value-label display mode for the dummy category rows in the Coefficients table: one of "both" ("code: label", degrading to a bare code where a code has no label), "values" (the bare code), or "labels" (the value label, degrading to the bare code where none exists). The reference category folded into each grouped variable's header follows the same mode. "legend" and "legend.bottom" are not supported here: a coefficient table already pairs each value label with its row, so a separate legend block would only duplicate it. Passing either explicitly is an error; a joutput() default of "legend" or "legend.bottom" is tolerated and rendered as "both", so it does not break a bare call. Variables with no value labels render identically under all supported modes. NULL (default) defers to joutput()'s value.id setting. Applies only to multi-category dummy predictors; continuous and single-contrast (dichotomous) predictors are unaffected. Not a logical.
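The closed-form interval described under ci can be reproduced in base R from any lm fit. This sketch uses the built-in mtcars data as a stand-in and cross-checks the result against stats::confint(). It illustrates the formula only and is not jlm()'s own code. The column names ci_lower and ci_upper simply echo the coefficients_raw description under Value.

```r
# Closed-form 95% CI for each unstandardized coefficient b:
#   b +/- t(.975, residual df) * SE
fit   <- lm(mpg ~ wt + hp, data = mtcars)
est   <- coef(summary(fit))             # Estimate, Std. Error, t value, Pr(>|t|)
tcrit <- qt(0.975, df = df.residual(fit))

ci <- cbind(
  b        = est[, "Estimate"],
  ci_lower = est[, "Estimate"] - tcrit * est[, "Std. Error"],
  ci_upper = est[, "Estimate"] + tcrit * est[, "Std. Error"]
)
print(round(ci, 3))

# Cross-check against base R's confint() for lm objects
all.equal(unname(ci[, c("ci_lower", "ci_upper")]), unname(confint(fit)))
```

Because confint() for lm objects uses the same t quantile on the residual degrees of freedom, the two sets of bounds agree to floating-point precision.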

Value

Invisibly returns a list of class jst_lm containing:

model: The fitted lm object.
model_type: Character string "linear".
model_frame: The model frame used to fit the model.
formula_used: The formula after dummy expansion.
coefficients: Formatted coefficient table (data frame); includes 95% CI Lower / Upper columns when ci is on.
coefficients_raw: Flat data frame of raw, full-precision coefficient statistics (one row per coefficient): term (machine key), b, SE, t, df, p, beta, and ci_lower / ci_upper bounds (present regardless of the ci display toggle). Carries beta_standardization and outcome attributes.
fit_raw: List of raw, full-precision fit statistics (R-squared, adjusted R-squared, residual SE, F with its dfs and p-value, residual df, and N).
r_squared: R-squared value.
adj_r_squared: Adjusted R-squared value.
residual_se: Residual standard error.
f_statistic: Named numeric vector with F value, df1, df2, and p.
sums_of_squares: Named numeric vector (regression, residual, total).
n: Number of observations used in the model.
dummy_coef_names: Names of dummy variable columns created by jdummy() registrations.
ref_cats: Reference category descriptions for all categorical variables in the model.
vif: Named numeric vector of VIF values, or NULL for a bivariate model (a single predictor).
sample_info: Pipeline and missing data counts.
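Most of the fit statistics listed above (r_squared, adj_r_squared, residual_se, f_statistic, sums_of_squares, n) follow from the standard OLS decomposition of the total sum of squares. The sketch below is base R only, again on mtcars as a stand-in. It assumes a model with an intercept and no dropped coefficients. The local variable names are illustrative and are not read from a jst_lm object.

```r
# Standard OLS fit statistics from an lm fit with an intercept.
fit <- lm(mpg ~ wt + hp, data = mtcars)
y   <- model.response(model.frame(fit))

ss <- c(
  regression = sum((fitted(fit) - mean(y))^2),
  residual   = sum(residuals(fit)^2),
  total      = sum((y - mean(y))^2)
)
n   <- nobs(fit)
df1 <- fit$rank - 1                # numerator df: number of estimated slopes
df2 <- df.residual(fit)            # residual df: n - df1 - 1

r_squared     <- ss[["regression"]] / ss[["total"]]
adj_r_squared <- 1 - (1 - r_squared) * (n - 1) / df2
residual_se   <- sqrt(ss[["residual"]] / df2)
f_value       <- (ss[["regression"]] / df1) / (ss[["residual"]] / df2)
f_statistic   <- c(F = f_value, df1 = df1, df2 = df2,
                   p = pf(f_value, df1, df2, lower.tail = FALSE))

print(round(c(r_squared = r_squared, adj_r_squared = adj_r_squared,
              residual_se = residual_se), 4))
print(f_statistic)
```

Each quantity can be checked against summary(fit): r.squared, adj.r.squared, sigma, and fstatistic.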

Details

Also prints key model summary information (R-squared, adjusted R-squared, residual standard error, F-test, sums of squares, and N). If any coefficients are dropped due to perfect collinearity, a warning message is printed.
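In base R, perfect collinearity does not stop lm(). Instead, the aliased term is reported with an NA coefficient. The sketch below shows one way to detect such terms. For the near-collinear case, it also computes variance inflation factors as VIF_j = 1 / (1 - R^2_j) from auxiliary regressions. The helper vif_sketch() is hypothetical and written only for illustration. This page does not state whether jlm() detects dropped terms or builds its VIF table exactly this way.

```r
set.seed(2)
n <- 50
d <- data.frame(x1 = rnorm(n), x3 = rnorm(n))
d$x2  <- 2 * d$x1                    # exact linear function of x1
d$x2n <- d$x1 + rnorm(n, sd = 0.3)   # strongly correlated, but not exact
d$y   <- 1 + d$x1 - d$x3 + rnorm(n)

# Perfect collinearity: lm() still fits, but the aliased term gets NA.
fit_alias <- lm(y ~ x1 + x2 + x3, data = d)
dropped   <- names(coef(fit_alias))[is.na(coef(fit_alias))]
if (length(dropped) > 0) {
  message("Dropped due to perfect collinearity: ", paste(dropped, collapse = ", "))
}

# Hypothetical helper: VIF_j = 1 / (1 - R^2_j), where R^2_j comes from
# regressing predictor j on all the other predictors.
vif_sketch <- function(fit) {
  X <- model.matrix(fit)[, -1, drop = FALSE]   # drop the intercept column
  if (ncol(X) < 2) return(NULL)                # one predictor: no VIF table
  v <- vapply(seq_len(ncol(X)), function(j) {
    r2 <- summary(lm(X[, j] ~ X[, -j]))$r.squared
    1 / (1 - r2)
  }, numeric(1))
  setNames(v, colnames(X))
}

fit_near <- lm(y ~ x1 + x2n + x3, data = d)
print(round(vif_sketch(fit_near), 2))
```

Returning NULL for a single predictor mirrors the vif entry under Value. The helper assumes no aliased columns, because an exactly collinear predictor would give R^2_j = 1 and an infinite VIF.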

A red "Linear Regression" title is printed first, followed by variable labels (if present), then the coefficient table and model fit statistics.

Handling of variables:

Variables registered with jdummy() are expanded into dummy variables using the registered reference category.
Unregistered haven-labelled variables with value labels are automatically treated as categorical (converted to factors). The first category is used as the reference, and an informational message suggests using jdummy() for control over the reference category.
Haven-labelled variables without value labels are treated as continuous (converted to numeric).
The numeric argument overrides auto-detection for variables that have value labels but should be treated as continuous (e.g. Age with labels like "18 years", "19 years").
The categorical argument forces variables without value labels (or plain numeric variables) to be treated as categorical (e.g. a numeric Program variable coded 1–4 from a CSV file).
The dependent variable is always modelled as numeric. Naming it in numeric or count does not change that; it only declares the DV's role, which silences the count/categorical-like note (numeric) or states it definitively (count).
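The reference-category rule for forced categorical variables (the first sorted unique value) matches what base R does under its default treatment contrasts. The sketch below shows this with a hypothetical numeric Program column of the kind described above. stats::relevel() is the base-R way to choose a different baseline. Within jlm(), the documented route for that control is jdummy(), which this sketch does not use.

```r
# Program codes as they might arrive from a CSV: plain numbers, no labels
program <- c(3, 1, 4, 2, 1, 3, 4, 2)
d <- data.frame(Program = factor(program))
levels(d$Program)                            # "1" "2" "3" "4": sorted unique values
colnames(model.matrix(~ Program, data = d))  # "(Intercept)" "Program2" "Program3" "Program4"

# Choosing a different reference category in base R
d$Program <- relevel(d$Program, ref = "3")
colnames(model.matrix(~ Program, data = d))  # "(Intercept)" "Program1" "Program2" "Program4"
```

The reference level has no dummy column of its own. Its mean is absorbed into the intercept, and each remaining dummy is a contrast against it.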

Examples