jstats: Simplified Statistical Analysis Tools for Social Science

jstats simplifies R for users who need to run social science analyses without first becoming experienced programmers. The package provides consistent syntax, sensible defaults, and protection from confusing base R behaviors, while staying close enough to base R conventions that users learn transferable skills rather than a private dialect. Output is styled after the best conventions of SPSS, Stata, and SAS, and the code syntax is designed to ease the transition from those applications into R. Originally built as teaching infrastructure for a university-level statistics course, the package has since been expanded to serve the broader social science research community.

Audience

The long-term primary audience is the broader quantitative social science research community – criminologists, sociologists, political scientists, psychologists, public health researchers, and others who routinely work with Likert scales, categorical variables, dichotomies, Cronbach's alpha, dummy-coded regression, and haven-imported data from SPSS, Stata, or SAS.

During the current development phase, the package is being actively tested by students and colleagues at Griffith University, along with a growing community of former students and collaborating instructors. Feedback from this group shapes ongoing refinements.

Functions by purpose

Descriptive analysis

jdesc – univariate descriptives (mean, median, SD, range, etc.) with optional grouping
jfreq – frequency tables for one or more variables
jcorr – Pearson or Spearman correlations with significance tests
jalpha – Cronbach's alpha and item-total statistics for scale reliability
jscreen – data screening for outliers, ranges, and skew
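
As an illustration of the kind of statistic these functions report, Cronbach's alpha can be computed from first principles in base R. This is a minimal sketch, not the jalpha() implementation: the function name cronbach_alpha, the listwise handling of missing values, and the example responses are all assumptions made for the example.

```r
# Cronbach's alpha from first principles, using only base R.
# `cronbach_alpha` is a hypothetical helper, not part of jstats.
# `items` is a data frame with one numeric column per scale item.
cronbach_alpha <- function(items) {
  items <- items[complete.cases(items), , drop = FALSE]  # listwise deletion (assumption)
  k <- ncol(items)                                       # number of items
  item_vars <- vapply(items, var, numeric(1))            # variance of each item
  total_var <- var(rowSums(items))                       # variance of the summed scale
  (k / (k - 1)) * (1 - sum(item_vars) / total_var)
}

# Made-up responses from six people to three 5-point Likert items.
scale_items <- data.frame(
  q1 = c(4, 5, 3, 2, 4, 1),
  q2 = c(4, 4, 3, 2, 5, 2),
  q3 = c(5, 5, 2, 1, 4, 1)
)

alpha <- cronbach_alpha(scale_items)
print(round(alpha, 3))
```

The formula used is the standard one: k / (k - 1) multiplied by one minus the ratio of summed item variances to the variance of the total score.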

Group comparisons and modeling

jt – independent or paired t-test
jaov – one-way analysis of variance with optional post-hoc tests
jcrosstab – cross-tabulation with chi-square and effect-size options
jlm – linear regression
jlogistic – logistic regression

Variable construction

jrecode – recode values, with optional new value labels
jrelabel – apply or replace value labels and the variable label
jsum – row-wise sum across variables, with min-valid handling
javg – row-wise mean across variables, with min-valid handling
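
The min-valid idea behind jsum and javg can be sketched in base R: a row receives a score only if enough of its items are non-missing. The function name row_mean_min_valid and the argument name min_valid are hypothetical and are not taken from the jstats API.

```r
# Row-wise mean that returns NA unless at least `min_valid` items are present.
# `row_mean_min_valid` and `min_valid` are hypothetical names for illustration.
row_mean_min_valid <- function(data, vars, min_valid = length(vars)) {
  m <- as.matrix(data[, vars, drop = FALSE])
  n_valid <- rowSums(!is.na(m))          # count of non-missing items per row
  out <- rowMeans(m, na.rm = TRUE)       # mean of the items that are present
  out[n_valid < min_valid] <- NA_real_   # too few valid items: no score
  out
}

# Made-up data: four respondents, three items, some missing answers.
survey <- data.frame(
  a = c(1, 2, NA, NA),
  b = c(3, NA, NA, 4),
  c = c(5, 4, 2, NA)
)

survey$score <- row_mean_min_valid(survey, c("a", "b", "c"), min_valid = 2)
print(survey)
```

With min_valid = 2, the first two respondents receive a mean and the last two, who each answered only one item, receive NA.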

Pipeline state management

juse – set the default data frame used implicitly by analysis functions
jsubset – activate a row-level case-selection expression applied to subsequent calls
jcomplete – activate listwise filtering on selected variables
jdummy – register categorical variables for dummy coding in regression
joutput – set session-level output verbosity (minimal / standard / full)

Data import and export

jload – load data from .rds, .sav, .dta, .sas7bdat, .xlsx, or .csv
jsave – save a data frame, with format inferred from the file extension
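
Inferring the format from the file extension can be sketched with base R alone. The helpers save_by_ext and load_by_ext are hypothetical and handle only .rds and .csv; the remaining formats listed above would need additional packages, and which ones jstats relies on is not assumed here.

```r
# Dispatch on the file extension; hypothetical helpers, not jload()/jsave().
save_by_ext <- function(data, path) {
  switch(tolower(tools::file_ext(path)),
    rds = saveRDS(data, path),
    csv = write.csv(data, path, row.names = FALSE),
    stop("This sketch handles only .rds and .csv files")
  )
  invisible(path)
}

load_by_ext <- function(path) {
  switch(tolower(tools::file_ext(path)),
    rds = readRDS(path),
    csv = read.csv(path),
    stop("This sketch handles only .rds and .csv files")
  )
}

# Round trip through temporary files using a small built-in data set.
rds_path <- tempfile(fileext = ".rds")
csv_path <- tempfile(fileext = ".csv")
save_by_ext(head(mtcars), rds_path)
save_by_ext(head(mtcars), csv_path)
from_rds <- load_by_ext(rds_path)
from_csv <- load_by_ext(csv_path)
```

Note that the .rds round trip preserves row names and column types exactly, while the .csv round trip in this sketch drops row names.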

Visualisation

jplot – base histograms and bar plots for data, plus method dispatch on result objects from jt(), jlm(), etc.

For the full alphabetical listing of every exported function, run library(help = "jstats") or browse the package index.

Workflow conventions

The j-prefix. Every user-facing function starts with j, so the package's whole API can be discovered in RStudio by typing j and pressing Tab. Internal helpers begin with a dot or .jst_ and are not intended for direct use.

Formula vs data-first. Group-comparison and modeling functions follow the base R formula interface: jt(MathScore ~ Gender, data = SampleData). Descriptive and data-management functions take the data frame first, followed by unquoted variable names: jfreq(SampleData, Gender, Program). The formula style mirrors base R functions such as t.test() and aggregate(), and the data-first ordering mirrors functions such as cor() and subset(), which take the data object as their first argument.
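
The two calling styles can be compared with base R counterparts on the built-in mtcars data. This is only a sketch of the conventions; the jstats calls appear as comments copied from the paragraph above, and the choice of mtcars variables is arbitrary.

```r
# Formula interface: outcome ~ grouping variable, data passed by name.
# jstats call from the text: jt(MathScore ~ Gender, data = SampleData)
tt <- t.test(mpg ~ am, data = mtcars)
group_means <- aggregate(mpg ~ am, data = mtcars, FUN = mean)

# Data-first interface: the data object is the first argument.
# jstats call from the text: jfreq(SampleData, Gender, Program)
cors <- cor(mtcars[, c("mpg", "wt", "hp")])
freq <- table(mtcars[, c("am", "cyl")])

print(tt$estimate)
print(group_means)
print(round(cors, 2))
print(freq)
```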

The juse-first habit. A single juse(MyData) call at the start of a session sets a default data frame. Subsequent analysis calls can then omit the data argument: jfreq(Gender) works the same as jfreq(MyData, Gender). The default also scopes the pipeline-state functions, so jsubset(Age < 30) sets a filter on the current default without further specification.
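
One way a session-level default of this kind could work is sketched below in base R. It is not the jstats implementation: the names .state, use_data, and freq are hypothetical, and the rule "treat the first argument as data only if it evaluates to a data frame" is an assumption chosen to keep the example short.

```r
# A minimal sketch of a session-level default data frame.
# All names here are hypothetical, not jstats internals.
.state <- new.env()

use_data <- function(data) {
  stopifnot(is.data.frame(data))
  assign("default", data, envir = .state)   # remember the default
  invisible(data)
}

freq <- function(...) {
  args <- as.list(substitute(list(...)))[-1]   # capture arguments unevaluated
  # If the first argument evaluates to a data frame, use it as the data.
  first <- tryCatch(eval(args[[1]], parent.frame()), error = function(e) NULL)
  if (is.data.frame(first)) {
    data <- first
    args <- args[-1]
  } else {
    data <- get("default", envir = .state)     # fall back to the default
  }
  vars <- vapply(args, deparse, character(1))  # unquoted names to strings
  lapply(data[vars], table)                    # one frequency table per variable
}

use_data(mtcars)
implicit <- freq(am, cyl)           # data argument omitted
explicit <- freq(mtcars, am, cyl)   # data argument given
print(implicit$cyl)
```

A real implementation would need to guard against a variable name that happens to match an object in the calling environment; this sketch ignores that case.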

Pipeline stages. jsubset(), jcomplete(), and jdummy() modify session state that subsequent analysis calls read automatically. State is explicit – calls can be inspected, deactivated, and cleared, and active state is reported in analysis output, so a script's behavior stays visible and reproducible rather than depending on hidden context.
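
The pattern of stored, inspectable, and reported state can be sketched in base R as follows. The names pipeline, set_subset, clear_subset, apply_pipeline, and mean_of are hypothetical, and the message format is invented for illustration; none of this is taken from the jstats source.

```r
# A sketch of explicit session state: a stored row filter that later
# calls apply and report. Hypothetical names throughout.
pipeline <- new.env()
pipeline$subset <- NULL

set_subset <- function(expr) {
  pipeline$subset <- substitute(expr)   # keep the expression unevaluated
  invisible(pipeline$subset)
}

clear_subset <- function() {
  pipeline$subset <- NULL
  invisible(NULL)
}

apply_pipeline <- function(data) {
  if (is.null(pipeline$subset)) return(data)
  keep <- eval(pipeline$subset, data, parent.frame())  # evaluate inside the data
  message("Active subset: ", deparse(pipeline$subset),
          " (", sum(keep, na.rm = TRUE), " of ", nrow(data), " rows kept)")
  data[which(keep), , drop = FALSE]
}

mean_of <- function(data, var) {
  data <- apply_pipeline(data)          # every analysis call reads the state
  mean(data[[var]], na.rm = TRUE)
}

set_subset(hp > 150)
filtered_mean <- mean_of(mtcars, "mpg")   # reports the active subset
clear_subset()
full_mean <- mean_of(mtcars, "mpg")       # no state, no message
```

Because the filter is reported each time it is applied, a reader of the output can tell which rows an estimate was based on without tracing back through the script.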

Output verbosity. joutput() sets one of three preset levels – minimal, standard (default), or full – that control how much detail analysis functions print. This is useful for stripping output in production scripts or expanding it during exploration. Per-call arguments always override session-level settings. The Case Processing Summary table follows an auto-suppress rule at the standard tier: it prints when something happened (pipeline state, listwise drops, or a per-variable discrepancy notification) and stays silent otherwise. See ?joutput for the full toggle behavior.
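
The precedence rule and the standard-tier auto-suppress rule can be sketched in base R. Everything here is hypothetical: the function names, the option key sketch.verbosity, and in particular the behavior shown for the minimal and full tiers, which the text above does not specify. See ?joutput for the actual behavior.

```r
# Hypothetical names; the option key "sketch.verbosity" is made up.
levels_allowed <- c("minimal", "standard", "full")

set_verbosity <- function(level = "standard") {
  level <- match.arg(level, levels_allowed)
  options(sketch.verbosity = level)     # session-level setting
  invisible(level)
}

# A per-call value wins; otherwise fall back to the session setting.
resolve_verbosity <- function(verbosity = NULL) {
  if (!is.null(verbosity)) {
    return(match.arg(verbosity, levels_allowed))
  }
  getOption("sketch.verbosity", default = "standard")
}

# Standard tier: show the case summary only when something happened.
# The minimal and full branches are assumptions for illustration.
show_case_summary <- function(level, n_total, n_used, state_active = FALSE) {
  switch(level,
    minimal  = FALSE,
    standard = state_active || n_used < n_total,
    full     = TRUE
  )
}

set_verbosity("minimal")
session_level <- resolve_verbosity()         # session setting applies
call_level <- resolve_verbosity("full")      # per-call argument overrides it

set_verbosity("standard")
quiet <- show_case_summary(resolve_verbosity(), n_total = 100, n_used = 100)
shown <- show_case_summary(resolve_verbosity(), n_total = 100, n_used = 92)
```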

Where to go next

For the full alphabetical listing of functions: library(help = "jstats").
For source, issue reports, and contribution guidelines: the package's GitHub repository.
For statistics and R fundamentals (in preparation): Book 1 of the companion book series.
For migration patterns from SPSS, Stata, or SAS, and a deeper guide to the package's design and use in real research (in preparation): Book 2, the adopter's guide.

Author