A small synthetic survey dataset used throughout the package as a runnable example. It backs the function examples, serves as a teaching dataset for new users, and demonstrates cross-platform save and load behavior. The 100 respondents and 15 variables are chosen to exercise the kinds of data social-science users actually have: Likert scales, dichotomies, a multi-category variable, continuous measures, and SPSS-style user-defined missing values. The data are synthetic, but the relationships among the variables are realistic.

Usage

community

Format

A data frame with 100 rows and 15 variables:

RespondentID

Respondent ID, character ("R001", "R002", ...).

Income

Annual income (USD). Carries SPSS-style missing values (-99 Refused, -98 Don't know).

Education

Highest education level, 5 categories (1 Some high school, 2 High school graduate, 3 Some college, 4 Bachelor's degree, 5 Graduate degree). Carries SPSS-style missing values (-99 Refused, -98 Don't know).

Age

Age in years (integer, 18-80).

WellbeingScore

Flourishing score (integer, 0-100), generated to include an Income-by-Age interaction.

Volunteer

Volunteered in past year, dichotomy coded 0/1 (0 No, 1 Yes).

OwnsHome

Owns home, dichotomy coded 1/2 (1 Yes, 2 No); recode to 0/1 before use as a logistic-regression outcome.

Smoker

Current smoker, dichotomy coded 0/1 (0 No, 1 Yes). Carries an SPSS-style missing value (-99 Refused).

CommuteTime

Daily commute time in minutes (integer); deliberately near-independent of the other variables.

Region

Region of residence, 4 categories (1 North, 2 South, 3 East, 4 West).

Environment1

"Climate change is a serious threat." 5-point Likert (1 Strongly Disagree to 5 Strongly Agree). Carries SPSS-style missing values (-99 Refused, -98 Don't know).

Environment2

"Concern about the environment is exaggerated." 5-point Likert; reverse-keyed (the variable label ends in " R"). Reverse-code before scale scoring.

Environment3

"Government should do more for the environment." 5-point Likert. Carries SPSS-style missing values (-99 Refused, -98 Don't know).

Environment4

"I would pay more for environmentally friendly products." 5-point Likert.

Environment5

"Pollution is a major cause of public health problems." 5-point Likert; weakly loaded (a Cronbach's-alpha drop candidate when scoring the scale).
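The two recodes named in the entries above (OwnsHome to 0/1, Environment2 reverse-coded) can be sketched in base R. This is a minimal illustration on a hypothetical toy data frame with plain numeric columns, not the package's own workflow; the names toy, OwnsHome01, and Environment2_rev are assumptions made for the example, and the values are invented.

```r
# Hypothetical toy values; the real columns live in `community`.
toy <- data.frame(
  OwnsHome     = c(1, 2, 2, 1),  # coded 1 Yes, 2 No
  Environment2 = c(1, 2, 4, 5)   # 5-point Likert, reverse-keyed
)

# Recode 1/2 (Yes/No) to 1/0 so that 1 means "owns home".
# ifelse() propagates NA, so missing values stay missing.
toy$OwnsHome01 <- ifelse(toy$OwnsHome == 1, 1, 0)

# Reverse a 1-5 item: 1 <-> 5, 2 <-> 4, 3 stays 3.
toy$Environment2_rev <- 6 - toy$Environment2

toy
```

Writing the recoded values to new columns keeps the originals available for checking. If the columns are stored with value labels or declared missing codes, those codes would need to be handled before the arithmetic.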

Source

Synthetic data generated by data-raw/community_data_generator.R (random seed 20260605).

Details

community is the clean default example dataset. For a companion dataset that deliberately carries undeclared missing-value codes, stripped value labels, and an imperfect scale (the material the data-cleaning workflow operates on), see clinic.

The five Environment items form a single attitude scale: item 2 is reverse-keyed, and item 5 is the weak item that scale-reliability output flags for dropping. Income, Education, Smoker, Environment1, and Environment3 carry SPSS-style missing values. The codes fall on partly non-overlapping cases, so listwise deletion leaves fewer cases than the valid count of any single variable. All of community's missing-value codes are properly declared.
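The effect of non-overlapping missing codes on listwise deletion can be seen in a small base-R sketch. It uses a hypothetical toy data frame with invented values and plain numeric columns; the names toy, toy_na, and missing_codes are assumptions for the example, and the package may represent declared missing values differently.

```r
# Hypothetical toy values standing in for three `community` columns.
toy <- data.frame(
  Income       = c(52000, -99, 41000, 67000, -98, 38000),
  Smoker       = c(0, 1, -99, 0, 1, 0),
  Environment1 = c(4, 5, 3, -98, -99, 2)
)

# Treat the SPSS-style codes as missing (assumes plain numeric columns).
missing_codes <- c(-99, -98)
toy_na <- toy
toy_na[] <- lapply(toy_na, function(x) replace(x, x %in% missing_codes, NA))

# Valid cases per variable versus cases complete on all three variables.
valid_per_variable <- colSums(!is.na(toy_na))
n_listwise <- sum(complete.cases(toy_na))

valid_per_variable
n_listwise
```

In this toy example each variable has 4 or 5 valid cases, but only 2 rows are complete on all three, because the missing codes sit on different rows.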

See also

clinic, the messy-data companion dataset.