Community survey example dataset
A small synthetic survey dataset used throughout the package as a runnable example. It backs the package's function examples, serves as a teaching dataset for new users, and demonstrates cross-platform save and load behavior. Its 100 respondents and 15 variables are chosen to exercise the kinds of data social-science users actually have: Likert scales, dichotomies, multi-category variables, continuous measures, and SPSS-style user-defined missing values. The data are synthetic, but the relationships among the variables are realistic.
Format
A data frame with 100 rows and 15 variables:
- RespondentID
Respondent ID, character ("R001", "R002", ...).
- Income
Annual income (USD). Carries SPSS-style missing values (-99 Refused, -98 Don't know).
- Education
Highest education level, 5 categories (1 Some high school, 2 High school graduate, 3 Some college, 4 Bachelor's degree, 5 Graduate degree). Carries SPSS-style missing values (-99 Refused, -98 Don't know).
- Age
Age in years (integer, 18-80).
- WellbeingScore
Flourishing score (integer, 0-100); generated to include an Income-by-Age interaction effect.
- Volunteer
Volunteered in the past year, dichotomy coded 0/1 (0 No, 1 Yes).
- OwnsHome
Owns home, dichotomy coded 1/2 (1 Yes, 2 No); recode to 0/1 before use as a logistic-regression outcome.
- Smoker
Current smoker, dichotomy coded 0/1 (0 No, 1 Yes). Carries an SPSS-style missing value (-99 Refused).
- CommuteTime
Daily commute time in minutes (integer); deliberately near-independent of the other variables.
- Region
Region of residence, 4 categories (1 North, 2 South, 3 East, 4 West).
- Environment1
"Climate change is a serious threat." 5-point Likert (1 Strongly Disagree to 5 Strongly Agree). Carries SPSS-style missing values (-99 Refused, -98 Don't know).
- Environment2
"Concern about the environment is exaggerated." 5-point Likert; reverse-keyed (the variable label ends in " R"). Reverse-code before scale scoring.
- Environment3
"Government should do more for the environment." 5-point Likert. Carries SPSS-style missing values (-99 Refused, -98 Don't know).
- Environment4
"I would pay more for environmentally friendly products." 5-point Likert.
- Environment5
"Pollution is a major cause of public health problems." 5-point Likert; loads weakly on the scale, so reliability (Cronbach's alpha) output flags it as a candidate for dropping.
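The coding notes above imply a few preparation steps before analysis: turning the SPSS-style codes into NA, recoding OwnsHome from 1/2 to 0/1, and reverse-coding Environment2. The base-R sketch below shows these steps on a hypothetical five-row stand-in whose values are invented (they are not rows from community). It assumes the missing-value codes are stored as plain numbers; if the package stores them as declared user-missing values in another class, its own conversion tools are the better route. The helper codes_to_na() is a hypothetical name, not a package function.

```r
# Hypothetical five-row stand-in for a few community columns. The values are
# invented for illustration, and the SPSS-style codes are ASSUMED to be stored
# as plain numbers (-99 Refused, -98 Don't know).
toy <- data.frame(
  RespondentID = c("R001", "R002", "R003", "R004", "R005"),
  Income       = c(52000, -99, 61000, 48000, -98),
  Smoker       = c(0, 1, -99, 0, 1),
  OwnsHome     = c(1, 2, 2, 1, 1),
  Environment2 = c(5, 1, 3, 2, 4)
)

# Hypothetical helper: replace the given missing-value codes with NA
# in the named columns and return the modified data frame.
codes_to_na <- function(df, cols, codes = c(-99, -98)) {
  for (col in cols) {
    df[[col]][df[[col]] %in% codes] <- NA
  }
  df
}

clean <- codes_to_na(toy, c("Income", "Smoker"))

# OwnsHome is coded 1 Yes / 2 No; recode to 1 Yes / 0 No so it can serve as a
# logistic-regression outcome, e.g. glm(OwnsHome01 ~ Age, family = binomial).
clean$OwnsHome01 <- ifelse(clean$OwnsHome == 1, 1, 0)

# Environment2 is reverse-keyed on a 1-5 scale, so the reversed score is 6 - x.
# Do this only after any missing codes are NA (6 - (-99) would give 105).
clean$Environment2r <- 6 - clean$Environment2

# Listwise deletion: compare per-variable counts with fully complete rows.
analysis_vars <- c("Income", "Smoker", "OwnsHome01", "Environment2r")
colSums(!is.na(clean[analysis_vars]))      # 3, 4, 5, 5
sum(complete.cases(clean[analysis_vars]))  # 2
```

On these toy rows, Income and Smoker have 3 and 4 valid values, but only 2 rows survive listwise deletion because their missing codes fall on different respondents.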
Details
community is the clean default example dataset. For a companion
dataset that deliberately carries undeclared missing-value codes, stripped
value labels, and an imperfect scale (the material the data-cleaning
workflow operates on), see clinic.
The five Environment items form a single attitude scale: item 2 is
reverse-keyed, and item 5 is the weak item that scale-reliability output
flags for dropping. Income, Education, Smoker, Environment1, and
Environment3 carry SPSS-style missing values, with the codes placed on
partly non-overlapping cases so that listwise deletion visibly reduces the
analysis sample below the per-variable counts. All of community's
missing-value codes are properly declared.
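Scale-reliability output usually reports Cronbach's alpha, alpha = k / (k - 1) * (1 - sum of item variances / variance of the total score), along with alpha recomputed with each item left out. A weak item is one whose removal raises alpha. The minimal base-R sketch below assumes Environment2 has already been reverse-coded and that incomplete rows have been dropped, since var() returns NA when any value is missing. The helpers cronbach_alpha() and alpha_if_dropped() are hypothetical, not the package's API, and the six response rows are invented to show the pattern (they are not taken from community).

```r
# Cronbach's alpha: k/(k-1) * (1 - sum of item variances / variance of total score).
cronbach_alpha <- function(items) {
  items <- as.matrix(items)
  k <- ncol(items)
  item_var  <- apply(items, 2, var)
  total_var <- var(rowSums(items))
  (k / (k - 1)) * (1 - sum(item_var) / total_var)
}

# Alpha recomputed with each item left out in turn ("alpha if item dropped").
alpha_if_dropped <- function(items) {
  items <- as.matrix(items)
  vapply(colnames(items),
         function(nm) cronbach_alpha(items[, colnames(items) != nm, drop = FALSE]),
         numeric(1))
}

# Hypothetical toy responses from 6 respondents; Environment2 is assumed to be
# reverse-coded already, and no missing values remain.
scale_items <- data.frame(
  Environment1  = c(1, 2, 2, 4, 4, 5),
  Environment2r = c(1, 2, 3, 3, 5, 4),
  Environment3  = c(2, 1, 3, 4, 4, 4),
  Environment4  = c(1, 3, 2, 4, 4, 4),
  Environment5  = c(3, 4, 2, 4, 2, 3)
)

overall <- cronbach_alpha(scale_items)   # about 0.83
dropped <- alpha_if_dropped(scale_items)
names(dropped)[dropped > overall]        # "Environment5"
```

Here, dropping Environment5 raises alpha to 0.9375, while dropping any other item lowers it. Dedicated implementations, such as alpha() in the psych package, report the same alpha-if-dropped quantity alongside item-total statistics.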
See also
clinic, the messy-data companion dataset.