Recode a variable with explicit value mapping and optional labels
jrecode.Rdjrecode() recodes a variable using a simple map string that specifies
how old values should be converted to new values. It is designed for
situations where you need to collapse categories, change numeric codes,
or recode dichotomies. Variable and value labels are handled automatically.
Map and labels rules can also produce missing values: plain system NA
via the NA / System / SYSMIS aliases, or
Stata-style tagged missing values (.a through .z) when
the active convention is Stata. See Missing values in the map
below for the canonical patterns under each convention.
Arguments
- data
A data frame containing the original variable.
- orig.var
The variable to recode (unquoted, e.g.
AgeGroup).- map
A quoted string specifying the recode rules, using the format
"old=new"with rules separated by semicolons. Multiple old values mapping to the same new value are separated by commas on the left side.An optional
elseclause controls what happens to values not covered by the map:No else clause: the function stops with a message if any values are left unmapped, so you can fix the map before proceeding.
else=NA(alsoelse=Systemorelse=SYSMIS): unmapped values are deliberately set to system NA.else=copy: unmapped values are carried across unchanged.else=.a(or any Stata-style missing-value token, Stata convention only): unmapped values are set to that Stata-style missing value.
Individual values can also be mapped to system NA using the same aliases:
"-5=NA","-5=System", or"-5=SYSMIS".Under Stata convention, values can be mapped to Stata-style missing-value tokens:
"-99=.a; -98=.b".Examples:
"1=1; 2=0""1=1; 2,3=2; 4,5=3; else=NA""1=1; 2=0; else=copy""-5=System; else=copy""3=1; 4=2; else=.a"(Stata convention only)
- labels
Optional. A quoted string specifying value labels for the new variable, using the format
"code=Label Text"with rules separated by semicolons. If supplied, these labels are used as-is.The left side of each rule may be a numeric code or, under Stata convention, a Stata-style missing-value token (
.athrough.z). Tagged-NA labels are stored on the tag itself, not on a numeric code.If omitted, the function attempts to transfer value labels automatically from the original variable. This works when the original variable has value labels and the mapping is one-to-one (no categories are collapsed). When categories are collapsed, labels cannot be transferred automatically and a note is printed.
Example:
"1=Male; 0=Female"or".a=Refused; .b=Don't know".- convention
Optional. One of
"spss","stata", orNULL(default). Controls whether Stata-style missing-value tokens (.athrough.z) are accepted in the map and labels arguments. Inert when no Stata-style missing-value tokens appear in either argument.When
NULL, the convention is resolved fromjoptions("missing.convention"); if that is also unset, the default is SPSS. Most users set the convention once at the top of a session viajoptions()(or in their.Rprofile) rather than supplying this argument on every call. See?joptionsfor details.
Value
A haven_labelled vector with the recoded values, variable
label, and (if supplied or auto-transferred) value labels applied. Assign
this to a new column in your data frame:
MyData$AgeGroupR <- jrecode(MyData, AgeGroup, map = "...")
Details
The function accepts haven-labelled, plain numeric, and factor variables.
The variable label from the original variable is carried across automatically with "(recoded)" appended. If the original variable has no variable label, the variable name is used instead.
Value labels are handled in three ways, in order of priority:
If
labelsis supplied, those labels are used as-is.If
labelsis omitted and the original variable has value labels, they are automatically transferred to the new codes — provided the mapping is one-to-one (no collapsing). For example, recoding 1/2 to 1/0 will carry "Yes" and "No" across to the new codes automatically.If categories are collapsed (multiple old values map to one new value), automatic transfer is not possible and a note is printed directing you to supply labels manually.
NA values in the original variable are always set to NA in the new variable,
regardless of the else setting.
Values that appear to be coded missing values (e.g. -99, -9, 999) from SPSS
or another package are automatically detected and set to NA, even when
else=copy is used. A note is printed when this occurs.
If the map does not include an else clause and there are unmapped
values in the variable, the function stops with a message listing the
unmapped values so you can fix the map before proceeding.
If the map specifies values that do not exist in the original variable, a warning is issued (but the function continues). This helps catch typos in the map string.
Missing values in the map. The package supports two conventions
for representing user-defined missing values (UDMs), and the syntax for
producing UDMs from jrecode() depends on which one is active:
Under SPSS convention (the default), UDMs are real numeric codes carrying metadata that flags them as missing. The two-step canonical pattern is:
df$EducR <- jrecode(df, Education,
map = "1,2=1; 3=2; 4,5=3; -99,-98=-99",
labels = "1=High school or less; 2=Some college; 3=Degree")
df <- jdeclare_udm(df, EducR, codes = c(Refused = -99))The jrecode() call assigns the numeric sentinel -99; the
subsequent jdeclare_udm() call attaches the label and flags
-99 as missing. Labeling -99 inside the labels
argument is unnecessary — jdeclare_udm() owns that label.
Under Stata convention, UDMs are typed missing cells marked
with Stata-style tags (.a through .z). The single-call
canonical pattern is:
df$EducR <- jrecode(df, Education,
map = "1,2=1; 3=2; 4,5=3; else=.a",
labels = "1=High school or less; 2=Some college; 3=Degree; .a=Refused")Under Stata convention, jdeclare_udm() is not needed for this
pattern — jrecode() handles both the value recoding and the
Stata-style missing-value labeling in one call.
Writing Stata-style missing-value tokens while the active convention is SPSS raises an
informative error that echoes the user's call rewritten in SPSS-style
syntax. Switching the convention session-wide is one line:
joptions(missing.convention = "stata").
See also
jdeclare_udm for declaring user-defined missing
values on a column after a recode (the SPSS-style canonical pattern).
jrelabel for applying labels to an existing variable
after a recode.
joptions for the session-level
missing.convention setting.
jstats for the package overview,
workflow conventions, and complete function listing.
Examples
# Recode with explicit labels (a 1/2 dichotomy to 0/1)
df <- community
df$OwnsHome01 <- jrecode(df, OwnsHome,
map = "1=1; 2=0",
labels = "0=No; 1=Yes")
#>
#> Note: jrecode() returns the recoded values; assign them to a column to keep them:
#> df$<name> <- jrecode(...)
#> To check the recode landed correctly, compare jfreq() on the original and the new column.
# Collapse categories (must supply labels)
df$RegionR <- jrecode(df, Region,
map = "1,2=1; 3,4=2",
labels = "1=North or South; 2=East or West")
#>
#> Note: jrecode() returns the recoded values; assign them to a column to keep them:
#> df$<name> <- jrecode(...)
#> To check the recode landed correctly, compare jfreq() on the original and the new column.
# Use else=copy to carry unspecified values across unchanged
df$EducR <- jrecode(df, Education,
map = "5=4; else=copy",
labels = "4=Bachelor's degree or higher")
#> Note: -99 ("Refused"), -98 ("Don't know") are declared missing values and were kept on the recoded variable.
#> To convert them to plain NA instead, map them to NA (for example -99=NA).
#>
#> Note: jrecode() returns the recoded values; assign them to a column to keep them:
#> df$<name> <- jrecode(...)
#> To check the recode landed correctly, compare jfreq() on the original and the new column.
# Use else=NA to deliberately drop unspecified values to system NA
df$EducR2 <- jrecode(df, Education,
map = "4=1; 5=1; else=NA",
labels = "1=College degree")
#> Note: -99 ("Refused"), -98 ("Don't know") are declared missing values and were kept on the recoded variable.
#> To convert them to plain NA instead, map them to NA (for example -99=NA).
#>
#> Note: jrecode() returns the recoded values; assign them to a column to keep them:
#> df$<name> <- jrecode(...)
#> To check the recode landed correctly, compare jfreq() on the original and the new column.
# Convert a specific coded missing value to system NA
df$EducR3 <- jrecode(df, Education, map = "-99=System; else=copy")
#> Note: -98 ("Don't know") is a declared missing value and was kept on the recoded variable.
#> To convert it to a plain NA instead, add -98=NA to the map.
#>
#> Note: jrecode() returns the recoded values; assign them to a column to keep them:
#> df$<name> <- jrecode(...)
#> To check the recode landed correctly, compare jfreq() on the original and the new column.
# Stata convention: Stata-style missing-value tokens in map and labels
# (single call; convention = "stata" scopes the choice to this call only)
df$EducR4 <- jrecode(df, Education,
map = "1,2=1; 3,4,5=2; else=.a",
labels = "1=No college; 2=College; .a=Refused",
convention = "stata")
#> Note: -99 ("Refused"), -98 ("Don't know") are declared missing values and were kept on the recoded variable.
#> To convert them to plain NA instead, map them to NA (for example -99=NA).
#>
#> Note: jrecode() returns the recoded values; assign them to a column to keep them:
#> df$<name> <- jrecode(...)
#> To check the recode landed correctly, compare jfreq() on the original and the new column.
# Using juse() default
juse(df)
#> Default data frame set to: df
df$RegionR2 <- jrecode(Region, map = "1,2=1; 3,4=2",
labels = "1=North or South; 2=East or West")
#>
#> Note: jrecode() returns the recoded values; assign them to a column to keep them:
#> df$<name> <- jrecode(...)
#> To check the recode landed correctly, compare jfreq() on the original and the new column.