Filters rows from either the campaign booklet or party statements corpus based on column values. Only filters on columns that actually exist in the chosen dataset; unknown column names are silently ignored with a message.
Usage
get_docs(
dataset = c("campaign_booklet", "party_statements"),
...,
.data = NULL
)Arguments
- dataset
Character; one of
"campaign_booklet"or"party_statements".- ...
Named filtering arguments. Each name must correspond to a column in the dataset. Values can be:
A single value: exact match (e.g.,
year = 2020)A vector: match any (e.g.,
year = 2018:2022)For date/year ranges, supply a numeric or character vector.
- .data
Optional; a pre-loaded
data.tableto filter instead of loading from disk. Useful to avoid repeated I/O.
Details
The function dynamically checks which columns exist in the data and applies filters only for matching column names. If a filter name does not match any column, it is ignored and a message is printed.
For the campaign_booklet dataset, useful filter columns include:
party, party_eng, office, region, district, date, sex,
result, name.
For the party_statements dataset, useful filter columns include:
year, partisan, conservative.
Examples
if (FALSE) { # \dontrun{
# Get all party statements from 2020
docs_2020 <- get_docs("party_statements", year = 2020)
# Get campaign booklets for a specific party
saenuri <- get_docs("campaign_booklet", party_eng = "Saenuri Party")
# Combine multiple filters
subset <- get_docs("party_statements", year = 2018:2022, conservative = 1)
# Filter a pre-loaded dataset
ps <- load_party_statements()
docs <- get_docs("party_statements", year = 2020, .data = ps)
} # }