The krpoltext package provides access to two Korean political text corpora described in:
Lim, T.H. (2025). South Korean Election Campaign Booklet and Party Statements Corpora. Scientific Data, 12, 1030. https://doi.org/10.1038/s41597-025-05220-4
Load a Dataset
If a managed artifact is needed, it is downloaded from OSF on first use in an interactive session and then cached locally as RDS. In non-interactive sessions, use a local file path or a pre-populated cache. The load_*() helpers prefer managed Parquet artifacts by default; the examples below make that explicit.
The campaign_booklet dataset is available in two public variants: original and enriched. The default is original; use variant = "enriched" when you need NEC linkage fields such as huboid, sg_id, sg_typecode, and link_status.
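Because the two variants differ in the NEC linkage columns, it can help to check for those columns before relying on them. The following is a minimal sketch in base R, not part of krpoltext: missing_nec_fields() is a hypothetical helper, and the two data frames are toy stand-ins with placeholder values rather than real corpus rows.

```r
# NEC linkage column names, taken from the variant description above.
nec_fields <- c("huboid", "sg_id", "sg_typecode", "link_status")

# Hypothetical helper (not a krpoltext function): return the linkage
# columns that are absent from a data frame.
missing_nec_fields <- function(data) {
  setdiff(nec_fields, names(data))
}

# Toy stand-ins for the two variants; all values are placeholders.
toy_original <- data.frame(code = c("a1", "a2"), stringsAsFactors = FALSE)
toy_enriched <- data.frame(
  code = c("a1", "a2"),
  huboid = NA_character_,
  sg_id = NA_character_,
  sg_typecode = NA_character_,
  link_status = NA_character_,
  stringsAsFactors = FALSE
)

missing_original <- missing_nec_fields(toy_original)
missing_enriched <- missing_nec_fields(toy_enriched)
```

An empty result suggests the data frame already carries the linkage fields; a non-empty result suggests the enriched variant is the one to load.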
library(krpoltext)
# Load the party statements corpus from managed Parquet
ps <- load_party_statements(format = "parquet")
ps
Explore Metadata
meta <- metadata("party_statements")
meta$name
meta$time_coverage
meta$n_candidates_or_entries
meta$columns
Filter Documents
get_docs() dynamically filters on any column that exists
in the dataset.
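To illustrate what filtering on any existing column can look like, here is a minimal base R sketch. It is not the implementation of get_docs(): filter_docs_sketch() is a hypothetical helper, and the toy data frame, including its year column, is invented for illustration. Only the office column and the "national_assembly" value come from the examples in this document.

```r
# Hypothetical helper (not part of krpoltext): keep rows where every
# named argument matches the column of the same name.
filter_docs_sketch <- function(data, ...) {
  conditions <- list(...)
  # Reject filters on columns that do not exist in the data.
  unknown <- setdiff(names(conditions), names(data))
  if (length(unknown) > 0) {
    stop("Unknown column(s): ", paste(unknown, collapse = ", "))
  }
  keep <- rep(TRUE, nrow(data))
  for (col in names(conditions)) {
    keep <- keep & data[[col]] %in% conditions[[col]]
  }
  data[keep, , drop = FALSE]
}

# Toy data for illustration only; the rows are invented.
toy <- data.frame(
  office = c("national_assembly", "president", "national_assembly"),
  year = c(2016, 2017, 2020),
  stringsAsFactors = FALSE
)

subset_a <- filter_docs_sketch(toy, office = "national_assembly")
subset_b <- filter_docs_sketch(toy, office = "national_assembly", year = 2020)
```

Checking the argument names against names(data) up front means a misspelled column fails loudly instead of silently returning every row.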
Campaign Booklets
cb <- load_campaign_booklet(format = "parquet")
# Load the NEC-linked enriched variant when needed
cb_enriched <- load_campaign_booklet(format = "parquet", variant = "enriched")
cb_enriched[, c("code", "huboid", "sg_id", "sg_typecode", "link_status")]
# National Assembly candidates only
assembly <- get_docs("campaign_booklet", office = "national_assembly", .data = cb)
nrow(assembly)
# Count documents by party
table(assembly$party_eng)
Next Steps
- See vignette("replication-pipeline") for a full text analysis workflow with quanteda.
- See the Data Descriptor for variable definitions, technical validation, and example use cases.