krpoltext provides convenient R access to two large-scale Korean political text corpora described in:
Lim, T.H. (2025). South Korean Election Campaign Booklet and Party Statements Corpora. Scientific Data, 12, 1030. https://doi.org/10.1038/s41597-025-05220-4
| Corpus | Period | Candidates / Entries | Description |
|---|---|---|---|
| Election Campaign Booklets | 2000–2022 | 49,678 candidates | Manifesto booklets from candidates in presidential, National Assembly, and local elections |
| Party Statements | 2003–2022 | 82,723 statements | Official statements and leadership meeting minutes from the two major parties |
Data is hosted on the Open Science Framework (DOI: 10.17605/OSF.IO/RCT9Y) and automatically downloaded on first use.
Installation
# install.packages("remotes")
remotes::install_github("taehyun-lim/krpoltext")
Quick Start
library(krpoltext)
# Load a dataset (auto-downloads from OSF on first use, then cached)
ps <- load_party_statements()
cb <- load_campaign_booklet()
# Explore metadata
metadata("party_statements")
# Filter documents
docs_2020 <- get_docs("party_statements", year = 2020)
conservative <- get_docs("party_statements", year = 2018:2022, conservative = 1)
# Campaign booklets: filter by office
assembly <- get_docs("campaign_booklet", office = "national_assembly", .data = cb)
Data Download
The CSV files (~1.5 GB total) are downloaded from OSF on first load:
# Auto-download: asks for consent interactively
ps <- load_party_statements()
# Or download all datasets at once
download_data()
# Provide a local file path instead
ps <- load_party_statements(path = "~/Downloads/sk_party_statements_v2022.csv")
Data is cached as compressed RDS in tools::R_user_dir("krpoltext", "cache") and verified via SHA-256 checksums. Subsequent loads take ~2 seconds.
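The cache-and-verify behaviour described above can be pictured with a small sketch. This is not the package's implementation: the helpers `cache_rds()` and `read_cached()` are hypothetical, the demo writes toy data to a temporary directory rather than the real cache, and it assumes the `digest` package is installed for SHA-256 hashing. It also assumes the checksum is taken over the cached RDS file, which the description above does not specify.

```r
library(digest)  # assumption: digest is installed; used only for SHA-256 hashing

# Hypothetical helper: save a data frame as compressed RDS and record its checksum.
cache_rds <- function(df, name, cache_dir) {
  dir.create(cache_dir, recursive = TRUE, showWarnings = FALSE)
  rds_path <- file.path(cache_dir, paste0(name, ".rds"))
  saveRDS(df, rds_path, compress = "xz")
  sha <- digest(rds_path, algo = "sha256", file = TRUE)
  writeLines(sha, paste0(rds_path, ".sha256"))
  invisible(rds_path)
}

# Hypothetical helper: re-hash the cached file and refuse to load on a mismatch.
read_cached <- function(name, cache_dir) {
  rds_path <- file.path(cache_dir, paste0(name, ".rds"))
  expected <- readLines(paste0(rds_path, ".sha256"), n = 1)
  actual <- digest(rds_path, algo = "sha256", file = TRUE)
  if (!identical(actual, expected)) {
    stop("Checksum mismatch for ", rds_path, "; the cached file may be corrupted.")
  }
  readRDS(rds_path)
}

# Toy data and a temporary directory stand in for the real corpus and cache.
demo_dir <- file.path(tempdir(), "krpoltext-cache-demo")
toy <- data.frame(id = 1:3, text = c("a", "b", "c"), stringsAsFactors = FALSE)
cache_rds(toy, "toy", demo_dir)
restored <- read_cached("toy", demo_dir)
```

Storing the hash next to the file keeps verification cheap: a reload only has to re-hash one file before calling readRDS().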
Integration with quanteda
library(quanteda)
corp <- as_quanteda_corpus(ps, docid_field = "id")
toks <- tokens(corp, remove_punct = TRUE)
dfm_obj <- dfm(toks)
topfeatures(dfm_obj, 20)
Functions
| Function | Description |
|---|---|
| load_campaign_booklet() | Load the campaign booklet corpus |
| load_party_statements() | Load the party statements corpus |
| metadata() | Dataset metadata (columns, counts, citation) |
| get_docs() | Filter documents by any column |
| as_quanteda_corpus() | Convert to a quanteda corpus object |
| download_data() | Download datasets from OSF |
| clear_cache() | Remove cached data files |
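To make "filter documents by any column" concrete, here is a rough base-R picture of that kind of filtering on a toy data frame. The helper `filter_docs()` and the toy rows are hypothetical; the column names `year` and `conservative` are borrowed from the Quick Start arguments, and get_docs() itself may behave differently (for example, in how it treats missing values).

```r
# Hypothetical stand-in for column-based filtering; not the package's get_docs().
filter_docs <- function(data, ...) {
  conditions <- list(...)
  keep <- rep(TRUE, nrow(data))
  for (col in names(conditions)) {
    if (!col %in% names(data)) stop("Unknown column: ", col)
    # A row is kept when its value matches any of the requested values.
    keep <- keep & data[[col]] %in% conditions[[col]]
  }
  data[keep, , drop = FALSE]
}

# Toy rows invented for illustration only.
toy <- data.frame(
  id = 1:5,
  year = c(2017, 2018, 2020, 2020, 2022),
  conservative = c(1, 0, 1, 0, 1)
)

docs_2020 <- filter_docs(toy, year = 2020)
conservative <- filter_docs(toy, year = 2018:2022, conservative = 1)
```

Each named argument narrows the result further, so passing several columns behaves like a logical AND across them, while a vector of values for one column behaves like an OR within it.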
Citation
If you use this data in academic work, please cite the Data Descriptor paper:
Lim, T.H. (2025). South Korean Election Campaign Booklet and Party Statements Corpora. Scientific Data, 12, 1030. https://doi.org/10.1038/s41597-025-05220-4
And the data repository:
Lim, T.H. (2024). South Korean Election Campaign Booklet Corpus and Party Statements Corpus. OSF. https://doi.org/10.17605/OSF.IO/RCT9Y
citation("krpoltext")
License
- Package code: MIT License
- Data: CC BY-NC-ND 4.0 — see the OSF project and the Data Descriptor for full terms.
Links
- Data Descriptor: https://doi.org/10.1038/s41597-025-05220-4
- OSF Repository: https://osf.io/rct9y/
- GitHub: https://github.com/taehyun-lim/krpoltext
- Issues: https://github.com/taehyun-lim/krpoltext/issues