| Type: | Package | 
| Title: | Semi-Supervised Model for Geographical Document Classification | 
| Version: | 0.9.2 | 
| Maintainer: | Kohei Watanabe <watanabe.kohei@gmail.com> | 
| Description: | Semi-supervised model for geographical document classification (Watanabe 2018) <doi:10.1080/21670811.2017.1293487>. This package currently contains seed dictionaries in English, German, French, Spanish, Italian, Russian, Hebrew, Arabic, Turkish, Japanese and Chinese (Simplified and Traditional). | 
| License: | MIT + file LICENSE | 
| URL: | https://github.com/koheiw/newsmap | 
| BugReports: | https://github.com/koheiw/newsmap/issues | 
| LazyData: | TRUE | 
| Encoding: | UTF-8 | 
| Depends: | R (≥ 3.5), methods | 
| Imports: | utils, Matrix, quanteda (≥ 2.1), quanteda.textstats, stringi | 
| Suggests: | testthat | 
| Language: | en-GB | 
| RoxygenNote: | 7.3.2 | 
| NeedsCompilation: | no | 
| Packaged: | 2025-07-10 08:04:17 UTC; watan | 
| Author: | Kohei Watanabe [aut, cre, cph], Stefan Müller [aut], Dani Madrid-Morales [aut], Katerina Tertytchnaya [aut], Ke Cheng [aut], Chung-hong Chan [aut], Claude Grasland [aut], Giuseppe Carteny [aut], Elad Segev [aut], Dai Yamao [aut], Barbara Ellynes Zucchi Nobre Silva [aut], Lanabi la Lova [aut], Lungta Seki [aut] | 
| Repository: | CRAN | 
| Date/Publication: | 2025-07-10 12:50:12 UTC | 
Evaluate classification accuracy in precision and recall
Description
Evaluate classification accuracy in precision and recall
Usage
accuracy(x, y)
Arguments
x | 
 vector of predicted classes  | 
y | 
 vector of true classes  | 
Examples
class_pred <- c('US', 'GB', 'US', 'CN', 'JP', 'FR', 'CN') # prediction
class_true <- c('US', 'FR', 'US', 'CN', 'KP', 'EG', 'US') # true class
acc <- accuracy(class_pred, class_true)
print(acc)
summary(acc)
Compute average feature entropy (AFE)
Description
AFE measures the randomness of feature occurrences in labelled documents.
Usage
afe(x, y, smooth = 1)
Arguments
x | 
 a dfm for features  | 
y | 
 a dfm for labels  | 
smooth | 
 a numeric value for smoothing to include all the features  | 
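As a rough illustration of the idea, the base R sketch below computes the Shannon entropy of each feature's smoothed frequency distribution over labels and then takes the unweighted mean. The helper feature_entropy(), the toy counts and the choice of log base 2 are assumptions made for this example; the normalisation and weighting used inside afe() may differ.

```r
# Hypothetical feature-by-label counts (rows: features, columns: labels).
counts <- matrix(c(10, 0, 0,
                    0, 8, 0,
                    3, 3, 3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("dublin", "seoul", "said"),
                                 c("ie", "kr", "fr")))

# Entropy of each feature's distribution over labels
# (illustrative helper, not part of newsmap).
feature_entropy <- function(counts, smooth = 1) {
  # Divide each row by its smoothed total to get proportions per feature.
  p <- (counts + smooth) / rowSums(counts + smooth)
  # Treat 0 * log2(0) as 0 so that smooth = 0 also works.
  -rowSums(ifelse(p > 0, p * log2(p), 0))
}

ent <- feature_entropy(counts, smooth = 1)
ent        # "said" is spread evenly over labels, so its entropy is highest
mean(ent)  # unweighted average over features
```

With smooth = 1 every feature receives a small count under every label, so no feature is dropped because of a zero frequency.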
Coerce various objects to coefficients_textmodel
Description
This is a helper function used in summary.textmodel_*.
Usage
as.coefficients_textmodel(x)
Arguments
x | 
 an object to be coerced  | 
Coerce various objects to statistics_textmodel
Description
This is a helper function used in summary.textmodel_*.
Usage
as.statistics_textmodel(x)
Arguments
x | 
 an object to be coerced  | 
Assign the summary.textmodel class to a list
Description
Assign the summary.textmodel class to a list
Usage
as.summary.textmodel(x)
Arguments
x | 
 a named list  | 
Extract coefficients for features
Description
Extract coefficients for features
Usage
## S3 method for class 'textmodel_newsmap'
coef(object, n = 10, select = NULL, ...)
## S3 method for class 'textmodel_newsmap'
coefficients(object, n = 10, select = NULL, ...)
Arguments
object | 
 a Newsmap model fitted by textmodel_newsmap().  | 
n | 
 the number of coefficients to extract.  | 
select | 
 returns the coefficients for the selected class; specify by the
names of rows in   | 
... | 
 not used.  | 
Seed geographical dictionary in Arabic
Description
Seed geographical dictionary in Arabic
Author(s)
Dai Yamao daiyamao@scs.kyushu-u.ac.jp
Seed geographical dictionary in German
Description
Seed geographical dictionary in German
Author(s)
Stefan Müller mullers@tcd.ie
Seed geographical dictionary in English
Description
Seed geographical dictionary in English
Author(s)
Kohei Watanabe watanabe.kohei@gmail.com
Seed geographical dictionary in Spanish
Description
Seed geographical dictionary in Spanish
Author(s)
Dani Madrid-Morales dani.madrid@my.cityu.edu.hk
Seed geographical dictionary in French
Description
Seed geographical dictionary in French
Author(s)
Claude Grasland claude.grasland@parisgeo.cnrs.fr
Seed geographical dictionary in Hebrew
Description
Seed geographical dictionary in Hebrew
Author(s)
Elad Segev eladseg@gmail.com
Seed geographical dictionary in Italian
Description
Seed geographical dictionary in Italian
Author(s)
Giuseppe Carteny giuseppe.carteny@unimi.it
Seed geographical dictionary in Japanese
Description
Seed geographical dictionary in Japanese
Author(s)
Kohei Watanabe watanabe.kohei@gmail.com
Seed geographical dictionary in Portuguese
Description
Seed geographical dictionary in Portuguese
Author(s)
Barbara Ellynes Zucchi Nobre Silva barbara@zucchi.science
Seed geographical dictionary in Russian
Description
Seed geographical dictionary in Russian
Author(s)
Katerina Tertytchnaya katerina.tertytchnaya@gmail.com
Lanabi la Lova l.lalova@lse.ac.uk
Seed geographical dictionary in Turkish
Description
Seed geographical dictionary in Turkish
Author(s)
Lungta Seki yahoo.co.jp0409@gmail.com
Seed geographical dictionary in Chinese (simplified)
Description
Seed geographical dictionary in Chinese (simplified)
Author(s)
Ke Cheng kecheng.ac@gmail.com
Seed geographical dictionary in Chinese (traditional)
Description
Seed geographical dictionary in Chinese (traditional)
Author(s)
Chung-hong Chan chainsawtiney@gmail.com
Prediction method for textmodel_newsmap
Description
Predict document classes using a trained Newsmap model.
Usage
## S3 method for class 'textmodel_newsmap'
predict(
  object,
  newdata = NULL,
  confidence = FALSE,
  rank = 1L,
  type = c("top", "all"),
  rescale = FALSE,
  min_conf = -Inf,
  min_n = 0L,
  ...
)
Arguments
object | 
 a fitted Newsmap textmodel.  | 
newdata | 
 dfm on which prediction should be made.  | 
confidence | 
 if   | 
rank | 
 rank of the class to be predicted. Only used when   | 
type | 
 if   | 
rescale | 
 if   | 
min_conf | 
 return   | 
min_n | 
 set the minimum number of polarity words in documents.  | 
... | 
 not used.  | 
Print method for textmodel feature estimates
Description
This is a helper function used in print.summary.textmodel.
Usage
## S3 method for class 'coefficients_textmodel'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x | 
 a coefficients_textmodel object  | 
digits | 
 minimal number of significant digits, see
  | 
... | 
 additional arguments not used  | 
Implements print methods for textmodel_statistics
Description
Implements print methods for textmodel_statistics
Usage
## S3 method for class 'statistics_textmodel'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x | 
 a textmodel_wordscore_statistics object  | 
digits | 
 minimal number of significant digits, see
  | 
... | 
 further arguments passed to or from other methods  | 
Print method for summary.textmodel
Description
Print method for summary.textmodel
Usage
## S3 method for class 'summary.textmodel'
print(x, digits = max(3L, getOption("digits") - 3L), ...)
Arguments
x | 
 a   | 
digits | 
 minimal number of significant digits, see
  | 
... | 
 additional arguments not used  | 
Calculate micro and macro average measures of accuracy
Description
This function calculates micro-average precision (p) and recall (r) and
macro-average precision (P) and recall (R) based on a confusion matrix from
accuracy().
Usage
## S3 method for class 'textmodel_newsmap_accuracy'
summary(object, ...)
Arguments
object | 
 output of accuracy()  | 
... | 
 not used.  | 
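The four measures can be illustrated in base R with the vectors from the accuracy() example. This is a minimal sketch using the standard definitions of precision and recall. It assumes that classes are taken from the true labels only and that classes with undefined precision (no predictions) are skipped in the macro average; the object names are hypothetical and the package's internal handling may differ.

```r
class_pred <- c("US", "GB", "US", "CN", "JP", "FR", "CN") # prediction
class_true <- c("US", "FR", "US", "CN", "KP", "EG", "US") # true class

# Per-class counts of true positives, false positives and false negatives.
classes <- sort(unique(class_true))
counts <- t(sapply(classes, function(k) {
  c(tp = sum(class_true == k & class_pred == k),
    fp = sum(class_true != k & class_pred == k),
    fn = sum(class_true == k & class_pred != k))
}))

# Micro-averages pool the counts over classes before dividing.
micro_p <- sum(counts[, "tp"]) / sum(counts[, c("tp", "fp")])
micro_r <- sum(counts[, "tp"]) / sum(counts[, c("tp", "fn")])

# Macro-averages compute the measure per class, then take the mean.
prec <- counts[, "tp"] / (counts[, "tp"] + counts[, "fp"]) # NaN if never predicted
rec <- counts[, "tp"] / (counts[, "tp"] + counts[, "fn"])
macro_P <- mean(prec, na.rm = TRUE)
macro_R <- mean(rec, na.rm = TRUE)

c(p = micro_p, r = micro_r, P = macro_P, R = macro_R)
```

Micro-averages are dominated by frequent classes, while macro-averages give every class the same weight regardless of its size.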
Semi-supervised Bayesian multinomial model for geographical document classification
Description
Train a Newsmap model to predict geographical focus of documents with labels given by a dictionary.
Usage
textmodel_newsmap(
  x,
  y,
  label = c("all", "max"),
  smooth = 1,
  boolean = FALSE,
  drop_label = TRUE,
  verbose = quanteda_options("verbose"),
  entropy = c("none", "global", "local", "average"),
  ...
)
Arguments
x | 
 a dfm or fcm created by   | 
y | 
 a dfm or a sparse matrix that record class membership of the
documents. It can be created applying   | 
label | 
 if "max", uses only labels for the maximum value in each row of
  | 
smooth | 
 a value added to the frequency of words to smooth likelihood ratios.  | 
boolean | 
 if   | 
drop_label | 
 if   | 
verbose | 
 if   | 
entropy | 
 [experimental] the scheme to compute the entropy to
regularize likelihood ratios. The entropy of features is computed over
labels if   | 
... | 
 additional arguments passed to internal functions.  | 
Details
Newsmap learns associations between words and classes as likelihood
ratios, based on the features in x and the labels in y. Large likelihood
ratios tend to concentrate in a small number of features, but the entropy
of their frequencies over labels or documents helps to disperse the
distribution.
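The role of x, y and smooth can be sketched in base R on toy matrices. This is a simplified illustration, assuming that each score is the log ratio of a word's smoothed relative frequency in documents carrying a label to its smoothed relative frequency in all other documents. It is not the package's implementation, it ignores the entropy option, and all object names are hypothetical.

```r
# Toy document-feature counts (rows: documents, columns: words).
x <- matrix(c(2, 0, 1,
              0, 3, 1,
              1, 0, 1),
            nrow = 3, byrow = TRUE,
            dimnames = list(c("d1", "d2", "d3"),
                            c("dublin", "seoul", "said")))

# Toy labels: TRUE when a seed word for the class matched the document.
y <- matrix(c(TRUE, FALSE,
              FALSE, TRUE,
              TRUE, FALSE),
            nrow = 3, byrow = TRUE,
            dimnames = list(rownames(x), c("ie", "kr")))

smooth <- 1

# One row of log-likelihood ratios per class, one column per word.
llr <- t(sapply(colnames(y), function(k) {
  in_k <- colSums(x[y[, k], , drop = FALSE]) # word counts in labelled documents
  v1 <- in_k + smooth                        # smoothed counts inside the class
  v0 <- colSums(x) - in_k + smooth           # smoothed counts outside the class
  log(v1 / sum(v1)) - log(v0 / sum(v0))
}))

# Score each document by summing the ratios of its words, weighted by
# frequency, and pick the class with the highest score.
scores <- x %*% t(llr)
pred <- rownames(llr)[max.col(scores, ties.method = "first")]
pred
```

In this toy example "dublin" receives a large positive score for "ie", while "said" occurs in every document and stays close to zero for both classes.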
References
Kohei Watanabe. 2018. "Newsmap: semi-supervised approach to geographical news classification." Digital Journalism 6(3): 294-309.
Examples
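library(newsmap)  # provides textmodel_newsmap() and the seed dictionaries
library(quanteda) # provides tokens(), tokens_lookup() and dfm()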
text_en <- c(text1 = "This is an article about Ireland.",
             text2 = "The South Korean prime minister was re-elected.")
toks_en <- tokens(text_en)
label_toks_en <- tokens_lookup(toks_en, data_dictionary_newsmap_en, levels = 3)
label_dfm_en <- dfm(label_toks_en)
feat_dfm_en <- dfm(toks_en, tolower = FALSE)
model_en <- textmodel_newsmap(feat_dfm_en, label_dfm_en)
predict(model_en)