metaphonebr

Lifecycle: experimental. Project status: active (the project has reached a stable, usable state and is being actively developed).

The goal of metaphonebr is to simplify Brazilian names phonetically using a custom metaphoneBR algorithm that preserves ending vowels. It was created to aid dataset pairing when no unambiguous keys are available.

Installation

The stable version of the package can be installed with:

install.packages("metaphonebr")

You can install the development version of metaphonebr from GitHub with:

# install.packages("remotes")
remotes::install_github("ipeadata-lab/metaphonebr")

Example

This basic example shows how to use the main function, which takes a character vector of names and returns their phonetic codes:

example_names <- c("João da Silva", "Maria", "Marya",
                    "Helena", "Elena", "Philippe", "Filipe", "Xavier", "Chavier")
phonetic_codes <- metaphonebr::metaphonebr(example_names)
print(data.frame(original = example_names, metaphonebr = phonetic_codes))

The metaphoneBR phonetic encoding algorithm proceeds as follows:

  1. Initial Cleanup & Preparation
  2. Silent Letter Removal
  3. Digraph Simplification (Sound Grouping)
  4. Similar Consonant Simplification
  5. Terminal Nasal Sound Simplification
  6. Duplicate Vowel Removal
  7. Final Cleanup (Duplicate Letters & Spaces)
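
To make the shape of this pipeline concrete, here is a minimal base R sketch of the seven stages. It is not the package's implementation: the function name `sketch_phonetic_code` is hypothetical, and every substitution rule in it (dropping an initial H, PH to F, CH to X, Y/W/Z folding, final M to N) is an assumption chosen only for illustration. The real rule set in `metaphonebr()` may differ.

```r
# Hypothetical, simplified illustration of the seven stages.
# The substitution rules are assumptions, NOT the package's actual rules.
sketch_phonetic_code <- function(x) {
  # 1. Initial cleanup: transliterate accents where the platform supports it,
  #    uppercase, and keep only letters and spaces
  ascii <- iconv(x, to = "ASCII//TRANSLIT")
  x <- ifelse(is.na(ascii), x, ascii)
  x <- toupper(x)
  x <- gsub("[^A-Z ]", "", x)

  # 2. Silent letter removal (assumed rule: drop H at the start of a word)
  x <- gsub("\\bH", "", x, perl = TRUE)

  # 3. Digraph simplification (assumed rules: PH -> F, CH -> X)
  x <- gsub("PH", "F", x, fixed = TRUE)
  x <- gsub("CH", "X", x, fixed = TRUE)

  # 4. Fold letters that sound alike (assumed rules: Y -> I, W -> V, Z -> S)
  x <- chartr("YWZ", "IVS", x)

  # 5. Terminal nasal simplification (assumed rule: word-final M -> N)
  x <- gsub("M\\b", "N", x, perl = TRUE)

  # 6. Collapse runs of the same vowel
  x <- gsub("([AEIOU])\\1+", "\\1", x, perl = TRUE)

  # 7. Final cleanup: collapse remaining doubled letters and extra spaces
  x <- gsub("([A-Z])\\1+", "\\1", x, perl = TRUE)
  x <- gsub(" +", " ", x)
  trimws(x)
}

sketch_phonetic_code(c("Philippe", "Filipe"))   # both become "FILIPE"
sketch_phonetic_code(c("Adriano", "Adriana"))   # ending vowels are kept
```

Under these assumed rules, spelling variants such as "Helena" and "Elena" collapse to the same code, while "Adriano" and "Adriana" stay distinct because nothing in the sketch touches the final vowel.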

The resulting phonetic code is an attempt to represent the phonetic signature of the name in a simplified, standardized way for a Brazilian Portuguese context. In particular, by construction it preserves ending vowels, since they generally carry gender information in Brazilian names (e.g., ADRIANO and ADRIANA).
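
As a rough illustration of how such codes can support dataset pairing, the sketch below joins two data frames on a phonetic code of their name column. The helper `pair_by_code`, the data frames, and the stand-in `toy_encoder` are all hypothetical and exist only so the example runs without the package installed. In practice you would pass `metaphonebr::metaphonebr` as the `encoder` argument.

```r
# Hypothetical helper: join two data frames on a phonetic code of a name column.
# `encoder` is any function mapping a character vector to a vector of codes.
pair_by_code <- function(a, b, name_col, encoder) {
  a$code <- encoder(a[[name_col]])
  b$code <- encoder(b[[name_col]])
  merge(a, b, by = "code", suffixes = c("_a", "_b"))
}

# Stand-in encoder (an assumption, not the package's algorithm):
# uppercases the name and folds Y into I.
toy_encoder <- function(x) chartr("Y", "I", toupper(x))

census   <- data.frame(name = c("Maria", "Adriano"), id_a = 1:2)
registry <- data.frame(name = c("Marya", "Adriana"), id_b = 11:12)

pairs <- pair_by_code(census, registry, "name", toy_encoder)
print(pairs)
```

With this stand-in encoder, "Maria" and "Marya" are paired, while "Adriano" and "Adriana" are not, because their codes differ in the final vowel. Matches on a phonetic code are candidates only, so it is usually worth confirming them against other fields before treating them as the same person.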

Note

metaphonebr is developed by a team of researchers at Instituto de Pesquisa Econômica Aplicada (Ipea).