comperes offers a pipe-friendly (%>%) set of tools for storing and
managing competition results. The notion of competition is quite
general: it is a set of games (abstract events) in which players
(abstract entities) gain some abstract scores (typically numeric). The
most natural example is sports results, but it is not the only one. For
example, product rating can be considered a competition between products
as “players”. Here a “game” is a customer who reviews a set of products
by rating them with a numerical “score” (stars, points, etc.).
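As a rough illustration of the product rating example, the sketch below stores a few ratings as competition results. The ratings data and the product names are made up, and the sketch assumes that as_longcr() accepts a plain data frame with game, player and score columns.

```r
library(comperes)

# Hypothetical ratings (made-up data): each customer is a "game",
# each product a "player", and the number of stars a "score"
ratings <- data.frame(
  game = c(1, 1, 2, 2, 2),
  player = c("ProductA", "ProductB", "ProductA", "ProductB", "ProductC"),
  score = c(5, 3, 4, 4, 2),
  stringsAsFactors = FALSE
)

# Store the ratings as competition results in long format
ratings_cr <- as_longcr(ratings)

# Average rating per product
rating_summary <- summarise_player(ratings_cr, mean_rating = mean(score))
rating_summary
```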
This package leverages dplyr’s grammar of data manipulation. Only basic
knowledge of it is needed to use comperes.
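comperes provides the following functionality:
- Store and convert competition results:
  - in long format, as a tibble with one row per game-player pair.
    Functions: as_longcr(), is_longcr().
  - in wide format, as a tibble with one row per game with a fixed
    number of players. Functions: as_widecr(), is_widecr().
- Summarise results using dplyr’s grammar:
  - compute item summaries. Functions: summarise_item(),
    summarise_game(), summarise_player().
  - compute item summaries and join them to the results. Functions:
    join_item_summary(), join_game_summary(), join_player_summary().
- Compute Head-to-Head values (summaries of direct confrontation between
  two players) with functions also using dplyr’s grammar:
  - in long format, as a tibble with one row per pair of players.
    Function: h2h_long().
  - in matrix format. Function: h2h_mat().
  - common Head-to-Head functions can be spliced in, for example:
    . %>% h2h_mat(!!!h2h_funs["num_wins"]).

You can install comperes from CRAN with: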
install.packages("comperes")

To install the most recent development version from GitHub use:

# install.packages("devtools")
devtools::install_github("echasnovski/comperes")

We will be using ncaa2005, a dataset from the comperes package. It
contains example competition results (hereafter, results) of an isolated
group of Atlantic Coast Conference teams, taken from the book “Who’s #1”
by Langville and Meyer. It looks like this:
library(comperes)
ncaa2005
#> # A longcr object:
#> # A tibble: 20 × 3
#>    game player score
#>   <int> <chr>  <int>
#> 1     1 Duke       7
#> 2     1 Miami     52
#> 3     2 Duke      21
#> 4     2 UNC       24
#> 5     3 Duke       7
#> 6     3 UVA       38
#> # … with 14 more rows

This is an object of class longcr, which describes results in long
format (each row represents the score of a particular player in a
particular game). Because in this competition a game always consists of
two players, a more natural way to look at ncaa2005 is in wide format:
as_widecr(ncaa2005)
#> # A widecr object:
#> # A tibble: 10 × 5
#>    game player1 score1 player2 score2
#>   <int> <chr>    <int> <chr>    <int>
#> 1     1 Duke         7 Miami       52
#> 2     2 Duke        21 UNC         24
#> 3     3 Duke         7 UVA         38
#> 4     4 Duke         0 VT          45
#> 5     5 Miami       34 UNC         16
#> 6     6 Miami       25 UVA         17
#> # … with 4 more rows

This converts ncaa2005 into an object of class widecr, which describes
results in wide format (each row represents the scores of all players in
a particular game).
All comperes functions expect either a data frame with results
structured in long format or an object of one of the supported classes:
longcr, widecr.
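A minimal sketch of moving between the two formats and of passing a plain data frame as input. It assumes that converting a widecr object back with as_longcr() restores one row per game-player pair, which matches the row counts printed above.

```r
library(comperes)

# Long (20 rows) -> wide (one row per game)
ncaa_wide <- as_widecr(ncaa2005)

# Wide -> long again (one row per game-player pair)
ncaa_long <- as_longcr(ncaa_wide)

# A plain data frame in long format is accepted as input as well
ncaa_df <- as.data.frame(ncaa2005)
df_summary <- summarise_player(ncaa_df, mean_score = mean(score))

c(wide = nrow(ncaa_wide), long = nrow(ncaa_long))
```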
With comperes, the following summaries are possible:
ncaa2005 %>%
  summarise_player(min_score = min(score), mean_score = mean(score))
#> # A tibble: 5 × 3
#>   player min_score mean_score
#>   <chr>      <int>      <dbl>
#> 1 Duke           0       8.75
#> 2 Miami         25      34.5 
#> 3 UNC            3      12.5 
#> 4 UVA            5      18.5 
#> 5 VT             7      33.5
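Since the package builds on dplyr, the per-player summary above can be cross-checked with a plain group_by() and summarise(). This is only a comparison sketch; it does not describe how summarise_player() is implemented internally.

```r
library(comperes)
library(dplyr)

# Summary computed with comperes
via_comperes <- ncaa2005 %>%
  summarise_player(min_score = min(score), mean_score = mean(score))

# Comparable computation with plain dplyr grouping
via_dplyr <- ncaa2005 %>%
  group_by(player) %>%
  summarise(min_score = min(score), mean_score = mean(score))
```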
# Using list of common summary functions
library(rlang)
ncaa2005 %>%
  summarise_game(!!!summary_funs[c("sum_score", "num_players")])
#> # A tibble: 10 × 3
#>    game sum_score num_players
#>   <int>     <int>       <int>
#> 1     1        59           2
#> 2     2        45           2
#> 3     3        45           2
#> 4     4        45           2
#> 5     5        50           2
#> 6     6        42           2
#> # … with 4 more rows

The supplied list of common summary functions, summary_funs, has 8
entries, which are quoted expressions to be used with dplyr grammar:
summary_funs
#> $min_score
#> min(score)
#> 
#> $max_score
#> max(score)
#> 
#> $mean_score
#> mean(score)
#> 
#> $median_score
#> median(score)
#> 
#> $sd_score
#> sd(score)
#> 
#> $sum_score
#> sum(score)
#> 
#> $num_games
#> length(unique(game))
#> 
#> $num_players
#> length(unique(player))
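Because these entries are ordinary quoted expressions, a custom list can be built with rlang::exprs() and spliced with !!! in the same way. The list name my_funs is chosen here for illustration only.

```r
library(comperes)
library(rlang)

# Hypothetical custom list of quoted summary expressions
my_funs <- exprs(
  max_score = max(score),
  num_games = length(unique(game))
)

# Splice the expressions into summarise_player() as named arguments
my_summary <- summarise_player(ncaa2005, !!!my_funs)
my_summary
```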
ncaa2005 %>% summarise_player(!!!summary_funs)
#> # A tibble: 5 × 9
#>   player min_score max_score mean_score median_score sd_score sum_score num_games num_players
#>   <chr>      <int>     <int>      <dbl>        <dbl>    <dbl>     <int>     <int>       <int>
#> 1 Duke           0        21       8.75          7       8.81        35         4           1
#> 2 Miami         25        52      34.5          30.5    12.3        138         4           1
#> 3 UNC            3        24      12.5          11.5     9.40        50         4           1
#> 4 UVA            5        38      18.5          15.5    14.0         74         4           1
#> 5 VT             7        52      33.5          37.5    19.9        134         4           1

To modify scores based on the rest of the results, one can use the
join_*_summary() functions:
suppressPackageStartupMessages(library(dplyr))
ncaa2005_mod <- ncaa2005 %>%
  join_player_summary(player_mean_score = mean(score)) %>%
  join_game_summary(game_mean_score = mean(score)) %>%
  mutate(score = player_mean_score - game_mean_score)
ncaa2005_mod
#> # A longcr object:
#> # A tibble: 20 × 5
#>    game player score player_mean_score game_mean_score
#>   <int> <chr>  <dbl>             <dbl>           <dbl>
#> 1     1 Duke   -20.8              8.75            29.5
#> 2     1 Miami    5               34.5             29.5
#> 3     2 Duke   -13.8              8.75            22.5
#> 4     2 UNC    -10               12.5             22.5
#> 5     3 Duke   -13.8              8.75            22.5
#> 6     3 UVA     -4               18.5             22.5
#> # … with 14 more rows
ncaa2005_mod %>% summarise_player(mean_score = mean(score))
#> # A tibble: 5 × 2
#>   player mean_score
#>   <chr>       <dbl>
#> 1 Duke       -15.5 
#> 2 Miami       11.4 
#> 3 UNC         -5   
#> 4 UVA         -2.12
#> 5 VT          11.2

This code modifies score to be the player's average score minus the
average score of the game. Negative values indicate that a player's
average score is below the game's average, i.e. relatively poor
performance.
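For comparison, roughly the same transformation can be written with grouped mutate() calls in plain dplyr. This is an illustrative equivalent under the assumption that each join_*_summary() call adds a per-item summary column to every row; the column names simply mirror the example above.

```r
library(comperes)
library(dplyr)

# Add per-player and per-game means to every row, then rescale score
manual_mod <- ncaa2005 %>%
  group_by(player) %>%
  mutate(player_mean_score = mean(score)) %>%
  group_by(game) %>%
  mutate(game_mean_score = mean(score)) %>%
  ungroup() %>%
  mutate(score = player_mean_score - game_mean_score)

# Average modified score per player
manual_means <- manual_mod %>%
  group_by(player) %>%
  summarise(mean_score = mean(score))
manual_means
```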
Head-to-Head performance is computed with h2h_long() (output is a
tibble; allows multiple Head-to-Head values per pair of players) or
h2h_mat() (output is a matrix; only one value per pair of players).
Head-to-Head functions should be written in dplyr grammar, but they are
applied to players’ matchups: direct confrontations between ordered
pairs of players (including each player paired with itself), stored in
wide format:
ncaa2005 %>% get_matchups()
#> # A widecr object:
#> # A tibble: 40 × 5
#>    game player1 score1 player2 score2
#>   <int> <chr>    <int> <chr>    <int>
#> 1     1 Duke         7 Duke         7
#> 2     1 Duke         7 Miami       52
#> 3     1 Miami       52 Duke         7
#> 4     1 Miami       52 Miami       52
#> 5     2 Duke        21 Duke        21
#> 6     2 Duke        21 UNC         24
#> # … with 34 more rows

A typical Head-to-Head computation looks like this:
ncaa2005 %>%
  h2h_long(
    mean_score_diff = mean(score1 - score2),
    num_wins = sum(score1 > score2)
  )
#> # A long format of Head-to-Head values:
#> # A tibble: 25 × 4
#>   player1 player2 mean_score_diff num_wins
#>   <chr>   <chr>             <dbl>    <int>
#> 1 Duke    Duke                  0        0
#> 2 Duke    Miami               -45        0
#> 3 Duke    UNC                  -3        0
#> 4 Duke    UVA                 -31        0
#> 5 Duke    VT                  -45        0
#> 6 Miami   Duke                 45        1
#> # … with 19 more rows
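To make the link with matchups concrete, the sketch below summarises the output of get_matchups() directly with group_by() over ordered pairs of players. It is only meant to mirror the h2h_long() call above, and it relies on every pair of teams in ncaa2005 having played each other, so no fill values are involved.

```r
library(comperes)
library(dplyr)

# One row per ordered pair of players, computed from raw matchups
manual_h2h <- ncaa2005 %>%
  get_matchups() %>%
  group_by(player1, player2) %>%
  summarise(
    mean_score_diff = mean(score1 - score2),
    num_wins = sum(score1 > score2),
    .groups = "drop"
  )
manual_h2h
```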
ncaa2005 %>% h2h_mat(mean(score1 - score2))
#> # A matrix format of Head-to-Head values:
#>       Duke Miami UNC UVA  VT
#> Duke     0   -45  -3 -31 -45
#> Miami   45     0  18   8  20
#> UNC      3   -18   0   2 -27
#> UVA     31    -8  -2   0 -38
#> VT      45   -20  27  38   0

The supplied list of common Head-to-Head functions, h2h_funs, has 9
entries, which are also quoted expressions:
h2h_funs
#> $mean_score_diff
#> mean(score1 - score2)
#> 
#> $mean_score_diff_pos
#> max(mean(score1 - score2), 0)
#> 
#> $mean_score
#> mean(score1)
#> 
#> $sum_score_diff
#> sum(score1 - score2)
#> 
#> $sum_score_diff_pos
#> max(sum(score1 - score2), 0)
#> 
#> $sum_score
#> sum(score1)
#> 
#> $num_wins
#> num_wins(score1, score2, half_for_draw = FALSE)
#> 
#> $num_wins2
#> num_wins(score1, score2, half_for_draw = TRUE)
#> 
#> $num
#> dplyr::n()
ncaa2005 %>% h2h_long(!!!h2h_funs)
#> # A long format of Head-to-Head values:
#> # A tibble: 25 × 11
#>   player1 player2 mean_score_diff mean_…¹ mean_…² sum_s…³ sum_s…⁴ sum_s…⁵ num_w…⁶ num_w…⁷   num
#>   <chr>   <chr>             <dbl>   <dbl>   <dbl>   <int>   <dbl>   <int>   <dbl>   <dbl> <int>
#> 1 Duke    Duke                  0       0    8.75       0       0      35       0       2     4
#> 2 Duke    Miami               -45       0    7        -45       0       7       0       0     1
#> 3 Duke    UNC                  -3       0   21         -3       0      21       0       0     1
#> 4 Duke    UVA                 -31       0    7        -31       0       7       0       0     1
#> 5 Duke    VT                  -45       0    0        -45       0       0       0       0     1
#> 6 Miami   Duke                 45      45   52         45      45      52       1       1     1
#> # … with 19 more rows, and abbreviated variable names ¹mean_score_diff_pos, ²mean_score,
#> #   ³sum_score_diff, ⁴sum_score_diff_pos, ⁵sum_score, ⁶num_wins, ⁷num_wins2

To compute Head-to-Head values for only a subset of players, or to
include values for players that are not in the results, use a factor
player column:
ncaa2005 %>%
  mutate(player = factor(player, levels = c("Duke", "Miami", "Extra"))) %>%
  h2h_mat(!!!h2h_funs["num_wins"], fill = 0)
#> # A matrix format of Head-to-Head values:
#>       Duke Miami Extra
#> Duke     0     0     0
#> Miami    1     0     0
#> Extra    0     0     0
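Matrix output is convenient for further numeric work. For instance, summing the rows of the num_wins matrix gives each team's total number of wins; self-matchups are draws, so with half_for_draw = FALSE they add nothing. A short sketch:

```r
library(comperes)

# Number of wins of the row player against the column player
wins_mat <- h2h_mat(ncaa2005, !!!h2h_funs["num_wins"])

# Total wins per player (unclass() leaves a plain numeric matrix)
total_wins <- rowSums(unclass(wins_mat))
total_wins
```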