Editing Templates • narrator

library(narrator)
library(dplyr)
library(knitr)

Changing Templates

Template-based systems calculate individual variables and then combine them inside a template. With return_data = TRUE you get back both the list of calculated variables and the finished narrative:

df <- sales %>%
  group_by(Product) %>%
  summarise(Sales = sum(Sales, na.rm = TRUE)) %>%
  arrange(desc(Sales))

narrate_descriptive(df, coverage = 0.7, return_data = TRUE)
#> $narrative
#> $narrative$`Total Sales`
#> Total Sales across all Products is 38790478.4.
#> 
#> $narrative$`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %), Electronics (8608962.8, 22.2 %) and Home (4599370.9, 11.9 %).
#> 
#> 
#> $`Total Sales`
#> $`Total Sales`$narrative_total
#> Total Sales across all Products is 38790478.4.
#> 
#> $`Total Sales`$template_total
#> [1] "Total {measure} across all {pluralize(dimension_one)} is {total}."
#> 
#> $`Total Sales`$measure
#> [1] "Sales"
#> 
#> $`Total Sales`$dimension_one
#> [1] "Product"
#> 
#> $`Total Sales`$total
#> [1] 38790478
#> 
#> $`Total Sales`$total_raw
#> [1] 38790478
#> 
#> 
#> $`Product by Sales`
#> $`Product by Sales`$narrative_outlier_final
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %), Electronics (8608962.8, 22.2 %) and Home (4599370.9, 11.9 %).
#> 
#> $`Product by Sales`$template_outlier_multiple
#> [1] "Outlying {pluralize(dimension)} by {measure} are {outlier_insight}."
#> 
#> $`Product by Sales`$dimension
#> [1] "Product"
#> 
#> $`Product by Sales`$measure
#> [1] "Sales"
#> 
#> $`Product by Sales`$outlier_insight
#> [1] "Food & Beverage (15543469.7, 40.1 %), Electronics (8608962.8, 22.2 %), Home (4599370.9, 11.9 %)"
#> 
#> $`Product by Sales`$n_outliers
#> [1] 3
#> 
#> $`Product by Sales`$outlier_levels
#> [1] "Food & Beverage" "Electronics"     "Home"           
#> 
#> $`Product by Sales`$outlier_values
#> [1] 15543470  8608963  4599371
#> 
#> $`Product by Sales`$outlier_values_p
#> [1] "40.1 %" "22.2 %" "11.9 %"
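
To see how the returned variables and the template fit together, the total narrative can be rebuilt by hand. This is only a minimal sketch: it assumes the templates use glue-style `{}` interpolation, the local `pluralize()` is a simplified stand-in for illustration rather than the package's own implementation, and the values are copied from the output above.

```r
library(glue)

# Simplified stand-in for illustration only: appends "s" to a word
pluralize <- function(x) paste0(x, "s")

# Variables as printed in the `Total Sales` element above
template_total <- "Total {measure} across all {pluralize(dimension_one)} is {total}."
measure <- "Sales"
dimension_one <- "Product"
total <- 38790478.4

# glue() evaluates each {expression} in the current environment
rebuilt <- glue(template_total)
rebuilt
#> Total Sales across all Products is 38790478.4.
```

Changing the wording of the template string changes the narrative, while the calculated variables stay the same.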

Let’s change the template for our narrative function: we rework the wording and change ‘is’ to ‘are’, since Sales is plural:

narrate_descriptive(
  df,
  template_total = "Overall {measure} for all {pluralize(dimension_one)} are equal to {total}. ")
#> $`Total Sales`
#> Overall Sales for all Products are equal to 38790478.4. 
#> 
#> $`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %) and Electronics (8608962.8, 22.2 %).

The outlier templates can be changed in the same way:

sales %>%
  narrate_descriptive(
    template_outlier = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with {measure} of {outlier_values}, making it {outlier_values_p} of total.",
    template_outlier_multiple = "{outlier_levels} are the biggest {dimension} by {measure} with {outlier_values} or {outlier_values_p} share of total {measure} respectively.",
    measure = "Sales",
    dimensions = c("Region", "Product"),
    coverage = 0.4)
#> $`Total Sales`
#> Total Sales across all Regions is 38790478.4.
#> 
#> $`Region by Sales`
#> Among Regions NA is an outlier, with Sales of 18079736.4, making it 46.6 % of total.
#> 
#> $`NA by Product`
#> In NA, significant Product by Sales is Food & Beverage (7392821, 40.9 %).
#> 
#> $`Product by Sales`
#> Among Products Food & Beverage is an outlier, with Sales of 15543469.7, making it 40.1 % of total.

R Environment

We can set the templates in .Renviron using usethis::edit_r_environ(), or by defining an environment variable through your data science platform.

Sys.getenv("descriptive_template_total")

narrate_descriptive(
  df,
  template_total = Sys.getenv("descriptive_template_total")
)
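
For a quick experiment, the same variable can be set for the current session only with base R's Sys.setenv(). This is a sketch for trying out a template, not a replacement for .Renviron, which persists across sessions; the template text is just an example value.

```r
# Set the variable for the current R session only (not persisted to .Renviron)
Sys.setenv(
  descriptive_template_total =
    "Overall {measure} for all {pluralize(dimension_one)} are equal to {total}."
)

# Read it back; an unset variable would return ""
template_total <- Sys.getenv("descriptive_template_total")
template_total

# A matching .Renviron entry would be a single NAME="value" line, roughly:
# descriptive_template_total="Overall {measure} for all {pluralize(dimension_one)} are equal to {total}."
```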

A handier method is to name the .Renviron variables by a fixed convention and add use_renviron = TRUE to the narrate function. For example, here we created a variable named from the unique part of the function name, an underscore, and the template name: descriptive + _ + template_total, which gives descriptive_template_total.

narrate_descriptive(
  df, 
  use_renviron = TRUE
)
#> $`Total Sales`
#> Total Sales across all Products is 38790478.4.
#> 
#> $`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %) and Electronics (8608962.8, 22.2 %).

List Templates

You can list all templates currently available in the package with the list_templates() function:

list_templates() %>%
  kable()

fun	type	name	template
narrate_descriptive	descriptive	template_total	Total {measure} across all {pluralize(dimension_one)} is {total}.
narrate_descriptive	descriptive	template_average	Average {measure} across all {pluralize(dimension_one)} is {total}.
narrate_descriptive	descriptive	template_outlier	Outlying {dimension} by {measure} is {outlier_insight}.
narrate_descriptive	descriptive	template_outlier_multiple	Outlying {pluralize(dimension)} by {measure} are {outlier_insight}.
narrate_descriptive	descriptive	template_outlier_l2	In {level_l1}, significant {level_l2} by {measure} is {outlier_insight}.
narrate_descriptive	descriptive	template_outlier_l2_multiple	In {level_l1}, significant {pluralize(level_l2)} by {measure} are {outlier_insight}.
narrate_forecast	forecast	template_cy	Forecasted volumes for {current_year} are equal to {format_num(cy_forecast)}.
narrate_forecast	forecast	template_ftm	Overall forecast for the next 12 months is {format_num(ftm_forecast)}.
narrate_forecast	forecast	template_ftm_change	Projected {trend} in the next 12 months is equal to {format_num(ftm_change)} ({ftm_change_p}%).
narrate_trend	trend	template_total	From {timeframe_prev} to {timeframe_curr}, {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr}).
narrate_trend	trend	template_average	Average {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr}).
narrate_trend	trend	template_outlier	{dimension} with biggest changes of {measure} is {outlier_insight}.
narrate_trend	trend	template_outlier_multiple	{pluralize(dimension)} with biggest changes of {measure} are {outlier_insight}.
narrate_trend	trend	template_outlier_l2	In {level_l1}, significant {level_l2} by {measure} change is {outlier_insight}.
narrate_trend	trend	template_outlier_l2_multiple	In {level_l1}, significant {pluralize(level_l2)} by {measure} change are {outlier_insight}.
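
When adapting one of these templates, it helps to know which variables it expects. The following base R sketch pulls the `{}` placeholders out of a template string with a regular expression; the `template_vars()` helper is hypothetical and not part of narrator.

```r
# Hypothetical helper: list the {placeholders} used in a template string
template_vars <- function(template) {
  matches <- regmatches(
    template,
    gregexpr("\\{[^{}]+\\}", template, perl = TRUE)
  )[[1]]
  # Drop the surrounding braces
  gsub("^\\{|\\}$", "", matches, perl = TRUE)
}

vars <- template_vars(
  "In {level_l1}, significant {pluralize(level_l2)} by {measure} are {outlier_insight}."
)
vars
```

For this template the result contains four entries: level_l1, pluralize(level_l2), measure and outlier_insight.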

Editing Gadget

Exploring templates might not be an easy task, so narrator lets you prepare templates interactively using the edit_templates() function.

edit_templates()

Average vs Others

When summarization is set to average, the output is calculated in a significantly different way. It is generally better to use different templates for different summarization types.

Giving users control over template creation, without strict rules for how templates should look, does introduce some additional complexity.

For example, if we use a template_outlier with the default summarization = "sum", we get a great descriptive narrative:

sales %>%
  narrate_descriptive(
    template_outlier = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with {measure} of {outlier_values}, making it {outlier_values_p} of total.",
    measure = "Sales",
    dimensions = "Region",
    coverage = 0.3)
#> $`Total Sales`
#> Total Sales across all Regions is 38790478.4.
#> 
#> $`Region by Sales`
#> Among Regions NA is an outlier, with Sales of 18079736.4, making it 46.6 % of total.

But when we try the same for summarization = "average" we get into trouble, because in this case the value for every level is compared to the overall average Sales. The default templates handle that using logic in the {outlier_insight} calculation.

sales %>%
  narrate_descriptive(
    template_outlier = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with {measure} of {outlier_values}, making it {outlier_values_p} of total.",
    measure = "Sales",
    dimensions = "Region",
    summarization = "average",
    coverage = 0.3)
#> $`Average Sales`
#> Average Sales across all Regions is 3879.
#> 
#> $`Region by Sales`
#> Among Regions LATAM is an outlier, with Sales of 2303.3, making it -40.6 % of total.
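
The two percentages can be checked by hand from the printed figures. This sketch assumes that with sum the percentage is the level's share of the total, and that with average it is the relative difference from the overall average. The exact formulas inside the package are an assumption here, although these reproduce the outputs above.

```r
# summarization = "sum": share of the total
na_sales <- 18079736.4
total_sales <- 38790478.4
share_pct <- round(na_sales / total_sales * 100, 1)
share_pct
#> [1] 46.6

# summarization = "average": relative difference from the overall average
latam_avg <- 2303.3
overall_avg <- 3879
diff_pct <- round((latam_avg / overall_avg - 1) * 100, 1)
diff_pct
#> [1] -40.6
```

A signed difference such as -40.6 % does not make sense next to the words "of total", which is why the template wording has to change for averages.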

So here we can use something like this:

sales %>%
  narrate_descriptive(
    template_outlier = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with average {measure} of {outlier_values}, {outlier_values_p} lower than average {measure} across all records.",
    measure = "Sales",
    dimensions = "Region",
    summarization = "average",
    coverage = 0.3)
#> $`Average Sales`
#> Average Sales across all Regions is 3879.
#> 
#> $`Region by Sales`
#> Among Regions LATAM is an outlier, with average Sales of 2303.3, -40.6 % lower than average Sales across all records.
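
One way to keep the two wordings apart is to pick the template from the summarization type before calling the narrate function. The helper below is a hypothetical sketch, not part of narrator; it simply returns the two template strings used in this section.

```r
# Hypothetical helper (not part of narrator): choose the outlier template
# that matches the summarization type
outlier_template_for <- function(summarization = c("sum", "average")) {
  summarization <- match.arg(summarization)
  switch(
    summarization,
    sum = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with {measure} of {outlier_values}, making it {outlier_values_p} of total.",
    average = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with average {measure} of {outlier_values}, {outlier_values_p} lower than average {measure} across all records."
  )
}

summarization <- "average"
template <- outlier_template_for(summarization)
template

# Intended use with narrator (not run here):
# sales %>%
#   narrate_descriptive(
#     template_outlier = template,
#     measure = "Sales",
#     dimensions = "Region",
#     summarization = summarization,
#     coverage = 0.3)
```

Because match.arg() rejects unknown values, a typo in the summarization name fails early instead of silently falling back to the wrong template.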