Changing Templates
Template-based systems calculate individual variables and then combine them inside a template. Setting return_data = TRUE returns the full narrative together with a list of the calculated variables and the templates used for each part:
library(narrator)
library(dplyr)

# Total Sales per Product, largest first
df <- sales %>%
  group_by(Product) %>%
  summarise(Sales = sum(Sales, na.rm = TRUE)) %>%
  arrange(desc(Sales))
narrate_descriptive(df, coverage = 0.7, return_data = TRUE)
#> $narrative
#> $narrative$`Total Sales`
#> Total Sales across all Products is 38790478.4.
#>
#> $narrative$`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %), Electronics (8608962.8, 22.2 %) and Home (4599370.9, 11.9 %).
#>
#>
#> $`Total Sales`
#> $`Total Sales`$narrative_total
#> Total Sales across all Products is 38790478.4.
#>
#> $`Total Sales`$template_total
#> [1] "Total {measure} across all {pluralize(dimension_one)} is {total}."
#>
#> $`Total Sales`$measure
#> [1] "Sales"
#>
#> $`Total Sales`$dimension_one
#> [1] "Product"
#>
#> $`Total Sales`$total
#> [1] 38790478
#>
#> $`Total Sales`$total_raw
#> [1] 38790478
#>
#>
#> $`Product by Sales`
#> $`Product by Sales`$narrative_outlier_final
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %), Electronics (8608962.8, 22.2 %) and Home (4599370.9, 11.9 %).
#>
#> $`Product by Sales`$template_outlier_multiple
#> [1] "Outlying {pluralize(dimension)} by {measure} are {outlier_insight}."
#>
#> $`Product by Sales`$dimension
#> [1] "Product"
#>
#> $`Product by Sales`$measure
#> [1] "Sales"
#>
#> $`Product by Sales`$outlier_insight
#> [1] "Food & Beverage (15543469.7, 40.1 %), Electronics (8608962.8, 22.2 %), Home (4599370.9, 11.9 %)"
#>
#> $`Product by Sales`$n_outliers
#> [1] 3
#>
#> $`Product by Sales`$outlier_levels
#> [1] "Food & Beverage" "Electronics" "Home"
#>
#> $`Product by Sales`$outlier_values
#> [1] 15543470 8608963 4599371
#>
#> $`Product by Sales`$outlier_values_p
#> [1] "40.1 %" "22.2 %" "11.9 %"
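The {…} placeholders in these templates look like glue syntax. As a minimal sketch, assuming templates are filled in glue-style (narrator's actual rendering code may differ), the returned variables can be recombined into the narrative with glue::glue_data(). The pluralize() defined here is a simplified stand-in that only appends an "s"; it is an assumption for illustration, not narrator's own helper.

```r
library(glue)

# Variables copied from the return_data = TRUE output above
vars <- list(
  measure = "Sales",
  dimension_one = "Product",
  total = "38790478.4"
)

# Simplified stand-in for pluralize(): appends "s" (assumption, not narrator's code)
pluralize <- function(x) paste0(x, "s")

template_total <- "Total {measure} across all {pluralize(dimension_one)} is {total}."

# Names are looked up in `vars` first, then in the calling environment (pluralize)
narrative <- glue_data(vars, template_total)
print(narrative)
```

Changing template_total while keeping vars fixed changes only the wording, not the calculated values.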
Let’s change the template for the total narrative: we rework the wording and change ‘is’ to ‘are’, since Sales is plural:
narrate_descriptive(
  df,
  template_total = "Overall {measure} for all {pluralize(dimension_one)} are equal to {total}."
)
#> $`Total Sales`
#> Overall Sales for all Products are equal to 38790478.4.
#>
#> $`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %) and Electronics (8608962.8, 22.2 %).
Outlier templates can be changed in the same way. Here both the single-outlier and the multiple-outlier templates are replaced, and the full sales data is narrated by Region and Product:
sales %>%
  narrate_descriptive(
    template_outlier = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with {measure} of {outlier_values}, making it {outlier_values_p} of total.",
    template_outlier_multiple = "{outlier_levels} are the biggest {dimension} by {measure} with {outlier_values} or {outlier_values_p} share of total {measure} respectively.",
    measure = "Sales",
    dimensions = c("Region", "Product"),
    coverage = 0.4
  )
#> $`Total Sales`
#> Total Sales across all Regions is 38790478.4.
#>
#> $`Region by Sales`
#> Among Regions NA is an outlier, with Sales of 18079736.4, making it 46.6 % of total.
#>
#> $`NA by Product`
#> In NA, significant Product by Sales is Food & Beverage (7392821, 40.9 %).
#>
#> $`Product by Sales`
#> Among Products Food & Beverage is an outlier, with Sales of 15543469.7, making it 40.1 % of total.
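Placeholders such as {outlier_levels}, {outlier_values} and {outlier_values_p} hold one value per outlier (see the return_data = TRUE output earlier), while the default multiple-outlier templates use the single pre-built {outlier_insight} phrase. A minimal sketch, assuming glue-style rendering, of how such vectors could be collapsed into one phrase with glue::glue_collapse(); narrator's internal implementation may differ.

```r
library(glue)

# One element per outlier, copied from the return_data = TRUE output
outlier_levels   <- c("Food & Beverage", "Electronics", "Home")
outlier_values   <- c("15543469.7", "8608962.8", "4599370.9")
outlier_values_p <- c("40.1 %", "22.2 %", "11.9 %")

# glue() is vectorised, so this yields one string per outlier
parts <- glue("{outlier_levels} ({outlier_values}, {outlier_values_p})")

# Join the pieces into a single phrase, using "and" before the last one
outlier_insight <- glue_collapse(parts, sep = ", ", last = " and ")
print(outlier_insight)
```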
R Environment
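Templates can also be stored as environment variables, either in your .Renviron file (which you can open with usethis::edit_r_environ()) or through the environment variable settings of your data science platform. The stored template is then read with Sys.getenv() and passed to the narrate function: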
# Check the template stored in the environment
Sys.getenv("descriptive_template_total")

narrate_descriptive(
  df,
  template_total = Sys.getenv("descriptive_template_total")
)
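For a quick test within a single R session, the same variable can be set with base R's Sys.setenv() instead of editing .Renviron. The template text reuses the earlier template_total example; the equivalent .Renviron entry is a single NAME="value" line, shown as a comment.

```r
# Equivalent .Renviron line (R reads .Renviron at startup, so restart R after editing it):
# descriptive_template_total="Overall {measure} for all {pluralize(dimension_one)} are equal to {total}."

# Session-only alternative: set the variable directly
Sys.setenv(
  descriptive_template_total =
    "Overall {measure} for all {pluralize(dimension_one)} are equal to {total}."
)

template_total <- Sys.getenv("descriptive_template_total")
print(template_total)
```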
A more convenient method is to name .Renviron variables according to a fixed convention and add use_renviron = TRUE to the narrate function. The variable name combines the unique part of the function name, an underscore, and the template name. For example, for template_total in narrate_descriptive() the variable is descriptive + _ + template_total, that is, descriptive_template_total:
narrate_descriptive(
  df,
  use_renviron = TRUE
)
#> $`Total Sales`
#> Total Sales across all Products is 38790478.4.
#>
#> $`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %) and Electronics (8608962.8, 22.2 %).
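The naming convention can be illustrated with a small helper. get_template() is hypothetical and only mirrors the lookup described above (function part, underscore, template name, with a fallback to a default when the variable is unset); it is not narrator's internal code.

```r
# Hypothetical helper, not part of narrator
get_template <- function(fun_part, template_name, default) {
  # e.g. "descriptive" + "_" + "template_total" -> "descriptive_template_total"
  var_name <- paste0(fun_part, "_", template_name)
  value <- Sys.getenv(var_name, unset = NA)
  # Fall back to the default when the variable is missing or empty
  if (is.na(value) || !nzchar(value)) default else value
}

default_total <- "Total {measure} across all {pluralize(dimension_one)} is {total}."

# Variable set: the stored template wins
Sys.setenv(descriptive_template_total = "Overall {measure} is {total}.")
from_env <- get_template("descriptive", "template_total", default_total)

# Variable removed: the default is used
Sys.unsetenv("descriptive_template_total")
from_default <- get_template("descriptive", "template_total", default_total)
```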
List Templates
You can list all templates currently available in the package with the list_templates() function:
library(knitr)  # provides kable()
list_templates() %>%
  kable()
fun | type | name | template |
---|---|---|---|
narrate_descriptive | descriptive | template_total | Total {measure} across all {pluralize(dimension_one)} is {total}. |
narrate_descriptive | descriptive | template_average | Average {measure} across all {pluralize(dimension_one)} is {total}. |
narrate_descriptive | descriptive | template_outlier | Outlying {dimension} by {measure} is {outlier_insight}. |
narrate_descriptive | descriptive | template_outlier_multiple | Outlying {pluralize(dimension)} by {measure} are {outlier_insight}. |
narrate_descriptive | descriptive | template_outlier_l2 | In {level_l1}, significant {level_l2} by {measure} is {outlier_insight}. |
narrate_descriptive | descriptive | template_outlier_l2_multiple | In {level_l1}, significant {pluralize(level_l2)} by {measure} are {outlier_insight}. |
narrate_forecast | forecast | template_cy | Forecasted volumes for {current_year} are equal to {format_num(cy_forecast)}. |
narrate_forecast | forecast | template_ftm | Overall forecast for the next 12 months is {format_num(ftm_forecast)}. |
narrate_forecast | forecast | template_ftm_change | Projected {trend} in the next 12 months is equal to {format_num(ftm_change)} ({ftm_change_p}%). |
narrate_trend | trend | template_total | From {timeframe_prev} to {timeframe_curr}, {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr}). |
narrate_trend | trend | template_average | Average {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr}). |
narrate_trend | trend | template_outlier | {dimension} with biggest changes of {measure} is {outlier_insight}. |
narrate_trend | trend | template_outlier_multiple | {pluralize(dimension)} with biggest changes of {measure} are {outlier_insight}. |
narrate_trend | trend | template_outlier_l2 | In {level_l1}, significant {level_l2} by {measure} change is {outlier_insight}. |
narrate_trend | trend | template_outlier_l2_multiple | In {level_l1}, significant {pluralize(level_l2)} by {measure} change are {outlier_insight}. |
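When writing a custom template, it helps to see which variables an existing template uses. A minimal base R sketch that extracts the {…} placeholders from one template copied from the table above; placeholders wrapped in a function call, such as {pluralize(dimension)}, come back with the call included.

```r
# narrate_trend's template_total, copied from the list_templates() table
template <- paste(
  "From {timeframe_prev} to {timeframe_curr}, {measure} had an {trend} of",
  "{change} ({change_p}, {total_prev} to {total_curr})."
)

# Find every {...} placeholder and strip the braces
matches <- regmatches(template, gregexpr("\\{[^}]+\\}", template))[[1]]
placeholders <- gsub("[{}]", "", matches)
print(placeholders)
```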
Editing Gadget
Exploring templates is not always easy, so narrator lets you prepare them interactively with the edit_templates() function.
Average vs Others
When summarization is set to "average", outputs are calculated in a significantly different way, so it is generally better to use separate templates for each summarization type.
Giving users control over template creation, rather than enforcing strict rules on how templates should look, does introduce some additional complexity.
For example, a custom template_outlier with the default summarization = "sum" produces a clear descriptive narrative:
sales %>%
  narrate_descriptive(
    template_outlier = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with {measure} of {outlier_values}, making it {outlier_values_p} of total.",
    measure = "Sales",
    dimensions = "Region",
    coverage = 0.3
  )
#> $`Total Sales`
#> Total Sales across all Regions is 38790478.4.
#>
#> $`Region by Sales`
#> Among Regions NA is an outlier, with Sales of 18079736.4, making it 46.6 % of total.
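With summarization = "average", however, the same template runs into trouble: in this case the value of every level is compared with the overall average Sales, so {outlier_values_p} is a difference from that average rather than a share of the total. The default templates handle this through logic in the {outlier_insight} calculation, but the custom template still says "of total":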
sales %>%
  narrate_descriptive(
    template_outlier = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with {measure} of {outlier_values}, making it {outlier_values_p} of total.",
    measure = "Sales",
    dimensions = "Region",
    summarization = "average",
    coverage = 0.3
  )
#> $`Average Sales`
#> Average Sales across all Regions is 3879.
#>
#> $`Region by Sales`
#> Among Regions LATAM is an outlier, with Sales of 2303.3, making it -40.6 % of total.
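A quick check with the numbers printed in the two outputs above shows the two calculations side by side. The formulas are inferred from these outputs, not taken from narrator's source.

```r
# summarization = "sum": the NA region as a share of total Sales
share_p <- 18079736.4 / 38790478.4 * 100
round(share_p, 1)

# summarization = "average": LATAM average relative to the overall average Sales
deviation_p <- (2303.3 - 3879) / 3879 * 100
round(deviation_p, 1)
```

A negative relative deviation read as "of total" is what makes the sentence above misleading.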
In this case, a template worded around the average works better:
sales %>%
  narrate_descriptive(
    template_outlier = "Among {pluralize(dimension)} {outlier_levels} is an outlier, with average {measure} of {outlier_values}, {outlier_values_p} lower than average {measure} across all records.",
    measure = "Sales",
    dimensions = "Region",
    summarization = "average",
    coverage = 0.3
  )
#> $`Average Sales`
#> Average Sales across all Regions is 3879.
#>
#> $`Region by Sales`
#> Among Regions LATAM is an outlier, with average Sales of 2303.3, -40.6 % lower than average Sales across all records.
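Following the advice to keep separate templates per summarization type, the two outlier templates used in this section can be stored side by side and selected by summarization. outlier_template_for() is a hypothetical helper, not part of narrator; the narrate_descriptive() call only runs when narrator is installed.

```r
# Hypothetical helper: one outlier template per summarization type
outlier_template_for <- function(summarization = c("sum", "average")) {
  summarization <- match.arg(summarization)
  switch(summarization,
    sum = paste(
      "Among {pluralize(dimension)} {outlier_levels} is an outlier,",
      "with {measure} of {outlier_values}, making it {outlier_values_p} of total."
    ),
    average = paste(
      "Among {pluralize(dimension)} {outlier_levels} is an outlier,",
      "with average {measure} of {outlier_values},",
      "{outlier_values_p} lower than average {measure} across all records."
    )
  )
}

summarization <- "average"

if (requireNamespace("narrator", quietly = TRUE)) {
  # Load the example dataset shipped with narrator
  data("sales", package = "narrator")
  narrator::narrate_descriptive(
    sales,
    template_outlier = outlier_template_for(summarization),
    measure = "Sales",
    dimensions = "Region",
    summarization = summarization,
    coverage = 0.3
  )
}
```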