Create Narrative for Metric Development in Time

Usage

narrate_trend(
  df,
  measure = NULL,
  dimensions = NULL,
  date = NULL,
  frequency = NULL,
  summarization = "sum",
  type = "yoy",
  coverage = 0.5,
  coverage_limit = 5,
  narration_depth = 2,
  use_chatgpt = FALSE,
  openai_api_key = Sys.getenv("OPENAI_API_KEY"),
  max_tokens = 1024,
  temperature = 0,
  top_p = 1,
  frequency_penalty = 0,
  presence_penalty = 0,
  template_total =
    "From {timeframe_prev} to {timeframe_curr}, {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr}).",
  template_average =
    "Average {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr}).",
  template_outlier =
    "{dimension} with biggest changes of {measure} is {outlier_insight}.",
  template_outlier_multiple =
    "{pluralize(dimension)} with biggest changes of {measure} are {outlier_insight}.",
  template_outlier_l2 =
    "In {level_l1}, significant {level_l2} by {measure} change is {outlier_insight}.",
  template_outlier_l2_multiple =
    "In {level_l1}, significant {pluralize(level_l2)} by {measure} change are {outlier_insight}.",
  use_renviron = FALSE,
  return_data = FALSE,
  simplify = FALSE,
  format_numbers = TRUE,
  collapse_sep = ", ",
  collapse_last = " and ",
  ...
)

Arguments

df: data.frame() or tibble() Data frame or tibble, can be aggregated or raw
measure: Numeric measure for the function to run calculations on; if NULL, the first available numeric field is used
dimensions: Vector of dimensions for analysis; by default all character or factor variables will be used
date: Name of the date column to be used for time based analysis
frequency: Level of time-based aggregation for comparing across years: 'quarter', 'month' or 'week'
summarization: Approach for data summarization/aggregation - 'sum', 'count' or 'average'
type: Type of trend analysis to create: 1 or 'yoy', 2 or 'previous period', 3 or 'same period last year'
coverage: Numeric portion of variability to be covered by narrative, 0 to 1
coverage_limit: Integer, maximum number of elements to be narrated; overrides coverage to avoid extremely verbose narratives
narration_depth: Controls the depth of the analysis: 1 for summary, 2 for detailed
use_chatgpt: If TRUE - use ChatGPT to enhance the narrative
openai_api_key: Your OpenAI API key. You can set it in your .Renviron file as "OPENAI_API_KEY"; by default the function reads it with Sys.getenv("OPENAI_API_KEY")
max_tokens: The maximum number of tokens to generate in the chat completion.
temperature: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
frequency_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
presence_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
template_total: glue template for total volumes narrative
template_average: glue template for average volumes narrative
template_outlier: glue template for single outlier narrative
template_outlier_multiple: glue template for multiple outliers narrative
template_outlier_l2: glue template for deeper hierarchical single outlier narrative
template_outlier_l2_multiple: glue template for deeper hierarchical multiple outliers narrative
use_renviron: If TRUE, use .Renviron variables in the template. You can also set options(narrator.use_renviron = TRUE) to make it global for the session, or create an environment variable "use_renviron" by editing your .Renviron file with usethis::edit_r_environ()
return_data: If TRUE - return a list of variables used in the function's templates
simplify: If TRUE - return a character vector, if FALSE - named list
format_numbers: If TRUE - format big numbers to K/M/B using format_num() function
collapse_sep: Separator for glue_collapse in cases with multiple values in single variable
collapse_last: Separator for glue_collapse for the last item, in cases with multiple values in single variable
...: other arguments passed to glue
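
The template_* arguments are glue templates, and collapse_sep / collapse_last are described as separators for glue_collapse. The following is a minimal sketch of how such a template could be filled in, using only the glue package directly. It is not the package's internal code: the values list and the outliers vector are hypothetical inputs, with the figures copied from the example output further down this page.

```r
library(glue)

# Default value of template_total, copied from the Usage section
template_total <- "From {timeframe_prev} to {timeframe_curr}, {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr})."

# Hypothetical values for the placeholders (assumption: supplied by hand here,
# whereas narrate_trend() would compute them from df)
values <- list(
  timeframe_prev = "2020 YTD",
  timeframe_curr = "2021 YTD",
  measure        = "Sales",
  trend          = "increase",
  change         = "1.13 M",
  change_p       = "9.1 %",
  total_prev     = "12.42 M",
  total_curr     = "13.55 M"
)

# glue_data() looks up each {placeholder} in the supplied list
total_sentence <- glue_data(values, template_total)

# Several values in one variable are joined with a separator and a
# distinct separator before the last item
outliers <- c("Nov", "Apr", "Jan")
outlier_insight <- glue_collapse(outliers, sep = ", ", last = " and ")

print(total_sentence)
print(outlier_insight)
```

Changing collapse_last to, say, " & " would only affect the join before the final item, which is why it is exposed separately from collapse_sep.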

Value

A list() of narratives by default, or a character() vector if simplify = TRUE

Examples

library(narrator)
library(dplyr) # provides the %>% pipe

sales %>%
 dplyr::mutate(Date = lubridate::floor_date(Date, unit = "month")) %>%
 dplyr::group_by(Region, Product, Date) %>%
 dplyr::summarise(Sales = sum(Sales, na.rm = TRUE)) %>%
 narrate_trend()
#> $`2021 YTD vs 2020 YTD`
#> From 2020 YTD to 2021 YTD, Sales had an increase of 1.13 M (9.1 %, 12.42 M to 13.55 M).
#> 
#> $`Sales change by Region`
#> Regions with biggest changes of Sales are NA (533.1 K, 9.1 %, 5.9 M to 6.4 M) and EMEA (416.9 K, 9.91 %, 4.2 M to 4.6 M).
#> 
#> $`NA by Product`
#> In NA, significant Products by Sales change are Food & Beverage (243.3 K, 9.92 %, 2.5 M to 2.7 M) and Tools (186.8 K, 31.87 %, 585.9 K to 772.7 K).
#> 
#> $`EMEA by Product`
#> In EMEA, significant Products by Sales change are Electronics (312.1 K, 35.88 %, 869.7 K to 1.2 M) and Food & Beverage (238.2 K, 14.54 %, 1.6 M to 1.9 M).
#> 
#> $`Sales change by Product`
#> Products with biggest changes of Sales are Food & Beverage (535.4 K, 10.63 %, 5 M to 5.6 M) and Electronics (525.9 K, 19.79 %, 2.7 M to 3.2 M).
#> 
#> $`Food & Beverage by Month`
#> In Food & Beverage, significant Months by Sales change are Oct (-141.6 K, -23.39 %, 605.4 K to 463.8 K), Sep (132.7 K, 37.27 %, 356.2 K to 489 K), Dec (118.3 K, 16.67 %, 709.5 K to 827.8 K) and May (99 K, 28.12 %, 352 K to 451 K).
#> 
#> $`Electronics by Month`
#> In Electronics, significant Months by Sales change are Nov (170.7 K, 70.62 %, 241.7 K to 412.4 K), Dec (108.3 K, 36.23 %, 298.8 K to 407.1 K), May (-74.1 K, -26.73 %, 277.3 K to 203.2 K) and Feb (70.6 K, 38.24 %, 184.6 K to 255.3 K).
#> 
#> $`Sales change by Month`
#> Months with biggest changes of Sales are Nov (386.5 K, 29.17 %, 1.3 M to 1.7 M), Apr (226.6 K, 24.4 %, 928.6 K to 1.2 M) and Jan (162.2 K, 23.06 %, 703.4 K to 865.6 K).
#> 

sales %>%
 dplyr::mutate(Date = lubridate::floor_date(Date, unit = "quarter")) %>%
 dplyr::group_by(Region, Product, Date) %>%
 dplyr::summarise(Sales = sum(Sales, na.rm = TRUE)) %>%
 narrate_trend()
#> $`2021 YTD vs 2020 YTD`
#> From 2020 YTD to 2021 YTD, Sales had an increase of 1.13 M (9.1 %, 12.42 M to 13.55 M).
#> 
#> $`Sales change by Region`
#> Regions with biggest changes of Sales are NA (533.1 K, 9.1 %, 5.9 M to 6.4 M) and EMEA (416.9 K, 9.91 %, 4.2 M to 4.6 M).
#> 
#> $`NA by Product`
#> In NA, significant Products by Sales change are Food & Beverage (243.3 K, 9.92 %, 2.5 M to 2.7 M) and Tools (186.8 K, 31.87 %, 585.9 K to 772.7 K).
#> 
#> $`EMEA by Product`
#> In EMEA, significant Products by Sales change are Electronics (312.1 K, 35.88 %, 869.7 K to 1.2 M) and Food & Beverage (238.2 K, 14.54 %, 1.6 M to 1.9 M).
#> 
#> $`Sales change by Product`
#> Products with biggest changes of Sales are Food & Beverage (535.4 K, 10.63 %, 5 M to 5.6 M) and Electronics (525.9 K, 19.79 %, 2.7 M to 3.2 M).
#> 
#> $`Food & Beverage by Quarter`
#> In Food & Beverage, significant Quarters by Sales change are Q2 (261.4 K, 24.46 %, 1.1 M to 1.3 M) and Q3 (211.7 K, 21.84 %, 969.1 K to 1.2 M).
#> 
#> $`Electronics by Quarter`
#> In Electronics, significant Quarter by Sales change is Q4 (340.8 K, 40.13 %, 849.1 K to 1.2 M).
#> 
#> $`Sales change by Quarter`
#> Quarters with biggest changes of Sales are Q4 (508.5 K, 11.41 %, 4.5 M to 5 M) and Q2 (354.3 K, 13.35 %, 2.7 M to 3 M).
#>
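
Each section of the output above names only a few elements (two regions, three months, and so on). That is governed by coverage and coverage_limit. The sketch below shows one plausible reading of those two arguments: rank elements by absolute change, keep adding them until their cumulative share of the total absolute change reaches coverage, then cap the count at coverage_limit. This is an assumption for illustration and may differ from what narrate_trend() does internally; select_outliers() and the changes vector are hypothetical names and data.

```r
# Hypothetical helper, NOT part of the package: illustrates one way
# coverage and coverage_limit could interact.
select_outliers <- function(changes, coverage = 0.5, coverage_limit = 5) {
  # Rank elements by the size of their change, ignoring the sign
  ordered <- changes[order(abs(changes), decreasing = TRUE)]

  # Cumulative share of the total absolute change
  cum_share <- cumsum(abs(ordered)) / sum(abs(ordered))

  # Smallest number of elements whose cumulative share reaches coverage
  n_needed <- which(cum_share >= coverage)[1]
  if (is.na(n_needed)) n_needed <- length(ordered)

  # coverage_limit caps the count regardless of coverage
  ordered[seq_len(min(n_needed, coverage_limit))]
}

# Made-up changes per region
changes <- c(North = 500, South = 300, East = -150, West = 50)

half <- select_outliers(changes, coverage = 0.5)
capped <- select_outliers(changes, coverage = 0.9, coverage_limit = 2)

print(names(half))
print(names(capped))
```

With coverage = 0.9 the cumulative share is only reached at the third element, but coverage_limit = 2 cuts the selection to two, which is the "overrides coverage" behaviour described in the Arguments section.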