Create Narrative for Metric Development in Time
Usage
narrate_trend(
  df,
  measure = NULL,
  dimensions = NULL,
  date = NULL,
  frequency = NULL,
  summarization = "sum",
  type = "yoy",
  coverage = 0.5,
  coverage_limit = 5,
  narration_depth = 2,
  use_chatgpt = FALSE,
  openai_api_key = Sys.getenv("OPENAI_API_KEY"),
  max_tokens = 1024,
  temperature = 0,
  top_p = 1,
  frequency_penalty = 0,
  presence_penalty = 0,
  template_total =
    "From {timeframe_prev} to {timeframe_curr}, {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr}).",
  template_average =
    "Average {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr}).",
  template_outlier =
    "{dimension} with biggest changes of {measure} is {outlier_insight}.",
  template_outlier_multiple =
    "{pluralize(dimension)} with biggest changes of {measure} are {outlier_insight}.",
  template_outlier_l2 =
    "In {level_l1}, significant {level_l2} by {measure} change is {outlier_insight}.",
  template_outlier_l2_multiple =
    "In {level_l1}, significant {pluralize(level_l2)} by {measure} change are {outlier_insight}.",
  use_renviron = FALSE,
  return_data = FALSE,
  simplify = FALSE,
  format_numbers = TRUE,
  collapse_sep = ", ",
  collapse_last = " and ",
  ...
)
Arguments
- df
data.frame() or tibble(). Data frame or tibble, can be aggregated or raw
- measure
Numeric measure for the function to run calculations on; if NULL, the first numeric field available is used
- dimensions
Vector of dimensions for analysis; by default all character or factor variables are used
- date
Name of the date column to be used for time based analysis
- frequency
Level of time-based aggregation for comparing across years: 'quarter', 'month' or 'week'
- summarization
Approach for data summarization/aggregation - 'sum', 'count' or 'average'
- type
Type of trend analysis to create: 1 or 'yoy', 2 or 'previous period', 3 or 'same period last year'
- coverage
Numeric portion of variability to be covered by narrative, 0 to 1
- coverage_limit
Integer maximum number of elements to be narrated, overrides coverage to avoid extremely verbose narrative creation
- narration_depth
Parameter to control the depth of the analysis: 1 for summary, 2 for detailed
- use_chatgpt
If TRUE - use ChatGPT to enhance the narrative
- openai_api_key
Your OpenAI API key. You can set it in the .Renviron file as "OPENAI_API_KEY"; the function looks it up with Sys.getenv("OPENAI_API_KEY")
- max_tokens
The maximum number of tokens to generate in the chat completion.
- temperature
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- top_p
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
- frequency_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
- presence_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
- template_total
glue template for total volumes narrative
- template_average
glue template for average volumes narrative
- template_outlier
glue template for single outlier narrative
- template_outlier_multiple
glue template for multiple outliers narrative
- template_outlier_l2
glue template for deeper hierarchical single outlier narrative
- template_outlier_l2_multiple
glue template for deeper hierarchical multiple outliers narrative
- use_renviron
If TRUE - use .Renviron variables in the template. You can also set options(narrator.use_renviron = TRUE) to make it global for the session, or create an environment variable "use_renviron" by editing your .Renviron file with usethis::edit_r_environ()
- return_data
If TRUE - return a list of variables used in the function's templates
- simplify
If TRUE - return a character() vector instead of a list()
- format_numbers
If TRUE - format big numbers to K/M/B using the format_num() function
- collapse_sep
Separator for glue_collapse in cases with multiple values in a single variable
- collapse_last
Separator for glue_collapse for the last item, in cases with multiple values in a single variable
- ...
other arguments passed to glue
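The interplay of coverage and coverage_limit can be illustrated with a small base R sketch. It shows one plausible reading of the argument descriptions above: rank elements by absolute change, keep adding them until their cumulative share reaches coverage, and never exceed coverage_limit. The helper select_outliers() and the toy values are hypothetical and are not part of the package, whose actual selection logic may differ.

```r
# Hypothetical helper (not part of the package): choose which elements to narrate.
# `changes` is a named numeric vector of changes between two periods.
select_outliers <- function(changes, coverage = 0.5, coverage_limit = 5) {
  total <- sum(abs(changes))
  # Nothing changed, so there is nothing to narrate
  if (total == 0) return(changes[0])
  # Rank elements by the absolute size of their change, largest first
  ranked <- changes[order(abs(changes), decreasing = TRUE)]
  # Cumulative share of the total absolute change covered so far
  share <- cumsum(abs(ranked)) / total
  # Smallest number of elements whose cumulative share reaches `coverage`
  n <- which(share >= coverage)[1]
  # `coverage_limit` caps the count so the narrative stays short
  ranked[seq_len(min(n, coverage_limit))]
}

# Made-up changes for four elements (absolute values sum to 100)
toy_changes <- c(A = 50, B = -30, C = 15, D = 5)

names(select_outliers(toy_changes, coverage = 0.5))
# "A"
names(select_outliers(toy_changes, coverage = 0.9))
# "A" "B" "C"
names(select_outliers(toy_changes, coverage = 0.9, coverage_limit = 2))
# "A" "B"
```

In this reading, raising coverage makes the narrative mention more elements, while coverage_limit keeps it from becoming overly verbose.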
Value
A list() of narratives by default, or a character() vector if simplify = TRUE
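Each returned narrative is a string rendered from one of the glue templates. As a rough sketch of that rendering step, the default template_total can be filled in by calling glue::glue() directly. The values below are copied by hand from the example output on this page; inside the function they are computed from df, and the internal call may look different.

```r
library(glue)

# Default `template_total`, copied from the Usage section
template_total <- "From {timeframe_prev} to {timeframe_curr}, {measure} had an {trend} of {change} ({change_p}, {total_prev} to {total_curr})."

# Named arguments supply the template variables (hand-copied values)
narrative <- glue(
  template_total,
  timeframe_prev = "2020 YTD",
  timeframe_curr = "2021 YTD",
  measure = "Sales",
  trend = "increase",
  change = "1.13 M",
  change_p = "9.1 %",
  total_prev = "12.42 M",
  total_curr = "13.55 M"
)
narrative

# `collapse_sep` and `collapse_last` correspond to the `sep` and `last`
# arguments of glue_collapse(), which joins several values into one string
joined <- glue_collapse(c("Q2", "Q3", "Q4"), sep = ", ", last = " and ")
joined
```

A custom template only needs to reuse the same variable names inside the curly braces.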
Examples
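library(narrator)  # provides narrate_trend() and the sales dataset

# Aggregate sales to monthly totals, then narrate the trend with default settings
sales %>%
  dplyr::mutate(Date = lubridate::floor_date(Date, unit = "month")) %>%
  dplyr::group_by(Region, Product, Date) %>%
  dplyr::summarise(Sales = sum(Sales, na.rm = TRUE)) %>%
  narrate_trend()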
#> $`2021 YTD vs 2020 YTD`
#> From 2020 YTD to 2021 YTD, Sales had an increase of 1.13 M (9.1 %, 12.42 M to 13.55 M).
#>
#> $`Sales change by Region`
#> Regions with biggest changes of Sales are NA (533.1 K, 9.1 %, 5.9 M to 6.4 M) and EMEA (416.9 K, 9.91 %, 4.2 M to 4.6 M).
#>
#> $`NA by Product`
#> In NA, significant Products by Sales change are Food & Beverage (243.3 K, 9.92 %, 2.5 M to 2.7 M) and Tools (186.8 K, 31.87 %, 585.9 K to 772.7 K).
#>
#> $`EMEA by Product`
#> In EMEA, significant Products by Sales change are Electronics (312.1 K, 35.88 %, 869.7 K to 1.2 M) and Food & Beverage (238.2 K, 14.54 %, 1.6 M to 1.9 M).
#>
#> $`Sales change by Product`
#> Products with biggest changes of Sales are Food & Beverage (535.4 K, 10.63 %, 5 M to 5.6 M) and Electronics (525.9 K, 19.79 %, 2.7 M to 3.2 M).
#>
#> $`Food & Beverage by Month`
#> In Food & Beverage, significant Months by Sales change are Oct (-141.6 K, -23.39 %, 605.4 K to 463.8 K), Sep (132.7 K, 37.27 %, 356.2 K to 489 K), Dec (118.3 K, 16.67 %, 709.5 K to 827.8 K) and May (99 K, 28.12 %, 352 K to 451 K).
#>
#> $`Electronics by Month`
#> In Electronics, significant Months by Sales change are Nov (170.7 K, 70.62 %, 241.7 K to 412.4 K), Dec (108.3 K, 36.23 %, 298.8 K to 407.1 K), May (-74.1 K, -26.73 %, 277.3 K to 203.2 K) and Feb (70.6 K, 38.24 %, 184.6 K to 255.3 K).
#>
#> $`Sales change by Month`
#> Months with biggest changes of Sales are Nov (386.5 K, 29.17 %, 1.3 M to 1.7 M), Apr (226.6 K, 24.4 %, 928.6 K to 1.2 M) and Jan (162.2 K, 23.06 %, 703.4 K to 865.6 K).
#>
# Same analysis with quarterly aggregation
sales %>%
  dplyr::mutate(Date = lubridate::floor_date(Date, unit = "quarter")) %>%
  dplyr::group_by(Region, Product, Date) %>%
  dplyr::summarise(Sales = sum(Sales, na.rm = TRUE)) %>%
  narrate_trend()
#> $`2021 YTD vs 2020 YTD`
#> From 2020 YTD to 2021 YTD, Sales had an increase of 1.13 M (9.1 %, 12.42 M to 13.55 M).
#>
#> $`Sales change by Region`
#> Regions with biggest changes of Sales are NA (533.1 K, 9.1 %, 5.9 M to 6.4 M) and EMEA (416.9 K, 9.91 %, 4.2 M to 4.6 M).
#>
#> $`NA by Product`
#> In NA, significant Products by Sales change are Food & Beverage (243.3 K, 9.92 %, 2.5 M to 2.7 M) and Tools (186.8 K, 31.87 %, 585.9 K to 772.7 K).
#>
#> $`EMEA by Product`
#> In EMEA, significant Products by Sales change are Electronics (312.1 K, 35.88 %, 869.7 K to 1.2 M) and Food & Beverage (238.2 K, 14.54 %, 1.6 M to 1.9 M).
#>
#> $`Sales change by Product`
#> Products with biggest changes of Sales are Food & Beverage (535.4 K, 10.63 %, 5 M to 5.6 M) and Electronics (525.9 K, 19.79 %, 2.7 M to 3.2 M).
#>
#> $`Food & Beverage by Quarter`
#> In Food & Beverage, significant Quarters by Sales change are Q2 (261.4 K, 24.46 %, 1.1 M to 1.3 M) and Q3 (211.7 K, 21.84 %, 969.1 K to 1.2 M).
#>
#> $`Electronics by Quarter`
#> In Electronics, significant Quarter by Sales change is Q4 (340.8 K, 40.13 %, 849.1 K to 1.2 M).
#>
#> $`Sales change by Quarter`
#> Quarters with biggest changes of Sales are Q4 (508.5 K, 11.41 %, 4.5 M to 5 M) and Q2 (354.3 K, 13.35 %, 2.7 M to 3 M).
#>
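The headline figures in the first narrative line of each example can be checked by hand. The sketch below recomputes the {change} and {change_p} template variables from the two rounded totals shown in the output (12.42 M and 13.55 M). The rule used for {trend} and the rounding are assumptions made for illustration; the package formats numbers with format_num() and may compute these values differently.

```r
# Totals from the narrative line "12.42 M to 13.55 M", in millions
total_prev <- 12.42
total_curr <- 13.55

# Absolute change between the two periods
change <- total_curr - total_prev
# Percentage change relative to the earlier period
change_p <- change / total_prev * 100
# Assumed rule: a non-negative change is reported as an increase
trend <- if (change >= 0) "increase" else "decrease"

round(change, 2)   # 1.13, matching "1.13 M"
round(change_p, 1) # 9.1, matching "9.1 %"
trend              # "increase"
```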