Create Narrative for Time Series Forecast Data Frames

Usage

narrate_forecast(
  df,
  date = NULL,
  frequency = NULL,
  summarization = "sum",
  type = "yoy",
  coverage = 0.5,
  coverage_limit = 5,
  narration_depth = 2,
  use_chatgpt = FALSE,
  openai_api_key = Sys.getenv("OPENAI_API_KEY"),
  max_tokens = 1024,
  temperature = 0,
  top_p = 1,
  frequency_penalty = 0,
  presence_penalty = 0,
  forecast = "Forecast",
  actuals = "Actuals",
  template_cy =
    "Forecasted volumes for {current_year} are equal to {format_num(cy_forecast)}.",
  template_ftm = "Overall forecast for the next 12 months is {format_num(ftm_forecast)}.",
  template_ftm_change =
    "Projected {trend} in the next 12 months is equal to {format_num(ftm_change)} ({ftm_change_p}%).",
  use_renviron = FALSE,
  return_data = FALSE,
  simplify = FALSE,
  format_numbers = TRUE,
  collapse_sep = ", ",
  collapse_last = " and ",
  ...
)

Arguments

df: data.frame() or tibble(). Data frame or tibble, either aggregated or raw
date: Name of the date column to be used for time-based analysis
frequency: Level of time-based aggregation for comparing across years: 'quarter', 'month' or 'week'
summarization: Approach for data summarization/aggregation - 'sum', 'count' or 'average'
type: Type of trend analysis to create: 1 or 'yoy', 2 or 'previous period', 3 or 'same period last year'
coverage: Numeric portion of variability to be covered by narrative, 0 to 1
coverage_limit: Integer maximum number of elements to be narrated, overrides coverage to avoid extremely verbose narrative creation
narration_depth: Depth of the analysis: 1 for summary, 2 for detailed
use_chatgpt: If TRUE - use ChatGPT to enhance the narrative
openai_api_key: Your OpenAI API key. You can set it in your .Renviron file as "OPENAI_API_KEY"; the function looks for it with Sys.getenv("OPENAI_API_KEY")
max_tokens: The maximum number of tokens to generate in the chat completion.
temperature: What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
top_p: An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
frequency_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
presence_penalty: Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
forecast: Name of the forecast column in the data frame
actuals: Name of the actuals column in the data frame
template_cy: glue template for current year volumes narrative
template_ftm: glue template for future 12 months projection
template_ftm_change: glue template for projected change in the next 12 months
use_renviron: If TRUE - use .Renviron variables in the template. You can also set options(narrator.use_renviron = TRUE) to make it global for the session, or create an environment variable "use_renviron" by editing your .Renviron file with usethis::edit_r_environ()
return_data: If TRUE - return a list of variables used in the function's templates
simplify: If TRUE - return a character vector, if FALSE - named list
format_numbers: If TRUE - format big numbers to K/M/B using format_num() function
collapse_sep: Separator for glue_collapse when a single variable holds multiple values
collapse_last: Separator for glue_collapse before the last item, when a single variable holds multiple values
...: other arguments passed to glue
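
The template_* and collapse_* arguments rely on glue string interpolation. The following is a minimal sketch of that mechanism only, not the package's internal code: format_num_sketch() is a hypothetical stand-in for format_num(), and the values assigned to trend, ftm_change and ftm_change_p are illustrative numbers named after the variables in the default templates.

```r
library(glue)

# Hypothetical stand-in for narrator's format_num(); the real helper
# may round or label numbers differently.
format_num_sketch <- function(x) {
  if (abs(x) >= 1e9) return(paste(round(x / 1e9, 1), "B"))
  if (abs(x) >= 1e6) return(paste(round(x / 1e6, 1), "M"))
  if (abs(x) >= 1e3) return(paste(round(x / 1e3, 1), "K"))
  as.character(x)
}

# Illustrative values, named after the variables used in the default templates.
trend <- "increase"
ftm_change <- 302900
ftm_change_p <- 2.24

# A template in the style of template_ftm_change: glue() evaluates each
# {expression} in the calling environment and splices in the result.
template <- "Projected {trend} in the next 12 months is equal to {format_num_sketch(ftm_change)} ({ftm_change_p}%)."
sentence <- glue(template)
print(sentence)

# collapse_sep and collapse_last correspond to the sep and last arguments
# of glue_collapse(), which joins several values into one string.
months <- c("May", "Mar", "Nov", "Sep")
month_list <- glue_collapse(months, sep = ", ", last = " and ")
print(month_list)
```

Any variable referenced inside the braces must exist when the template is evaluated, so a custom template should only use names that the function makes available; return_data = TRUE lists them.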

Value

A list() of narratives by default, or a character() vector if simplify = TRUE
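
The difference between the two return shapes can be sketched in base R. This is an illustration under assumptions, not the function's implementation: the list is built by hand from two narratives printed in the Examples section, and whether the simplified vector keeps the element names is not stated on this page.

```r
# Hand-built named list in the shape of the default return value.
narrative <- list(
  `12 Month Projection` =
    "Overall forecast for the next 12 months is 13.8 M.",
  `Overall increase the next 12 months` =
    "Projected increase in the next 12 months is equal to 302.9 K (2.24%)."
)

# Elements of the list form are addressed by name.
projection <- narrative[["12 Month Projection"]]

# A character vector, comparable to what simplify = TRUE is documented
# to return (names dropped here as an assumption).
narrative_chr <- unlist(narrative, use.names = FALSE)

# The vector form is convenient for pasting into a single paragraph.
paragraph <- paste(narrative_chr, collapse = " ")
print(paragraph)
```

The list form suits reports that place each narrative under its own heading, while the vector form suits a single block of running text.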

Examples

library(narrator)
library(prophet)
#> Loading required package: Rcpp
#> Loading required package: rlang
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

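# Fit a prophet model and predict over the history plus 12 future months
fit_prophet <- function(data) {
  model <- prophet(data)
  future <- make_future_dataframe(model, periods = 12, freq = "month")
  forecast <- predict(model, future)
  return(forecast)
}

# Aggregate sales to monthly totals and nest one data frame per Region
grouped_data <- sales %>%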
  dplyr::mutate(ds = lubridate::floor_date(Date, unit = "month")) %>%
  dplyr::group_by(Region, ds) %>%
  dplyr::summarise(y = sum(Sales, na.rm = TRUE)) %>%
  tidyr::nest()

grouped_data$forecast <- lapply(grouped_data$data, fit_prophet)
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

actuals <- grouped_data %>%
  dplyr::select(-forecast) %>%
  unnest(data)

df <- grouped_data %>%
  dplyr::select(-data) %>%
  unnest(forecast) %>%
  dplyr::select(ds, yhat) %>%
  dplyr::left_join(actuals) %>%
  dplyr::rename(Actuals = y,
                Forecast = yhat)
#> Adding missing grouping variables: `Region`
#> Joining with `by = join_by(Region, ds)`

narrate_forecast(df)
#> $`Current Year Actuals`
#> Actuals for 2021 are equal to 13.5 M
#> 
#> $`12 Month Projection`
#> Overall forecast for the next 12 months is 13.8 M.
#> 
#> $`Overall increase the next 12 months`
#> Projected increase in the next 12 months is equal to 302.9 K (2.24%).
#> 
#> $`2022 YTD vs 2021 YTD`
#> From 2021 YTD to 2022 YTD, Forecast had an increase of 302.88 K (2.2 %, 13.55 M to 13.85 M).
#> 
#> $`Forecast change by Month`
#> Months with biggest changes of Forecast are May (141.3 K, 13.62 %, 1 M to 1.2 M), Mar (138 K, 13.94 %, 989.3 K to 1.1 M), Nov (-108.8 K, -6.36 %, 1.7 M to 1.6 M) and Sep (97.4 K, 8.68 %, 1.1 M to 1.2 M).
#>