Create Narrative for Time Series Forecast Data Frames

Usage

narrate_forecast(
  df,
  date = NULL,
  frequency = NULL,
  summarization = "sum",
  type = "yoy",
  coverage = 0.5,
  coverage_limit = 5,
  narration_depth = 2,
  use_chatgpt = FALSE,
  openai_api_key = Sys.getenv("OPENAI_API_KEY"),
  max_tokens = 1024,
  temperature = 0,
  top_p = 1,
  frequency_penalty = 0,
  presence_penalty = 0,
  forecast = "Forecast",
  actuals = "Actuals",
  template_cy =
    "Forecasted volumes for {current_year} are equal to {format_num(cy_forecast)}.",
  template_ftm = "Overall forecast for the next 12 months is {format_num(ftm_forecast)}.",
  template_ftm_change =
    "Projected {trend} in the next 12 months is equal to {format_num(ftm_change)} ({ftm_change_p}%).",
  use_renviron = FALSE,
  return_data = FALSE,
  simplify = FALSE,
  format_numbers = TRUE,
  collapse_sep = ", ",
  collapse_last = " and ",
  ...
)

Arguments

df

data.frame() or tibble(). Data frame or tibble; can be aggregated or raw

date

Name of the date column to be used for time-based analysis

frequency

Level of time-based aggregation for comparing across years: 'quarter', 'month' or 'week'

summarization

Approach for data summarization/aggregation: 'sum', 'count' or 'average'

type

Type of trend analysis to create: 1 or 'yoy', 2 or 'previous period', 3 or 'same period last year'

coverage

Numeric. Share of the variability to be covered by the narrative, between 0 and 1

coverage_limit

Integer. Maximum number of elements to be narrated; overrides coverage to avoid extremely verbose narratives
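To make the interplay between coverage and coverage_limit concrete, here is a minimal sketch of one way such a rule could work. It is an assumption for illustration, not the package's actual implementation: the helper name `select_by_coverage` and the sample numbers are hypothetical.

```r
# Illustrative sketch only: one way a coverage rule could pick items to narrate.
# `select_by_coverage` and the sample data are hypothetical, not part of narrator.
select_by_coverage <- function(changes, coverage = 0.5, coverage_limit = 5) {
  # Rank items by the absolute size of their change, largest first
  ordered <- changes[order(abs(changes), decreasing = TRUE)]
  # Cumulative share of the total absolute change explained so far
  share <- cumsum(abs(ordered)) / sum(abs(ordered))
  # Smallest number of items whose cumulative share reaches `coverage`
  n_needed <- which(share >= coverage)[1]
  # `coverage_limit` caps the count regardless of `coverage`
  ordered[seq_len(min(n_needed, coverage_limit))]
}

changes <- c(Jan = 20, Mar = 130, May = 140, Sep = 100, Nov = -110)

names(select_by_coverage(changes, coverage = 0.5))
# "May" "Mar"
names(select_by_coverage(changes, coverage = 0.9, coverage_limit = 3))
# "May" "Mar" "Nov"
```

In the second call, four items would be needed to reach 90% of the total change, but the limit of 3 wins, which is the "overrides coverage" behaviour described above.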

narration_depth

Controls the depth of the analysis: 1 for summary, 2 for detailed

use_chatgpt

If TRUE - use ChatGPT to enhance the narrative

openai_api_key

Your OpenAI API key. You can set it in the .Renviron file as "OPENAI_API_KEY"; the function looks it up with Sys.getenv("OPENAI_API_KEY")

max_tokens

The maximum number of tokens to generate in the chat completion.

temperature

What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.

top_p

An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.

frequency_penalty

Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.

presence_penalty

Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.

forecast

Name of the forecast column in the data frame

actuals

Name of the actuals column in the data frame

template_cy

glue template for current year volumes narrative

template_ftm

glue template for future 12 months projection

template_ftm_change

glue template for projected change in the next 12 months
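The template arguments are ordinary glue strings, so a custom template can be previewed with glue directly before passing it in. The sketch below assumes the glue package is installed; the wording of the template and the value "13.8 M" are made up for illustration, while the variable name `ftm_forecast` is taken from the default template shown under Usage.

```r
library(glue)

# A custom wording for the 12-month projection (hypothetical text)
template_ftm <- "Expected volume over the next 12 months: {ftm_forecast}."

# Preview the template by supplying the variable by hand;
# inside narrate_forecast() the value would come from the data instead
preview <- glue(template_ftm, ftm_forecast = "13.8 M")
print(preview)

# The template would then be passed as, for example:
# narrate_forecast(df, template_ftm = template_ftm)
```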

use_renviron

If TRUE, use .Renviron variables in the template. You can also set options(narrator.use_renviron = TRUE) to make it global for the session, or create an environment variable "use_renviron" by editing your .Renviron file with usethis::edit_r_environ()
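As a small sketch of the session-level setup described above, using only base R: the option name comes from the text, and the check on the API key simply reports whether the variable is set without printing its value.

```r
# Enable .Renviron-based templates for the current session
options(narrator.use_renviron = TRUE)

# Confirm the option is set
use_renviron_enabled <- isTRUE(getOption("narrator.use_renviron"))
use_renviron_enabled

# Check whether an OpenAI key is available without printing it
has_api_key <- nzchar(Sys.getenv("OPENAI_API_KEY"))
has_api_key
```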

return_data

If TRUE - return a list of variables used in the function's templates

simplify

If TRUE - return a character vector, if FALSE - named list

format_numbers

If TRUE - format big numbers to K/M/B using format_num() function
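For intuition about what K/M/B formatting means, here is a stand-in written in base R. It is a hypothetical helper named `format_kmb`, not the package's format_num() function, whose rounding and spacing rules may differ.

```r
# Hypothetical stand-in for K/M/B formatting; not narrator's format_num()
format_kmb <- function(x, digits = 1) {
  # Thresholds checked from largest to smallest
  units <- c(B = 1e9, M = 1e6, K = 1e3)
  for (suffix in names(units)) {
    if (abs(x) >= units[[suffix]]) {
      return(paste(round(x / units[[suffix]], digits), suffix))
    }
  }
  # Values below 1000 are returned without a suffix
  as.character(round(x, digits))
}

format_kmb(13800000)  # "13.8 M"
format_kmb(302900)    # "302.9 K"
format_kmb(950)       # "950"
```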

collapse_sep

Separator for glue_collapse when a single variable holds multiple values

collapse_last

Separator for glue_collapse placed before the last item, when a single variable holds multiple values
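The effect of the two separators can be seen by calling glue_collapse() directly with the default values from Usage. This sketch assumes the glue package is installed; the month names are illustrative input.

```r
library(glue)

months <- c("May", "Mar", "Nov", "Sep")

# Same defaults as collapse_sep = ", " and collapse_last = " and "
collapsed <- glue_collapse(months, sep = ", ", last = " and ")
print(collapsed)
# May, Mar, Nov and Sep
```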

...

other arguments passed to glue

Value

A list() of narratives by default and character() if simplify = TRUE

Examples

library(narrator)
library(prophet)
#> Loading required package: Rcpp
#> Loading required package: rlang
library(dplyr)
#> 
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#> 
#>     filter, lag
#> The following objects are masked from ‘package:base’:
#> 
#>     intersect, setdiff, setequal, union
library(tidyr)

# Fit a prophet model to one group's history and forecast 12 more months
fit_prophet <- function(data) {
  model <- prophet(data)
  future <- make_future_dataframe(model, periods = 12, freq = "month")
  forecast <- predict(model, future)
  return(forecast)
}
grouped_data <- sales %>%
  dplyr::mutate(ds = lubridate::floor_date(Date, unit = "month")) %>%
  dplyr::group_by(Region, ds) %>%
  dplyr::summarise(y = sum(Sales, na.rm = TRUE)) %>%
  tidyr::nest()

grouped_data$forecast <- lapply(grouped_data$data, fit_prophet)
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.

actuals <- grouped_data %>%
  dplyr::select(-forecast) %>%
  unnest(data)

df <- grouped_data %>%
  dplyr::select(-data) %>%
  unnest(forecast) %>%
  dplyr::select(ds, yhat) %>%
  dplyr::left_join(actuals) %>%
  dplyr::rename(Actuals = y,
                Forecast = yhat)
#> Adding missing grouping variables: `Region`
#> Joining with `by = join_by(Region, ds)`

narrate_forecast(df)
#> $`Current Year Actuals`
#> Actuals for 2021 are equal to 13.5 M
#> 
#> $`12 Month Projection`
#> Overall forecast for the next 12 months is 13.8 M.
#> 
#> $`Overall increase the next 12 months`
#> Projected increase in the next 12 months is equal to 302.9 K (2.24%).
#> 
#> $`2022 YTD vs 2021 YTD`
#> From 2021 YTD to 2022 YTD, Forecast had an increase of 302.88 K (2.2 %, 13.55 M to 13.85 M).
#> 
#> $`Forecast change by Month`
#> Months with biggest changes of Forecast are May (141.3 K, 13.62 %, 1 M to 1.2 M), Mar (138 K, 13.94 %, 989.3 K to 1.1 M), Nov (-108.8 K, -6.36 %, 1.7 M to 1.6 M) and Sep (97.4 K, 8.68 %, 1.1 M to 1.2 M).
#>