Create Narrative for Time Series Forecast Data Frames
Source: R/narrate_forecast.R
narrate_forecast.Rd
Usage
narrate_forecast(
df,
date = NULL,
frequency = NULL,
summarization = "sum",
type = "yoy",
coverage = 0.5,
coverage_limit = 5,
narration_depth = 2,
use_chatgpt = FALSE,
openai_api_key = Sys.getenv("OPENAI_API_KEY"),
max_tokens = 1024,
temperature = 0,
top_p = 1,
frequency_penalty = 0,
presence_penalty = 0,
forecast = "Forecast",
actuals = "Actuals",
template_cy =
"Forecasted volumes for {current_year} are equal to {format_num(cy_forecast)}.",
template_ftm = "Overall forecast for the next 12 months is {format_num(ftm_forecast)}.",
template_ftm_change =
"Projected {trend} in the next 12 months is equal to {format_num(ftm_change)} ({ftm_change_p}%).",
use_renviron = FALSE,
return_data = FALSE,
simplify = FALSE,
format_numbers = TRUE,
collapse_sep = ", ",
collapse_last = " and ",
...
)
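The three template_* defaults in the signature above are glue strings: each {...} placeholder is evaluated as R code when the narrative is built. The sketch below renders the default template_ftm_change with glue::glue() directly. The values of trend, ftm_change and ftm_change_p are made up for illustration, and fmt() is a hypothetical stand-in for the package's format_num(), not its actual implementation.

```r
library(glue)

# Hypothetical stand-in for format_num(): express a number in thousands ("K")
fmt <- function(x) paste(round(x / 1e3, 1), "K")

# Illustrative values only; narrate_forecast() derives these from df
trend <- "increase"
ftm_change <- 302900
ftm_change_p <- 2.24

# Same text as the default template_ftm_change, with fmt() swapped in
template <- "Projected {trend} in the next 12 months is equal to {fmt(ftm_change)} ({ftm_change_p}%)."

# glue() looks up each placeholder in the calling environment
narrative <- glue(template)
print(narrative)
```

A custom template passed to narrate_forecast() would presumably follow the same pattern, using only the variable names that appear in the default templates.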
Arguments
- df
data.frame() or tibble(). Data frame or tibble, can be aggregated or raw
- date
Name of the date column to be used for time-based analysis
- frequency
Level of time-based aggregation for comparing across years: 'quarter', 'month' or 'week'
- summarization
Approach for data summarization/aggregation - 'sum', 'count' or 'average'
- type
Type of trend analysis to create: 1 or 'yoy', 2 or 'previous period', 3 or 'same period last year'
- coverage
Numeric portion of variability to be covered by narrative, 0 to 1
- coverage_limit
Integer maximum number of elements to be narrated, overrides coverage to avoid extremely verbose narrative creation
- narration_depth
Parameter to control the depth of the analysis: 1 for summary, 2 for detailed
- use_chatgpt
If TRUE - use ChatGPT to enhance the narrative
- openai_api_key
Your OpenAI API key. You can set it in the .Renviron file as "OPENAI_API_KEY"; the function will look for it with Sys.getenv("OPENAI_API_KEY")
- max_tokens
The maximum number of tokens to generate in the chat completion.
- temperature
What sampling temperature to use, between 0 and 2. Higher values like 0.8 will make the output more random, while lower values like 0.2 will make it more focused and deterministic.
- top_p
An alternative to sampling with temperature, called nucleus sampling, where the model considers the results of the tokens with top_p probability mass. So 0.1 means only the tokens comprising the top 10% probability mass are considered.
- frequency_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on their existing frequency in the text so far, decreasing the model's likelihood to repeat the same line verbatim.
- presence_penalty
Number between -2.0 and 2.0. Positive values penalize new tokens based on whether they appear in the text so far, increasing the model's likelihood to talk about new topics.
- forecast
Name of the forecast column in the data frame
- actuals
Name of the actuals column in the data frame
- template_cy
glue template for current year volumes narrative
- template_ftm
glue template for future 12 months projection
- template_ftm_change
glue template for projected change in the next 12 months
- use_renviron
If TRUE - use .Renviron variables in the template. You can also set options(narrator.use_renviron = TRUE) to make it global for the session, or create an environment variable "use_renviron" by editing your .Renviron file with usethis::edit_r_environ()
- return_data
If TRUE - return a list of variables used in the function's templates
- simplify
If TRUE - return a character() vector instead of a list() of narratives
- format_numbers
If TRUE - format big numbers to K/M/B using the format_num() function
- collapse_sep
Separator for glue_collapse in cases with multiple values in single variable
- collapse_last
Separator for glue_collapse for the last item, in cases with multiple values in single variable
- ...
other arguments passed to glue
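One way to read coverage and coverage_limit together: rank the elements by absolute change, keep adding them until their cumulative share of the total absolute change reaches coverage, and stop early once coverage_limit elements have been taken. The sketch below is an assumption about that selection rule, written in base R with made-up numbers. select_elements() is a hypothetical helper, not a function exported by the package, and the package's own rule may differ.

```r
# Hypothetical helper: pick the elements that explain `coverage` of the
# total absolute change, but never more than `coverage_limit` of them
select_elements <- function(changes, coverage = 0.5, coverage_limit = 5) {
  # Rank by absolute size of change, largest first
  ranked <- changes[order(abs(changes), decreasing = TRUE)]
  # Cumulative share of the total absolute change
  share <- cumsum(abs(ranked)) / sum(abs(ranked))
  # Smallest number of elements whose share reaches the threshold
  n <- which(share >= coverage)[1]
  # coverage_limit caps the count regardless of coverage
  n <- min(n, coverage_limit)
  ranked[seq_len(n)]
}

# Made-up monthly changes
changes <- c(Jan = 10, Feb = -40, Mar = 25, Apr = 5, May = -20)

selected <- select_elements(changes, coverage = 0.5, coverage_limit = 5)
print(names(selected))

# With full coverage requested, the limit is what stops the selection
capped <- select_elements(changes, coverage = 1, coverage_limit = 3)
print(names(capped))
```

Under this reading, a low coverage_limit keeps the narrative short even when many small elements would be needed to reach the requested coverage.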
Value
A list() of narratives by default and character() if simplify = TRUE
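Because the default return value is a named list, individual narratives can be pulled out by name. The sketch below uses a hand-built list that mimics the shape of the output printed in the Examples section (it does not call narrate_forecast()) and shows one base R way to flatten it to a character vector. Whether simplify = TRUE keeps or drops the names is not stated on this page, so dropping them here is an assumption.

```r
# Hand-built stand-in for the list returned by narrate_forecast(df)
narrative <- list(
  `12 Month Projection` =
    "Overall forecast for the next 12 months is 13.8 M.",
  `Overall increase the next 12 months` =
    "Projected increase in the next 12 months is equal to 302.9 K (2.24%)."
)

# Pull a single narrative out by its name
projection <- narrative[["12 Month Projection"]]
print(projection)

# Flatten the list to a plain character vector (names dropped)
flat <- unlist(narrative, use.names = FALSE)
print(flat)
```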
Examples
library(prophet)
#> Loading required package: Rcpp
#> Loading required package: rlang
library(dplyr)
#>
#> Attaching package: ‘dplyr’
#> The following objects are masked from ‘package:stats’:
#>
#> filter, lag
#> The following objects are masked from ‘package:base’:
#>
#> intersect, setdiff, setequal, union
library(tidyr)
# Fit a prophet model and forecast 12 months past the end of the data
fit_prophet <- function(data) {
  model <- prophet(data)
  future <- make_future_dataframe(model, periods = 12, freq = "month")
  forecast <- predict(model, future)
  return(forecast)
}
# Aggregate sales to monthly totals per Region, one nested data frame each
grouped_data <- sales %>%
  dplyr::mutate(ds = lubridate::floor_date(Date, unit = "month")) %>%
  dplyr::group_by(Region, ds) %>%
  dplyr::summarise(y = sum(Sales, na.rm = TRUE)) %>%
  tidyr::nest()
grouped_data$forecast <- lapply(grouped_data$data, fit_prophet)
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
#> Disabling weekly seasonality. Run prophet with weekly.seasonality=TRUE to override this.
#> Disabling daily seasonality. Run prophet with daily.seasonality=TRUE to override this.
# Unnest the observed values
actuals <- grouped_data %>%
  dplyr::select(-forecast) %>%
  unnest(data)
# Unnest the forecasts and join the actuals by Region and ds
df <- grouped_data %>%
  dplyr::select(-data) %>%
  unnest(forecast) %>%
  dplyr::select(ds, yhat) %>%
  dplyr::left_join(actuals) %>%
  dplyr::rename(Actuals = y,
                Forecast = yhat)
#> Adding missing grouping variables: `Region`
#> Joining with `by = join_by(Region, ds)`
narrate_forecast(df)
#> $`Current Year Actuals`
#> Actuals for 2021 are equal to 13.5 M
#>
#> $`12 Month Projection`
#> Overall forecast for the next 12 months is 13.8 M.
#>
#> $`Overall increase the next 12 months`
#> Projected increase in the next 12 months is equal to 302.9 K (2.24%).
#>
#> $`2022 YTD vs 2021 YTD`
#> From 2021 YTD to 2022 YTD, Forecast had an increase of 302.88 K (2.2 %, 13.55 M to 13.85 M).
#>
#> $`Forecast change by Month`
#> Months with biggest changes of Forecast are May (141.3 K, 13.62 %, 1 M to 1.2 M), Mar (138 K, 13.94 %, 989.3 K to 1.1 M), Nov (-108.8 K, -6.36 %, 1.7 M to 1.6 M) and Sep (97.4 K, 8.68 %, 1.1 M to 1.2 M).
#>
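The last narrative above joins several months with ", " and a final " and ", which are the defaults of collapse_sep and collapse_last. A minimal sketch of that joining step, calling glue::glue_collapse() directly on a made-up vector of month labels (how the package assembles each label is not shown on this page):

```r
library(glue)

# Illustrative labels; the real narrative also carries the figures per month
months <- c("May", "Mar", "Nov", "Sep")

# Defaults of collapse_sep and collapse_last
joined <- glue_collapse(months, sep = ", ", last = " and ")
print(joined)

# Alternative separators, as one might pass via collapse_sep / collapse_last
custom <- glue_collapse(months, sep = "; ", last = " & ")
print(custom)
```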