Template-based NLG framework for creating text narratives out of data and enhance them using ChatGPT. Demo shiny application showing core package capabilities is deployed on shinyapps.io.
Package is available in both R and Python, with all core features and even syntax being the same or similar. Corresponding classes and data types are used in both languages:
- data.frame vs pandas data frame
- list vs dictionary
- character vector vs list
Installation
For R you can install the development version of narrator from GitHub with:
# install.packages("devtools")
devtools::install_github("denisabd/narrator")
For Python install pynarrator
from pip:
pip3 install pynarrator
R
Basic Use Cases
Simple tables with one or more categorical columns (dimensions) and one measure can be transformed to text using narrate_descriptive()
function.
narrative_one <- sales %>%
narrate_descriptive(
measure = "Sales",
dimensions = c("Region", "Product")
)
narrative_one
#> $`Total Sales`
#> Total Sales across all Regions is 38790478.4.
#>
#> $`Region by Sales`
#> Outlying Regions by Sales are NA (18079736.4, 46.6 %) and EMEA (13555412.7, 34.9 %).
#>
#> $`NA by Product`
#> In NA, significant Products by Sales are Food & Beverage (7392821, 40.9 %) and Electronics (3789132.7, 21 %).
#>
#> $`EMEA by Product`
#> In EMEA, significant Products by Sales are Food & Beverage (5265113.2, 38.8 %) and Electronics (3182803.4, 23.5 %).
#>
#> $`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %) and Electronics (8608962.8, 22.2 %).
You can analyze changes over time using narrate_trend()
function:
narrative_two <- sales %>%
narrate_trend(
measure = "Sales",
date = "Date",
dimensions = c("Region", "Product")
)
narrative_two
#> $`2021 YTD vs 2020 YTD`
#> From 2020 YTD to 2021 YTD, Sales had an increase of 1.13 M (9.1 %, 12.42 M to 13.55 M).
#>
#> $`Sales change by Region`
#> Regions with biggest changes of Sales are NA (533.1 K, 9.1 %, 5.9 M to 6.4 M) and EMEA (416.9 K, 9.91 %, 4.2 M to 4.6 M).
#>
#> $`NA by Product`
#> In NA, significant Products by Sales change are Food & Beverage (243.3 K, 9.92 %, 2.5 M to 2.7 M) and Tools (190.5 K, 32.72 %, 582.2 K to 772.7 K).
#>
#> $`EMEA by Product`
#> In EMEA, significant Products by Sales change are Electronics (313.1 K, 36.05 %, 868.6 K to 1.2 M) and Food & Beverage (244.8 K, 15.01 %, 1.6 M to 1.9 M).
#>
#> $`Sales change by Product`
#> Products with biggest changes of Sales are Food & Beverage (535.4 K, 10.63 %, 5 M to 5.6 M) and Electronics (525.9 K, 19.79 %, 2.7 M to 3.2 M).
ChatGPT
narrator
can use ChatGPT API to improve your narratives. To do so you can either set use_chatgpt = TRUE
in any function that creates narrative or use enhance_narrative()
to improve existing narrative output. You can supply list
or character
, function will collapse all text into a sentence and send a request to Chat GPT. Set your token in .Renviron
file as OPENAI_API_KEY
or supply it to a function as openai_api_key
argument.
This functionality requires you to setup the ChatGPT API key and make it accessible from R.
Obtain your ChatGPT API key. You can create an API key by accessing OpenAI API page
Change your
.Renviron
file withusethis::edit_r_environ()
by adding `OPENAI_API_KEY=xx-xxxxxx`
narrative_enhanced <- enhance_narrative(narrative_one)
cat(narrative_enhanced)
Total Sales across all Regions amount to $38,790,478.4. The Outlying Regions, namely North America (NA) and Europe, the Middle East, and Africa (EMEA), account for significant portions of these sales. NA contributes $18,079,736.4, representing 46.6% of the total sales, while EMEA contributes $13,555,412.7, representing 34.9%.
Within the NA region, the top-performing product categories by sales are Food & Beverage, generating $7,392,821 (40.9%), and Electronics, generating $3,789,132.7 (21%). In the EMEA region, the significant product categories by sales are Food & Beverage, generating $5,265,113.2 (38.8%), and Electronics, generating $3,182,803.4 (23.5%).
When considering all regions, the outlying products that contribute significantly to sales are Food & Beverage, generating $15,543,469.7 (40.1%), and Electronics, generating $8,608,962.8 (22.2%).
Translation
Translate you text using translate_narrative()
function, specify language
argument in English:
translation <- translate_narrative(narrative_enhanced, language = "Czech")
cat(translation)
Celkové tržby ve všech regionech činí 38 790 478,4 dolarů. Okrajové regiony, konkrétně Severní Amerika (NA) a Evropa, Střední východ a Afrika (EMEA), představují významnou část těchto tržeb. NA přispívá částkou 18 079 736,4 dolarů, což představuje 46,6% celkových tržeb, zatímco EMEA přispívá částkou 13 555 412,7 dolarů, což představuje 34,9%.
V rámci regionu NA jsou nejúspěšnějšími kategoriemi produktů podle tržeb potraviny a nápoje, které generují 7 392 821 dolarů (40,9%), a elektronika, která generuje 3 789 132,7 dolarů (21%). V regionu EMEA jsou významné kategorie produktů podle tržeb potraviny a nápoje, které generují 5 265 113,2 dolarů (38,8%), a elektronika, která generuje 3 182 803,4 dolarů (23,5%).
Při zohlednění všech regionů jsou okrajové produkty, které významně přispívají k tržbám, potraviny a nápoje, které generují 15 543 469,7 dolarů (40,1%), a elektronika, která generuje 8 608 962,8 dolarů (22,2%).
Summarization
If your output is too verbose you can summarize it with summarize_narrative()
function:
summarization <- summarize_narrative(narrative_enhanced)
cat(summarization)
Total Sales: $38.8M across North America and EMEA regions. NA contributes $18.1M (46.6%) and EMEA contributes $13.6M (34.9%). Top product categories: Food & Beverage and Electronics.
Python
Here are some basic Python examples, for more details visit pynarrator github and pynarrator website
import os
from pynarrator import narrate_descriptive, read_data, enhance_narrative, translate_narrative, summarize_narrative
= read_data() sales
By default narrate_descriptive()
returns a dictionary of narratives with names.
narrate_descriptive(= sales,
df = 'Sales',
measure = ['Region', 'Product'],
dimensions = False,
return_data = 0.5
coverage )
When simplify = True
the output is a list:
narrate_descriptive(= sales,
df = 'Sales',
measure = 'Region',
dimensions = False,
return_data = True,
simplify = 0.5
coverage )
= narrate_descriptive(
narrative_two = sales,
df = 'Sales',
measure = 'Region',
dimensions = False,
return_data = True,
simplify = 0.5
coverage
)
pprint.pprint(narrative_two)
When return_data=True
we get a list of variables calculated inside of the function:
narrate_descriptive(= sales,
df = 'Sales',
measure = ['Region', 'Product'],
dimensions = True,
return_data = True,
simplify = 0.5
coverage )
As all other functions, Chat GPT related calls are to similar to those in R
= enhance_narrative(narrative_one)
narrative_enhanced = translate_narrative(narrative_enhanced, language = "Czech")
translation = summarize_narrative(narrative_enhanced) summarization