The basic narrative type is descriptive statistics: it looks for outliers, or the biggest contributors to the total volume in the data set. These narratives are useful and can help you look deeper into the data set hierarchy.
Simple Table
We start with the simplest table, one that has a single dimension and a single measure. Here the overall sales volume and the outlying Regions are analyzed.
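# Assumption: narrate_descriptive() and the sales data come from the narrator package
library(narrator)
library(dplyr)  # %>%, group_by(), summarise(), arrange()
library(knitr)  # kable()

# Total Sales per Region, largest first
df_one <- sales %>%
  group_by(Region) %>%
  summarise(Sales = sum(Sales, na.rm = TRUE)) %>%
  arrange(desc(Sales))

kable(df_one)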
Region | Sales |
---|---|
NA | 18079736 |
EMEA | 13555413 |
ASPAC | 3919261 |
LATAM | 3236068 |
narrate_descriptive(df_one)
#> $`Total Sales`
#> Total Sales across all Regions is 38790478.4.
#>
#> $`Region by Sales`
#> Outlying Regions by Sales are NA (18079736.4, 46.6 %) and EMEA (13555412.7, 34.9 %).
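The percentages in the narrative are each Region's share of the total. A minimal sketch in base R that reproduces them from the rounded values in the table above; the variable names are illustrative and not part of the package:

```r
# Region totals copied from the table above (rounded to whole units)
region_sales <- c("NA" = 18079736, EMEA = 13555413, ASPAC = 3919261, LATAM = 3236068)

# Share of each Region in the overall total, in percent
total <- sum(region_sales)
share <- round(region_sales / total * 100, 1)

total
share
```

Here "NA" is the name of a Region (a character string), not a missing value.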
Summarization
There are multiple summarization (aggregation) options for the data frame, controlled by the summarization argument, which can be sum, count or average.
sales %>%
  narrate_descriptive(
    measure = "Sales",
    dimensions = "Region",
    summarization = "count"
  )
#> $`Total Sales`
#> Total Sales across all Regions is 9026.
#>
#> $`Region by Sales`
#> Outlying Regions by Sales are NA (3821, 39.4 %) and EMEA (2883, 29.8 %).
sales %>%
  narrate_descriptive(
    measure = "Sales",
    dimensions = "Region",
    summarization = "average"
  )
#> $`Average Sales`
#> Average Sales across all Regions is 3879.
#>
#> $`Region by Sales`
#> Outlying Regions by Sales are LATAM (2303.3, -40.6 % vs average Sales) and ASPAC (2398.6, -38.2 % vs average Sales).
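With average summarization the percentage is no longer a share of the total. It appears to be the relative difference between a Region's average and the overall average; this is an assumption, but it matches the printed numbers. A small base R check using the values from the output above:

```r
# Values copied from the narrative output above
overall_avg <- 3879
region_avg <- c(LATAM = 2303.3, ASPAC = 2398.6)

# Relative deviation from the overall average, in percent
deviation <- round((region_avg / overall_avg - 1) * 100, 1)
deviation
```

Both Regions come out negative, which is why they are reported as outliers below the average.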
Multiple Dimensions
df_two <- sales %>%
  filter(Region %in% c("NA", "EMEA")) %>%
  group_by(Region, Product) %>%
  summarise(Sales = sum(Sales, na.rm = TRUE)) %>%
  arrange(desc(Sales))

kable(df_two)
Region | Product | Sales |
---|---|---|
NA | Food & Beverage | 7392821.0 |
EMEA | Food & Beverage | 5265113.2 |
NA | Electronics | 3789132.7 |
EMEA | Electronics | 3182803.4 |
NA | Home | 2165764.5 |
NA | Tools | 2054959.1 |
EMEA | Home | 1633026.4 |
NA | Baby | 1521544.7 |
EMEA | Tools | 1499974.6 |
NA | Clothing | 1155514.4 |
EMEA | Baby | 1146743.8 |
EMEA | Clothing | 827751.3 |
narrate_descriptive(df_two)
#> $`Total Sales`
#> Total Sales across all Regions is 31635149.1.
#>
#> $`Region by Sales`
#> Outlying Region by Sales is NA (18079736.4, 57.2 %).
#>
#> $`NA by Product`
#> In NA, significant Products by Sales are Food & Beverage (7392821, 40.9 %) and Electronics (3789132.7, 21 %).
#>
#> $`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (12657934.2, 40 %) and Electronics (6971936.1, 22 %).
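The Product by Sales narrative works on totals across both Regions. The sketch below rebuilds those totals in base R from the table above, to show where figures such as 12657934.2 come from. It illustrates the arithmetic only and is not the package's internal code:

```r
# Region/Product totals copied from the table above
df <- data.frame(
  Region = rep(c("NA", "EMEA"), each = 6),
  Product = rep(c("Food & Beverage", "Electronics", "Home",
                  "Tools", "Baby", "Clothing"), times = 2),
  Sales = c(7392821.0, 3789132.7, 2165764.5, 2054959.1, 1521544.7, 1155514.4,
            5265113.2, 3182803.4, 1633026.4, 1499974.6, 1146743.8, 827751.3),
  stringsAsFactors = FALSE
)

# Collapse the Region level to get one total per Product, largest first
by_product <- aggregate(Sales ~ Product, data = df, FUN = sum)
by_product <- by_product[order(-by_product$Sales), ]

# Share of each Product in the overall total, in percent
by_product$Share <- round(by_product$Sales / sum(df$Sales) * 100, 1)

head(by_product, 2)
```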
Depth
Narration depth can be controlled with the narration_depth argument. To get summary narratives only, set narration_depth = 1.
narrate_descriptive(
  df_two,
  narration_depth = 1
)
#> $`Total Sales`
#> Total Sales across all Regions is 31635149.1.
#>
#> $`Region by Sales`
#> Outlying Region by Sales is NA (18079736.4, 57.2 %).
#>
#> $`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (12657934.2, 40 %) and Electronics (6971936.1, 22 %).
Coverage
The key argument for all narratives is coverage. It is used to narrate only the most important items and to avoid simply looping through all of the dimension levels.
By default coverage is set to 0.5, which means that narration stops as soon as the cumulative share reaches the 50 % mark. With increased coverage, additional narrative is returned.
df_three <- sales %>%
  group_by(Product) %>%
  summarise(Sales = sum(Sales, na.rm = TRUE)) %>%
  arrange(desc(Sales))

df_three %>%
  mutate(
    Share = round(Sales / sum(Sales) * 100, 1),
    Cumulative = cumsum(Share)
  ) %>%
  kable()
Product | Sales | Share | Cumulative |
---|---|---|---|
Food & Beverage | 15543470 | 40.1 | 40.1 |
Electronics | 8608963 | 22.2 | 62.3 |
Home | 4599371 | 11.9 | 74.2 |
Tools | 4404197 | 11.4 | 85.6 |
Baby | 3256835 | 8.4 | 94.0 |
Clothing | 2377643 | 6.1 | 100.1 |
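The last Cumulative value reads 100.1 rather than 100 because the shares are rounded before they are accumulated. A short base R sketch, using the rounded Sales figures from the table, compares the two orders of operation; the variable names are illustrative:

```r
# Product totals copied from the table above
sales_by_product <- c(15543470, 8608963, 4599371, 4404197, 3256835, 2377643)

# Unrounded share of each Product, in percent
share <- sales_by_product / sum(sales_by_product) * 100

# Rounding first, then accumulating, lets the rounding errors add up
cum_rounded_first <- cumsum(round(share, 1))

# Accumulating first, then rounding, ends at 100
cum_rounded_last <- round(cumsum(share), 1)

cum_rounded_first
cum_rounded_last
```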
narrate_descriptive(df_three)
#> $`Total Sales`
#> Total Sales across all Products is 38790478.4.
#>
#> $`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %) and Electronics (8608962.8, 22.2 %).
narrate_descriptive(df_three, coverage = 0.7)
#> $`Total Sales`
#> Total Sales across all Products is 38790478.4.
#>
#> $`Product by Sales`
#> Outlying Products by Sales are Food & Beverage (15543469.7, 40.1 %), Electronics (8608962.8, 22.2 %) and Home (4599370.9, 11.9 %).
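The cut-off described above can be sketched in a few lines of base R. The helper levels_within_coverage() is hypothetical and is not the package's implementation. It assumes that levels are taken in descending order until their cumulative share first reaches the coverage threshold, which is consistent with the two outputs shown here:

```r
# Hypothetical helper, not part of the package: keep the largest levels
# until their cumulative share reaches the coverage threshold
levels_within_coverage <- function(values, coverage = 0.5) {
  values <- sort(values, decreasing = TRUE)
  cum_share <- cumsum(values) / sum(values)
  # Index of the first level whose cumulative share reaches the threshold
  n_keep <- which(cum_share >= coverage)[1]
  names(values)[seq_len(n_keep)]
}

# Product totals copied from the table above
product_sales <- c(
  "Food & Beverage" = 15543470, Electronics = 8608963, Home = 4599371,
  Tools = 4404197, Baby = 3256835, Clothing = 2377643
)

levels_within_coverage(product_sales)                  # default coverage = 0.5
levels_within_coverage(product_sales, coverage = 0.7)
```

With the default, the cumulative share passes 50 % at Electronics (62.3 %), so two Products are narrated. At 0.7 the threshold is first passed at Home (74.2 %), which adds a third.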