Working with Data: All the calculations you need t
  • Introduction
  • What is this book?
  • How to Use this Book
  • Document Notes
  • Authors' Thanks
  • Strings
    • Concatenate
    • Split
    • Length
    • Slice
    • Trim
    • Contains
    • Match
    • Replace
    • Upper/Lower Case
  • Numbers
    • Sum
    • Average
    • Roll Ups
    • Round
    • Floor
    • Power
    • Square Root
    • Absolute
  • Dates
    • Date Difference
    • Age
  • Logical
    • Logical Operators
    • If
    • Case / Switch
    • Is Null/Empty
  • General
    • Count
    • Filter
    • Sort
    • Lookup/Join
    • Variables
    • Convert Type
  • Appendix
    • DateTime Parts
    • Helpful Resources
Powered by GitBook
On this page
  • Excel
  • Tableau
  • Alteryx
  • OrgVue
  • tSQL
  • Python

Was this helpful?

  1. Numbers

Roll Ups

PreviousAverageNextRound

Last updated 5 years ago

Was this helpful?

Roll up functions allow you to summarise your data in some way. They are broadly split into two categories.

For tools which handle hierarchical data well, like OrgVue, a Roll up will apply some aggregation (e.g. sum or average) to all of a node's descendants e.g. direct reports. This allows you to get a holistic picture of certain metrics e.g. total annual compensation.

For tools which don't handle hierarchical data so well, Roll ups are normally aggregations, reducing the number of rows and applying some sum/average calculation. Typically this is achieved through some version of a pivot table.

Excel

Excel does not handle hierarchical data well, but you can use a Pivot Table to show various levels of detail and aggregations in a tabular format. Select a range of cells > go to the Insert tab > Pivot Table and follow the instructions in the dialogue.

Tableau

Whilst Tableau has no inbuilt hierarchy rollup function, other options are available.

  • Option 1: Go to Analysis tab > Totals > Show Grand Totals / add Subtotals

  • Option 2: Use 'level of detail' (LOD) calculations to show a different level of aggregation to what is displayed in the view. For more information on LOD calcs, see Tableau's helpful .

Alteryx

Use the Summarize tool (in the Transform palette):

(group by field, sum/average/count by measure).

OrgVue

node.~rollUp!~('‹aggregator›', '‹measure›') OR node.~rollUp!~(‹aggregator›, n=>n.‹measure›)

node.rollUp('sum', 'compensation')
node.rollUp(sum, n=>n.compensation)

tSQL

~GROUP BY ROLLUP!~(‹dimension›) OR ~GROUP BY!~ ‹dimension› WITH ~ROLLUP!~

SELECT PurchaseType, SUM(PurchaseAmount) AS SumPurchaseAmount
FROM SalesData
GROUP BY ROLLUP(PurchaseType)

NB. The ROLLUP function allows SQL Server to create subtotals and grand totals, while it groups data using the GROUP BY clause.

Python

Python does not support a native Rollup function, but you can aggregate using a pivot table operation:

table = pivot_table(df, values=['SalesToday', 'SalesMTD','SalesYTD'],\
rows=['State'], cols=['City'], aggfunc=np.sum, margins=True)

table.stack('City')

Which outputs:

            SalesMTD  SalesToday  SalesYTD
State City                                
stA   All        900          50      2100
      ctA        400          20      1000
      ctB        500          30      1100
stB   All        700          50      2200
      ctC        500          10       900
      ctD        200          40      1300
stC   All        300          30       800
      ctF        300          30       800
All   All       1900         130      5100
      ctA        400          20      1000
      ctB        500          30      1100
      ctC        500          10       900
      ctD        200          40      1300
      ctF        300          30       800

NB. This requires the Pandas library (import pandas as pd). Once downloaded, the above example assumes you have done the following:

  • Imported the numpy library as 'np': import numpy as np

  • Defined a data frame as 'df', e.g.:

import pandas as pd
data = pd.read_csv(r'file_path_of_csv')
df = pd.DataFrame(data)

where 'State', 'City', 'SalesMTD', 'SalesToday', and 'SalesYTD' are all columns.

NB. Including r before the start of the file path forces pandas to treat it as a raw string, avoiding any issues caused by including \ characters.

Thanks to Wes McKinney's on Stack Overflow for the example.

answer
blog