Roll Ups
Roll up functions allow you to summarise your data in some way. They are broadly split into two categories.
For tools which handle hierarchical data well, like OrgVue, a Roll up will apply some aggregation (e.g. sum or average) to all of a node's descendants e.g. direct reports. This allows you to get a holistic picture of certain metrics e.g. total annual compensation.
For tools which don't handle hierarchical data so well, Roll ups are normally aggregations, reducing the number of rows and applying some sum/average calculation. Typically this is achieved through some version of a pivot table.
Excel
Excel does not handle hierarchical data well, but you can use a Pivot Table to show various levels of detail and aggregations in a tabular format. Select a range of cells > go to the Insert tab > Pivot Table and follow the instructions in the dialogue.
Tableau
Whilst Tableau has no inbuilt hierarchy rollup function, other options are available.
Option 1: Go to Analysis tab > Totals > Show Grand Totals / add Subtotals
Option 2: Use 'level of detail' (LOD) calculations to show a different level of aggregation to what is displayed in the view. For more information on LOD calcs, see Tableau's helpful blog.
Alteryx
Use the Summarize tool (in the Transform palette):
(group by field, sum/average/count by measure).
OrgVue
node.~rollUp!~('‹aggregator›', '‹measure›') OR node.~rollUp!~(‹aggregator›, n=>n.‹measure›)
node.rollUp('sum', 'compensation')
node.rollUp(sum, n=>n.compensation)
tSQL
~GROUP BY ROLLUP!~(‹dimension›) OR ~GROUP BY!~ ‹dimension› WITH ~ROLLUP!~
SELECT PurchaseType, SUM(PurchaseAmount) AS SumPurchaseAmount
FROM SalesData
GROUP BY ROLLUP(PurchaseType)
NB. The ROLLUP
function allows SQL Server to create subtotals and grand totals, while it groups data using the GROUP BY
clause.
Python
Python does not support a native Rollup function, but you can aggregate using a pivot table operation:
table = pivot_table(df, values=['SalesToday', 'SalesMTD','SalesYTD'],\
rows=['State'], cols=['City'], aggfunc=np.sum, margins=True)
table.stack('City')
Which outputs:
SalesMTD SalesToday SalesYTD
State City
stA All 900 50 2100
ctA 400 20 1000
ctB 500 30 1100
stB All 700 50 2200
ctC 500 10 900
ctD 200 40 1300
stC All 300 30 800
ctF 300 30 800
All All 1900 130 5100
ctA 400 20 1000
ctB 500 30 1100
ctC 500 10 900
ctD 200 40 1300
ctF 300 30 800
Thanks to Wes McKinney's answer on Stack Overflow for the example.
NB. This requires the Pandas library (import pandas as pd
). Once downloaded, the above example assumes you have done the following:
Imported the numpy library as 'np':
import numpy as np
Defined a data frame as 'df', e.g.:
import pandas as pd
data = pd.read_csv(r'file_path_of_csv')
df = pd.DataFrame(data)
where 'State', 'City', 'SalesMTD', 'SalesToday', and 'SalesYTD' are all columns.
NB. Including r
before the start of the file path forces pandas to treat it as a raw string, avoiding any issues caused by including \
characters.
Last updated
Was this helpful?