Roll Ups

Roll up functions allow you to summarise your data in some way. They are broadly split into two categories.

For tools which handle hierarchical data well, like OrgVue, a Roll up will apply some aggregation (e.g. sum or average) to all of a node's descendants e.g. direct reports. This allows you to get a holistic picture of certain metrics e.g. total annual compensation.

For tools which don't handle hierarchical data so well, Roll ups are normally aggregations, reducing the number of rows and applying some sum/average calculation. Typically this is achieved through some version of a pivot table.

Excel

Excel does not handle hierarchical data well, but you can use a Pivot Table to show various levels of detail and aggregations in a tabular format. Select a range of cells > go to the Insert tab > Pivot Table and follow the instructions in the dialogue.

Tableau

Whilst Tableau has no inbuilt hierarchy rollup function, other options are available.

Option 1: Go to Analysis tab > Totals > Show Grand Totals / add Subtotals
Option 2: Use 'level of detail' (LOD) calculations to show a different level of aggregation to what is displayed in the view. For more information on LOD calcs, see Tableau's helpful blog.

Alteryx

Use the Summarize tool (in the Transform palette):

(group by field, sum/average/count by measure).

OrgVue

node.~rollUp!~('‹aggregator›', '‹measure›') OR node.~rollUp!~(‹aggregator›, n=>n.‹measure›)

node.rollUp('sum', 'compensation')

node.rollUp(sum, n=>n.compensation)

tSQL

~GROUP BY ROLLUP!~(‹dimension›) OR ~GROUP BY!~ ‹dimension› WITH ~ROLLUP!~

SELECT PurchaseType, SUM(PurchaseAmount) AS SumPurchaseAmount
FROM SalesData
GROUP BY ROLLUP(PurchaseType)

NB. The ROLLUP function allows SQL Server to create subtotals and grand totals, while it groups data using the GROUP BY clause.

Python

Python does not support a native Rollup function, but you can aggregate using a pivot table operation:

table = pivot_table(df, values=['SalesToday', 'SalesMTD','SalesYTD'],\
rows=['State'], cols=['City'], aggfunc=np.sum, margins=True)

table.stack('City')

Which outputs:

            SalesMTD  SalesToday  SalesYTD
State City                                
stA   All        900          50      2100
      ctA        400          20      1000
      ctB        500          30      1100
stB   All        700          50      2200
      ctC        500          10       900
      ctD        200          40      1300
stC   All        300          30       800
      ctF        300          30       800
All   All       1900         130      5100
      ctA        400          20      1000
      ctB        500          30      1100
      ctC        500          10       900
      ctD        200          40      1300
      ctF        300          30       800

Thanks to Wes McKinney's answer on Stack Overflow for the example.

NB. This requires the Pandas library (import pandas as pd). Once downloaded, the above example assumes you have done the following:

Imported the numpy library as 'np': import numpy as np
Defined a data frame as 'df', e.g.:

import pandas as pd
data = pd.read_csv(r'file_path_of_csv')
df = pd.DataFrame(data)

where 'State', 'City', 'SalesMTD', 'SalesToday', and 'SalesYTD' are all columns.

NB. Including r before the start of the file path forces pandas to treat it as a raw string, avoiding any issues caused by including \ characters.

PreviousAverage NextRound

Last updated 5 years ago

Was this helpful?