Working with Data: All the calculations you need t
  • Introduction
  • What is this book?
  • How to Use this Book
  • Document Notes
  • Authors' Thanks
  • Strings
    • Concatenate
    • Split
    • Length
    • Slice
    • Trim
    • Contains
    • Match
    • Replace
    • Upper/Lower Case
  • Numbers
    • Sum
    • Average
    • Roll Ups
    • Round
    • Floor
    • Power
    • Square Root
    • Absolute
  • Dates
    • Date Difference
    • Age
  • Logical
    • Logical Operators
    • If
    • Case / Switch
    • Is Null/Empty
  • General
    • Count
    • Filter
    • Sort
    • Lookup/Join
    • Variables
    • Convert Type
  • Appendix
    • DateTime Parts
    • Helpful Resources
Powered by GitBook
On this page
  • Excel
  • Tableau
  • Alteryx
  • OrgVue
  • tSQL
  • Python

Was this helpful?

  1. General

Lookup/Join

Excel

=~VLOOKUP!~(‹lookup_value›, ‹lookup_table›, ‹column_to_lookup›«, match_type») where ‹column_to_lookup› is the index of a columnn in ‹lookup_table›

=VLOOKUP(A2, C2:D10, 2, FALSE)

NB.

  • Match Types - The default behaviour when omitting «match_type» is that it is set to TRUE. This is rarely the desired behaviour because it will perform an approximate match on the ‹lookup_value› within the first column of ‹lookup_table›. Specifying FALSE for ‹match_type› forces an exact match to occur.

  • Fixing cell ranges - This expression is typically used to supply looked up values for a range of cells, and the quickest way of doing this is to drag the function down to all the cells in question or highlight the range (with the function in the top cell) and use the Ctrl-D shortcut. When doing this, you should change C2:D10 for $C$2:$D$10 to lock the range and avoid it staying relative to the cell in question.

  • Changing lookup 'direction' - One disadvantage of VLOOKUP is that it only works in one direction, where ‹column_to_lookup› has to be positive; in other words, to the right of the first column in the ‹lookup_table›. One way of performing lookups "to the left" of a specified column is to use a combination of ~MATCH!~(‹lookup_value›, ‹lookup_table›«, match_type») (which returns the row number of a value when matched in a range) and ~INDEX!~(‹lookup_range›, ‹lookup_value›) (which returns the value within a range corresponding to a specified row number). So the following is the equivalent of doing a backwards VLOOKUP:

=INDEX(B2:B10, MATCH(A1, C2:C10, 0))
// Match the value of A1 in the range C2:C10 and
// return the corresponding value in B2:B10

Although longer to write, this has the added benefit of being potentially quicker to evaluate (you're only searching two columns as opposed to a whole ‹lookup_table›).

NB. Unlike VLOOKUP, the ‹match_type› options for MATCH() are 1 (match less than), 0 (exact match - preferred), -1 (match greater than).

Tableau

Content pending revision of Tableau 10 materials.

Alteryx

Use one of the following tools in the Join palette:

OrgVue

In OrgVue, lookups are set up through the UI (user interface). 1. Navigate to the property you want to lookup from 2. Set it as a lookup by clicking the eye icon in the Edit Property Dialogue 3. With 'Dataset' lookup selected, pick the dataset you want to act as the ‹lookup_table› (either saved in the Lookups tab or tagged lookup) 4. Apply the changes

This will check each value in that property against the ID field in the lookup dataset (similar to an Excel VLOOKUP) and (unlike Excel) return all corresponding data whenever a match is found.

NB. Matches need to be exact (i.e. no fuzzy matching)

tSQL

‹join_type› ~JOIN!~ ‹tableA› ~ON!~ ‹tableB›.‹lookup_columnA›=‹tableB›.‹lookup_columnB›

SELECT * 
FROM OrdersData
LEFT JOIN ReturnsData
ON OrdersData.OrderID=ReturnsData.ID;

Where the following ‹join_types› are available:

  • INNER JOIN: Returns all rows as long as there is at least one match in both tables. NULL values from either table are not displayed.

  • LEFT JOIN: Returns all rows from the left table, and the matched rows from the right table. When there is no match, the value returned from the right table is NULL.

  • RIGHT JOIN: Returns all rows from the right table, and the matched rows from the left table. When there is no match, the value returned from the left table is NULL.

  • FULL JOIN: Returns all rows from both tables.

Python

~pd.merge!~(left = ‹tableA›, right = ‹tableB›, on = '‹column_name›', how = '«join_type»') Where ‹tableA› and ‹tableB› are dataframe objects:

import pandas as pd

pd.merge(left = orders, right = returns, on = 'order_id', how='left')

This requires the Pandas library (import pandas as pd). Once downloaded, the above example assumes you have defined the orders and returns tables as dataframes, e.g.:

import pandas as pd

orderData = pd.read_excel(r'file_path_of_orders')
orders = pd.DataFrame(orderData)

returnData = pd.read_excel(r'file_path_of_returns')
returns = pd.DataFrame(returnData)

NB.

  • ‹column_name› must be present in both left and right tables

  • The options for «join_type» are left, right, outer and inner, with left being the default behaviour.

  • Including r before the start of the file path forces pandas to treat it as a raw string, avoiding any issues caused by including \ characters.

PreviousSortNextVariables

Last updated 5 years ago

Was this helpful?