Learning DAX by experimentation

Share The Knowledge

1st: a vague requirement that needs clarifying. 2nd: a puzzle to solve via a journey of DAX discovery!

?Requirement?

What’s the average discounted sales per customer?

It might sound clear but once we look at the data it becomes obvious that this question could be interpreted in different ways. Let’s look at the data.

Set-up

My pbix file

Sales Table

Each row is a sale to a customer.

7 rows in Sales table

Customer Table

Each customer’s discount amount.

Customer discount value

Relationship

One to many (Customers to Sales). A customer can have 1+ rows in Sales.

Interpretation

Now I’ve seen the data but how do I interpret the requirement?

What’s the average discounted sales per customer?

I needed to clearly understand it before creating a measure.

Asking Questions

I reduced it down to two questions:

  • How should it calculate at the row level?
  • How should it calculate at the total level?

Excel Demo

To explain different Customer row & Total row possibilities.

Are one of these row and total amounts correct?

I listed 4 possibilities:

  1. METHOD A & total: sum. Or simply: sum & sum
  2. METHOD A & total: average. Or simply: sum & average
  3. METHOD B & total: sum. Or simply: average & sum
  4. METHOD B & total: average. Or simply: average & average

Some seem to make more sense than others but the requestor decides.

Clear Requirement!

The questions helped and we ended up with this:

for each Customer sum the discount sales. In total row show average of Customer amounts above.

Now it’s time to solve this!

Solution

I’ll show the final visual and measure. Then I’ll dive into why one measure doesn’t work and why 2 measures do work.

Final Visual

This is what I need to build:

Row = sum, total = average of above

Final Measure

Both measures work but in different ways:

Solution YES #1 =
AVERAGEX(
  Customers,
  SUMX(RELATEDTABLE(Sales), Sales[Sale Amt] * (1 - Customers[Discount])))
Solution YES #2 = 
VAR RowCalc =
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount])))
VAR TotalCalc =
         AVERAGEX(Values(Customers[Customer]), [Total Discount Sales])
RETURN
IF( HASONEVALUE(Customers[Customer]), RowCalc, TotalCalc)

Solution Journey

I’ll walk through my journey. I got stuck a few times but l learned so much!

Failed Attempt #1

Remember, we want to create this:

row = sum, total = average above
Solution NO #1 = 
AVERAGEX(Customers,
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount]))))

AVERAGEX ( <Table>, <Expression> )

I thought it would work in both row level & total level.

  • Row: calculate <Expression> for the current Customer
  • Total: iterate each row in <Table>, calculate <Expression>, average results

Below ‘Solution NO #1’ doesn’t work at total level. Why does it sum when I’m using AVERAGEX to average $274.4 $720.0 $123.5 ?

Total row doesn’t average the values above

In Excel I can evaluate part of a formula (F9 key) to see what’s inside. I can’t do that in Power BI but I thought of this:

Look Inside Measure

I changed AVERAGEX to SUMX to test an idea.

Solution NO #1b verify = 
SUMX(Customers,
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount]))))

Look at the $3353.7 total below!!!

$3353.7 Total?!

$3353.7 = $1117.9 X 3 my measure isn’t iterating over each customer. It’s the total 3 times! There’s a disconnect between <Table> and <Expression>.

To prove it I created these:

Solution NO #1c verify = 
MINX(Customers,
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount]))))
Solution NO #1d verify = 
MAXX(Customers,
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount]))))

As I suspected, MINX and MAXX both produce $1117.9 !!

$1117.9 is the only number inside the failed measures!

To be extra sure that $1117.9 is there 3 times I created this:

Solution NO #1e verify = 
COUNTX(Customers,
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount]))))

Total row for measure ‘Solution NO #1e verify‘ counts 3 numbers!

Solution #1

Inside SUMX I removed RELATED from the <Expression> and added RELATEDTABLE to the <Table> part.

Solution YES #1 = 
AVERAGEX(Customers,
         SUMX(RELATEDTABLE(Sales),Sales[Sale Amt] * (1- Customers[Discount])))
It works for row (sum) & total (average) !

Remember the disconnect? I want to stay inside table Customers to do my X expression. RELATEDTABLE pulls the data from Sales up to Customers.

X functions (SUMX, MAXX) iterate a column in the same table. Sales column [Sales Amt] is outside of Customers. RELATEDTABLE goes down to Sales, gets it, and performs a context transition (link).

Customers[Discount] is already in table Customer.

How can I be sure? What numbers are inside the measure?

Look Inside Measure

Each measure below extracts a different number to prove what’s inside!

Solution YES #1c verify = 
MINX(Customers,
     SUMX(RELATEDTABLE(Sales),Sales[Sale Amt] * (1- Customers[Discount])))

Above, measure ‘Solution YES #1c verify’ returns $123.50 (MINX).

Solution YES #1e verify = 
MEDIANX(Customers,
        SUMX(RELATEDTABLE(Sales),Sales[Sale Amt] * (1- Customers[Discount])))

Above, measure ‘Solution YES #1e verify’ returns $274.40 (MEDIANX).

Solution YES #1d verify = 
MAXX(Customers,
     SUMX(RELATEDTABLE(Sales),Sales[Sale Amt] * (1- Customers[Discount])))

Above, measure ‘Solution YES #1d verify’ returns $720.00 (MAXX).

I see all 3 Customer numbers in the Total row: $123.5 $720.0 $274.4

Average of $123.5 & $274.4 & $720.0 is $372.63! (Solution YES #1)

Solution #2

It wasn’t easy to create a measure to calculate differently at row and total levels. Why not use function HASONEVALUE to distinguish between row and total and then do an simpler calculations?

Solution YES #2 = 
VAR RowCalc =
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount])))
VAR TotalCalc =
         AVERAGEX(Values(Customers[Customer]), [Total Discount Sales])
RETURN
IF( HASONEVALUE(Customers[Customer]), RowCalc, TotalCalc)

Where am I?

IF and HASONEVALUE functions determine if the row has a Customer (runs RowCalc) or if it’s the total row (runs TotalCalc).

VAR RowCalc

If there’s a Customer in the row then variable RowCalc iterates table Sales to multiply: [Sale Amt] X Customers[Discount]. RELATED gets the [Discount] value for current row [Customer]. Customer in viz is the filter context so SUMX only iterates current customer’s rows.

VAR TotalCalc

TotalCalc uses VALUES to create a table of customers to iterate. I don’t want to average the original column but average [Total Discount Sales] per Customer.

Be Careful

If you copy the DAX for measure [Total Discount Sales] and paste it inside Solution YES #2 (to replace external reference) it won’t work. Why?

When referencing an external measure an invisible CALCULATE is wrapped around it. If the DAX is used directly inside the measure then wrap it with a CALCULATE to get context transition. Compare two measures below:

Solution YES #2b NO = 
//this measure no longer works
VAR RowCalc =
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount])))
VAR TotalCalc =
         AVERAGEX(Values(Customers[Customer]), SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount]))))
RETURN
IF( HASONEVALUE(Customers[Customer]), RowCalc, TotalCalc)

Above measure no longer works but measure below works:

Solution YES #2c YES = 
//by adding the CALCULATE function this measure works and no longer relies on external measure
VAR RowCalc =
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount])))
VAR TotalCalc =
         AVERAGEX(Values(Customers[Customer]),
                  CALCULATE(SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount])))))
RETURN
IF( HASONEVALUE(Customers[Customer]), RowCalc, TotalCalc)

I continued to play with context transition. Can I remove TotalCalc’s CALCULATE and replace it with CALCULATETABLE? It works!

Solution YES #2d YES = 
VAR RowCalc =
         SUMX(Sales,Sales[Sale Amt] * (1- RELATED(Customers[Discount])))
VAR TotalCalc =
         AVERAGEX(Values(Customers[Customer]),
                  SUMX(CALCULATETABLE(Sales),Sales[Sale Amt] * (1- RELATED(Customers[Discount]))))
RETURN
IF( HASONEVALUE(Customers[Customer]), RowCalc, TotalCalc)

So, context transition must occur but there’s more than one way to do it!

DAX Discovery Journey

This was exhausting but rewarding as I learned why my initial solution wasn’t working, how to fix it and prove how it worked.

Great advice from Adam Saxton re: learning DAX (Youtube link)

The struggle is real. And this is something that Patrick and I had to go through as well. Everyone that’s learning this has to go through that knowledge ramp and you are not alone.

My nephew sent me an amazing book called ‘Burn Math Class: And Reinvent Mathematics for Yourself’. So many great quotes. Here’s one

…an approach that requires zero memorization, encourages experimentation and failure, never asks the reader to accept anything we have not created ourselves, avoids fancy names that hide the simplicity of the idea, and presents mathematics like the adventure it is…

Learning DAX, or anything, does require a lot of experimenting and failing. If I keep at it I’ll have more eureka moments when I understand a challenging concept.

Demo Data

The relationship between Sales and Customers would normally use an ID number (not text name field). I kept it simple for this demo.

This time I used a simpler DAX approach via function DATATABLE:

Sales = 
VAR tablesales = 
DATATABLE (
    "Customer", STRING,
    "Sale Amt", CURRENCY,
    {
        { "Alberto", 100 },
        { "Alberto", 180 },
        { "Eugenia",  20 },
        { "Eugenia",  50 },
        { "Eugenia",  60 },
        { "Clara",  300 }, 
        { "Clara",  500 }  
    }
)  
RETURN
tablesales

Boring but this is all I needed. Some richer ways to create sample data:

Another down side of a light demo is that we don’t get to see how efficiently it runs. I need to practice with larger sample data-sets.


Share The Knowledge

Leave a Reply

Your email address will not be published. Required fields are marked *