I am trying to create a parameter that lets me switch a view between Sales, Units, Profit, Orders, Returns, and Return Units. I'm running into an issue with the calculated fields for return $ and return units: the results I am getting don't seem to be accurate.
I have created a Parameter with the above list called [Choose KPI].
I thought I would be able to create calculated fields for [Returned $] and [Returned Units]:
IF [Returned] = 'Yes' THEN [Sales] END
and
IF [Returned] = 'Yes' THEN [Quantity] END
Which would make my calculated field for [Choose KPI]:
CASE [Parameters].[Choose KPI]
WHEN "Sales" THEN SUM([Sales])
WHEN "Units" THEN SUM([Quantity])
WHEN "Profit" THEN SUM([Profit])
WHEN "Orders" THEN COUNT([Order ID])
WHEN "Returns" THEN SUM([Returned $])
WHEN "Return Units" THEN SUM([Returned Units])
END
Returns appear to be linked only to the order ID, not to the product ID. This creates duplicate values when an order contains multiple items (are we assuming all items in the order were returned when [Returned] = 'Yes'?), which inflates the sum of [Returned $].
How can I create a filter that uses the distinct Order ID to calculate total returns and returned units?
In Tableau, you can use the "Group" option in the "Analysis" menu to create a new field that groups your data by the "Order ID" field. Then, you can use the "COUNTD" function to count the number of unique "Order ID" values, which will give you the total number of returned orders.
To calculate the total returned units, you can create a calculated field that multiplies the quantity of each returned item by the number of unique items returned per order. You can use the COUNTD([Order ID]) and SUM(IF [Returned] = 'Yes' THEN [Quantity] END) in the calculation.
In the [Choose KPI] field, you can use the following calculation:
CASE [Parameters].[Choose KPI]
WHEN "Sales" THEN SUM([Sales])
WHEN "Units" THEN SUM([Quantity])
WHEN "Profit" THEN SUM([Profit])
WHEN "Orders" THEN COUNTD([Order ID])
WHEN "Returns" THEN SUM(IF [Returned] = 'Yes' THEN [Sales] END)
WHEN "Return Units" THEN SUM(IF [Returned] = 'Yes' THEN [Quantity] END) * COUNTD([Order ID])
END
This way, you will be able to filter your data based on the different KPIs you've defined, and the calculations for returned sales and units will be based on the distinct Order ID values.
(Answer by https://chat.openai.com/chat and formatted by me.)
I have a field for which I'd like to count how many times the column's maximum value occurs. For example, if the max value for a given column is 20, I want to know how many 20's are in that column. I've tried the following formula, but I get the error "Cannot mix aggregate and non-aggregate arguments with this function."
IF [Field1] = MAX([Field1])
THEN 1
ELSE 0
END
Try
IF ATTR([Field1]) = MAX([Field1])
THEN 1
ELSE 0
END
ATTR() is an aggregation, so wrapping [Field1] in it makes both sides of the comparison aggregates and removes the error. As long as [Field1] has a single value within each partition of the view, ATTR() returns that value unchanged and this won't have an impact on your data.
I have imported a database from a CSV which has info about:
country
region
commodity
price
date
(This is the csv: https://www.kaggle.com/jboysen/global-food-prices)
The rows in the CSV are ordered in this way:
country 1, region 1.1, commodity X, price, dateA
country 1, region 1.1, commodity X, price, dateB
country 1, region 1.1, commodity Y, price, dateA
country 1, region 1.1, commodity Y, price, dateB
...
country 1, region 1.2, commodity X, price, dateA
country 1, region 1.2, commodity X, price, dateB
country 1, region 1.2, commodity Y, price, dateA
country 1, region 1.2, commodity Y, price, dateB
...
country 2, region 2.1, commodity X, price, dateA
...
I need to show, for each country, for each product, the biggest price.
I wrote:
1) a map with key country+commodity and value price
var map = function() {
    emit({country: this.country_name, commodity: this.commodity_name}, {price: this.price});
};
2) a reduce that scans the prices related to a key and checks which one is the highest
var reduce = function(key, values) {
    var maxPrice = 0.0;
    values.forEach(function(doc) {
        var thisPrice = parseFloat(doc.price);
        if (typeof doc.price != "undefined") {
            if (thisPrice > maxPrice) {
                maxPrice = thisPrice;
            }
        }
    });
    return {max_price: maxPrice};
};
3) I send the output of a map reduce to a collection "mr"
db.prices.mapReduce(map, reduce, {out: "mr"});
PROBLEM:
For example, if I open the csv and manually order by:
country (increasing order)
commodity (increasing order)
price (decreasing order)
I can check that (to give an example of data) in Afghanistan the highest price for the commodity Bread is 65.25
When I check the map-reduce output, though, it shows 0 as the max price of Bread in Afghanistan.
WHAT HAPPENS:
There are 10 regions in the csv in which Bread is logged for Afghanistan.
I've added, on the last line of the reduce:
print("reduce with key: " + key.country + ", " + key.commodity + "; max price: " + maxPrice);
Theoretically, if I search the MongoDB log, I should find only ONE entry with "reduce with key: Afghanistan, Bread; max price: ???".
Instead I see TEN lines (the same number as the regions), each one with a different max price.
The last one has "max price 0".
MY HYPOTHESIS:
It seems that, after the emit, when the reduce is called, instead of looking for ALL k-v pairs with the same key, it considers sub-groups that are in proximity.
So, recalling my starting example on the CSV structure:
as long as the reduce scans emit outputs related to "afghanistan, region 1, bread", it does a reduce on them
then it does a reduce on the outputs related to "afghanistan, region 1, commodityX"
then it does another reduce on the outputs related to "afghanistan, region 2, bread" (instead of reducing ALL the k-v pairs with afghanistan+bread in a single reduce)
Do I have to do a re-reduce to combine all the partial reduce jobs?
I've managed to solve this.
MongoDB doesn't necessarily do the reducing of all k-v pairs with the same key in one go.
It can happen (as in this case) that MongoDB performs a reduce on a subset of the k-v pairs for a specific key, and then passes the output of this first reduce as one of the input values to a second reduce on another subset for the same key.
My code didn't work because:
MongoDB performed a reduce on a subset of k-v pairs related to the key "Afghanistan, Bread", returning an object with a field named "max_price"
MongoDB would proceed to reduce other subsets
MongoDB, when faced with another subset of "Afghanistan, Bread", would take the output of the first reduce, and use it as a value
The output of a reduce has a field named "max_price", but the emitted values have a field named "price"
Since I ask for the value "doc.price", any doc that contains "max_price" instead gets ignored, and a reduce that receives only such docs returns 0
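The failure described above can be reproduced outside MongoDB by calling the reduce function by hand, first on two subsets and then on the two partial results. This is only a sketch: the reduce body is the one from the question (with the print removed), while the sample prices and the way they are split into subsets are made up for illustration.

```javascript
// Reduce function from the question (logic unchanged, print removed).
function originalReduce(key, values) {
  var maxPrice = 0.0;
  values.forEach(function (doc) {
    var thisPrice = parseFloat(doc.price);
    if (typeof doc.price != "undefined") {
      if (thisPrice > maxPrice) {
        maxPrice = thisPrice;
      }
    }
  });
  return { max_price: maxPrice };
}

var key = { country: "Afghanistan", commodity: "Bread" };

// Hypothetical sample prices, split into two subsets the way MongoDB might.
var partialA = originalReduce(key, [{ price: "65.25" }, { price: "40" }]);
var partialB = originalReduce(key, [{ price: "50" }]);

// Re-reduce: the inputs are now earlier reduce outputs. They carry
// max_price instead of price, so every value is skipped.
var finalResult = originalReduce(key, [partialA, partialB]);

console.log(partialA, partialB, finalResult);
```

Each partial result is correct on its own, but the final call sees no "price" field at all and falls back to the initial 0.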
There are 2 approaches to solve this:
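1) Use the same field name in the reduce output as in the emitted value (here "price"), so that a reduce output can be fed back into reduce
2) Index the properties chosen as key, and use the "sort" option on mapReduce() so that the k-v pairs related to a key arrive together and are reduced in as few calls as possible
The second approach is for when you don't want to give up using a different name for the reduce output (it also tends to perform better, since it needs fewer reduce calls per key). Note, however, that MongoDB's documentation requires the reduce return value to have the same type as the emitted value and allows reduce to be called more than once per key, so the first approach is the safer one.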
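A minimal sketch of the first approach, assuming the map from the question stays as it is (emitting {price: ...}) and only the reduce changes so that it returns the same field name. The function name maxPriceReduce and the sample prices are hypothetical; the calls are made by hand here instead of through MongoDB.

```javascript
// Reduce whose output has the same shape as the emitted value ({price: ...}),
// so its result can safely be passed back in as an input value.
function maxPriceReduce(key, values) {
  var best = { price: 0.0 };
  values.forEach(function (doc) {
    var p = parseFloat(doc.price);
    // Skip missing or non-numeric prices.
    if (!isNaN(p) && p > best.price) {
      best.price = p;
    }
  });
  return best;
}

var bread = { country: "Afghanistan", commodity: "Bread" };

// Hypothetical sample prices, reduced in two separate subsets.
var first = maxPriceReduce(bread, [{ price: "65.25" }, { price: "40" }]);
var second = maxPriceReduce(bread, [{ price: "50" }]);

// Re-reduce over the partial results: the maximum survives.
var combined = maxPriceReduce(bread, [first, second]);

console.log(combined);
```

With this shape, the result no longer depends on how the values for a key are split across reduce calls. In MongoDB the function would be passed in place of the original reduce, for example db.prices.mapReduce(map, maxPriceReduce, {out: "mr"}), and the output field would then be called "price" rather than "max_price".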
I have created two groups on a customer level (all with distinct customer IDs) based on some criteria, but I am having difficulties grouping these on a group level (group ID). The data structure is as follows: a customer has one customer ID and a group ID. The customer ID is distinct but the group ID is not; i.e. multiple customers (customer IDs) are part of a group and therefore have the same group ID.
My Tableau code looks something like this:
IF [Sales] >= 200000 and [Category] = 'A'
OR [CAC] <= 10000 and [Category] = 'A'
THEN 'Good customer'
ELSE 'Bad customer'
END
The above code gives me the grouping on a customer level. However, I want to see the group level, i.e. if just one customer from a group is a 'Good customer' then the entire group should be classified as a 'Good customer'. In that case, all the [Sales] and [CAC] of the customers within that group should be summed up and displayed under 'Good customer' on a group level instead of on a customer level.
Try this:
Create a field [GoodCustomer] with this formula:
IF [Sales] >= 200000 AND [Category] = 'A'
OR [CAC] <= 10000 AND [Category] = 'A'
THEN 1
ELSE 0
END
This is your condition, but it assigns a numeric value to the result (1 = good customer, 0 = bad customer).
Create a field [GoodGroup] with this formula:
{fixed [Group]: IIF(SUM([GoodCustomer]) > 0, True, False)}
For each group, this checks whether the sum of [GoodCustomer] is greater than 0 (which means that at least one customer was good). If it is, the group is set to True (or use 'Good Group'); otherwise it is set to False (or 'Bad Group').
This should give you what you described above.
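To make the logic of the two fields concrete, here is a rough sketch of the same computation in plain JavaScript rather than Tableau. The customer rows are invented sample data and the property names (groupId, sales, cac, category) are assumptions; the per-group loop stands in for what the FIXED expression does for each [Group].

```javascript
// Hypothetical sample rows: one row per customer.
var customers = [
  { id: "c1", groupId: "G1", sales: 250000, cac: 20000, category: "A" },
  { id: "c2", groupId: "G1", sales: 50000, cac: 30000, category: "B" },
  { id: "c3", groupId: "G2", sales: 100000, cac: 50000, category: "A" },
  { id: "c4", groupId: "G2", sales: 300000, cac: 5000, category: "B" }
];

// Same condition as [GoodCustomer]: 1 = good customer, 0 = bad customer.
function goodCustomer(c) {
  return ((c.sales >= 200000 && c.category === "A") ||
          (c.cac <= 10000 && c.category === "A")) ? 1 : 0;
}

// Per group: count good customers and total up sales and CAC.
var groups = {};
customers.forEach(function (c) {
  var g = groups[c.groupId] || (groups[c.groupId] = { goodCount: 0, sales: 0, cac: 0 });
  g.goodCount += goodCustomer(c);
  g.sales += c.sales;
  g.cac += c.cac;
});

// Same test as [GoodGroup]: at least one good customer makes the group good.
Object.keys(groups).forEach(function (id) {
  groups[id].goodGroup = groups[id].goodCount > 0;
});

console.log(groups);
```

In this sample, G1 counts as a good group because of c1, and the sales and CAC of both of its customers are included in the group totals.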
I have a list of RichPipes with the following fields:
name: String
joinTime: Long
value: Int
I want to join them sequentially using reduce. When joining the RichPipes I only want to retain one field, value, and I want it to contain the max value from the joined RichPipes. How can I do it?