How to calculate the number of rows in CDAP/Data Fusion? - google-cloud-data-fusion

How do I calculate the number of rows? For example, I use the NullFieldSplitter plugin to divide the data into two parts, and I want to count the rows in each part. How can I do that? Could someone take a look and help me? Thanks.

You can count the number of records using the Group By plugin under the Analytics section. After you split the records, pass each output to a Group By and use the Count aggregation to get the number of records for each of the ports.
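To make the logic concrete, here is a minimal Python sketch of what "split on a null field, then count per port" computes. It is not CDAP or Data Fusion code; the function name `count_by_port`, the port labels, and the sample records are all made up for illustration.

```python
from collections import Counter

def count_by_port(records, field):
    """Count records per output port, splitting on whether `field` is null.

    The port labels "null" and "nonnull" are illustrative names only.
    """
    counts = Counter()
    for record in records:
        port = "null" if record.get(field) is None else "nonnull"
        counts[port] += 1
    return counts

# Hypothetical sample data: one record has a null "email" field.
records = [
    {"id": 1, "email": "a@example.com"},
    {"id": 2, "email": None},
    {"id": 3, "email": "c@example.com"},
]

counts = count_by_port(records, "email")
print(counts["nonnull"], counts["null"])
```

In the pipeline itself, each port would feed its own Group By with a Count aggregation, which plays the role of the counter here.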

Related

Want to SUM all values for a specific date within column NOT sum all values in that column

I want to create a graph that shows the total capacity for each week relative to the remaining availability across a series of specific dates. When I attempt this in Power BI, it calculates one of the values (remaining availability) correctly, but for the total capacity it generates a value much higher than I get by manual calculation: it shows the total for the entire column rather than for each specific date.
Why is Power BI doing this, and how can I solve it?
So far, I have tried generating the graph like this:
(https://i.stack.imgur.com/GV3vk.png)
As you can see, the capacity values are far too high; they should be 25 days.
The total availability values are correct (ranging from 0 to 5.5 days).
When I create matrices to see the breakdown of each sum, they are correct. Only when the two are combined does one of the values change to the total for the whole column.
If anyone could help me with this issue that would be great! Thanks!

How to compare two averages for the same column?

I want to take an average of the whole data set, then filter that data and take another average, and then compare the two. Any help is appreciated.
Using the Sample Superstore data set as an example:
The following expression gives the average of sales for the entire data set
{ Avg([sales])}
repeated on every row. The following expression, however, gives the category-wise average sales
{FIXED [Category]: avg([sales])}
again repeated on every row. If you want to apply filters to these calculations, add the filter to context, but be aware that it will then filter the data used for the calculations in both expressions. If you just want to filter the data for viewing purposes and not for the calculations, don't add the filters to context.
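As a rough illustration of what the two expressions compute, the following Python sketch derives an overall average and a per-category average from the same rows and compares them. The rows and the names `overall_avg`, `category_avg`, and `difference` are invented for the example; this is not Tableau syntax.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows standing in for the sales data.
rows = [
    {"category": "Furniture", "sales": 100.0},
    {"category": "Furniture", "sales": 300.0},
    {"category": "Technology", "sales": 500.0},
]

# Average over every row, ignoring category.
overall_avg = mean(r["sales"] for r in rows)

# Average within each category.
by_category = defaultdict(list)
for r in rows:
    by_category[r["category"]].append(r["sales"])
category_avg = {cat: mean(vals) for cat, vals in by_category.items()}

# Compare each category average against the overall average.
difference = {cat: avg - overall_avg for cat, avg in category_avg.items()}
print(overall_avg, category_avg, difference)
```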

Sum of calculated averages Power BI

I'm fairly new to Power BI, and I want to calculate a sum of averages as a measure.
The average itself works fine, but I couldn't manage to sum the averages.
average = AVERAGEX(SUMMARIZE(ProductionVolumeData,
ProductionVolumeData[ProductionOrderID],
MDCProductionVolumeData[MachineID],
"sum_volume",
SUM(ProductionVolumeData[Volume])),[sum_volume])
This formula sums the volume grouped by production order ID and machine ID, and then takes the mean of those sums.
I checked it in a table: it works for one ProductionOrderID, but whenever I add another ProductionOrderID to the table, it averages across them as well. What I want is to sum the averages.
How can I do that?
Thanks in advance.
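The distinction being asked about can be sketched outside DAX. The Python below is only an illustration of the intended arithmetic, with invented rows and variable names: sum the volume per (order, machine), average those sums within each order, then add the per-order averages together.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows: (ProductionOrderID, MachineID, Volume).
rows = [
    ("PO1", "M1", 10.0),
    ("PO1", "M1", 20.0),
    ("PO1", "M2", 50.0),
    ("PO2", "M1", 40.0),
]

# Step 1: total volume per (order, machine) pair.
sum_volume = defaultdict(float)
for order_id, machine_id, volume in rows:
    sum_volume[(order_id, machine_id)] += volume

# Step 2: average of those totals within each order.
per_order = defaultdict(list)
for (order_id, _machine_id), total in sum_volume.items():
    per_order[order_id].append(total)
order_avg = {order_id: mean(totals) for order_id, totals in per_order.items()}

# Step 3: sum the per-order averages instead of averaging again.
sum_of_averages = sum(order_avg.values())

# For contrast: one average over all (order, machine) totals.
average_of_all = mean(sum_volume.values())
print(order_avg, sum_of_averages, average_of_all)
```

With this sample data the per-order averages are 40 and 40, so the sum of averages is 80, while a single average over all three groups is 40.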

Tableau calculation: I am trying to calculate the percentage of running sum but am unable to create a calculation

I am trying to calculate the number of customers who represent 80% of the profit, so that I can use it in a calculated field for a reference line.
This is what I wrote:
IIF(RUNNING_SUM([Profit])= (0.8*SUM([Profit])),
COUNTD([Customer Name]),0)
but it gives me an error saying
"All fields must be constant or aggregate when using table calculation functions"
The logic is to "count the distinct number of customers who represent 80% of the running total of profit".
This is meant for a Pareto chart, so the values are already sorted in descending order.
How do I create a calculated field that gives the number of top customers who represent 80% of the profits?
Let me know if more clarifications are needed.
I think you are looking for a Pareto Chart. This might help:
http://www.theinformationlab.co.uk/2014/08/27/pareto-charts-tableau/
I would use table calculations: first compute the running total of profit, and then calculate it as a percentage of the total.
Here is a link to a step-by-step tutorial in Tableau 10 for Pareto analysis (80/20 rule):
https://www.tableau.com/learn/tutorials/on-demand/pareto-charts?signin=15df68b66e703787258911e79db040a7.
Hope this helps.
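The running-total approach can be sketched in plain Python as follows. This is an illustration of the Pareto logic only, not a Tableau calculation; the function name `customers_for_share` and the profit figures are hypothetical, and the sketch assumes all profits are positive. It tests whether the running total has reached the threshold (>=) rather than whether it equals it exactly, since an exact match would rarely occur.

```python
def customers_for_share(profit_by_customer, share=0.8):
    """Return how many top customers are needed to reach `share` of total profit.

    Assumes every profit value is positive.
    """
    total = sum(profit_by_customer.values())
    running = 0.0
    # Walk customers from most to least profitable, keeping a running total.
    ranked = sorted(profit_by_customer.values(), reverse=True)
    for count, profit in enumerate(ranked, start=1):
        running += profit
        if running >= share * total:
            return count
    return len(ranked)

# Hypothetical profit per customer.
profits = {"Ann": 500.0, "Bob": 350.0, "Cy": 100.0, "Di": 30.0, "Ed": 20.0}
top_n = customers_for_share(profits)
print(top_n)
```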

Different Aggregation calculations of a measure using two dimensions in Tableau

This is a Tableau 8.3 Desktop Edition question.
I am trying to aggregate data using two different dimensions. So, I want to aggregate twice: first I want to sum over all the rows, and then multiply the results in a cumulative manner (so I can build a graph). How do I do that? OK, that's too vague, so here are some more details:
I have a set of historical data. The columns are the date, the rows are the categories.
Easy part: I would like to sum all the rows.
Hard part: Given those summations, I want to build a graph that shows, for each date, the product of all the summations from the earliest date up to that date.
In other words:
Take the sum of all rows, call it x_i, where i is the date.
For each date i find y_i such that y_i = x_0 * x_1 * ... * x_i (if there is missing data, consider it to be one)
Then show a line graph for the y values versus the date.
I have searched for a solution for this and tried to figure it out by myself, but failed.
Thank you very much for your time and help :)
You need n calculated fields (one per column you have), and you have to do the calculation manually:
y_i = sum(field0)*sum(field1)
This is because you cannot iterate over columns. For Tableau, each column represents a different dimension or measure, so it won't consider that there is a logical order among them; that is, it won't assume that column A comes before column B. It will assume A and B are different things.
Tableau works better with tables organized as in a database. So if you have one column per year, you should reorganize your data: eliminate all those columns and create a single field called 'Date', which identifies the value of your measure for that date. Yes, you will have fewer columns but far more rows, but Tableau works better this way (for very good reasons).
Tableau 9.0 allows you to do that directly. I have only watched a demo (it was launched yesterday), but I understand that there is now an option to select those columns (in the Data Connection tab) and convert them to a database format.
With that done, you can use the PREVIOUS_VALUE function to help you. I don't have Tableau in front of me right now; as soon as I get to it, I'll update this with the final answer. Unless you take the lead and work it out yourself before that ;)
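For reference, the running product from the question, y_i = x_0 * x_1 * ... * x_i, can be sketched in Python once the data is in the long (one row per date and category) shape recommended above. The dates, categories, and values below are invented, and this shows the arithmetic only, not Tableau's own functions.

```python
from itertools import accumulate
from operator import mul

# Hypothetical long-format rows: (date, category, value).
long_rows = [
    ("d0", "A", 1.0), ("d0", "B", 1.0),
    ("d1", "A", 2.0), ("d1", "B", 1.0),
    # "d2" has no rows at all, so it counts as missing data.
    ("d3", "A", 1.5), ("d3", "B", 0.5),
]
dates = ["d0", "d1", "d2", "d3"]  # full ordered date axis

# x_i: sum over all categories for each date that has data.
x = {}
for date, _category, value in long_rows:
    x[date] = x.get(date, 0.0) + value

# y_i: running product of x_0..x_i, treating a missing date as 1.
y = list(accumulate((x.get(d, 1.0) for d in dates), mul))
print(list(zip(dates, y)))
```

Here the per-date sums are 2, 3, (missing), and 2, so the running product is 2, 6, 6, 12.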