Using Tableau to compute entropy from an estimated probability distribution - tableau-api

I have the following situation: after sniffing packets from a WLAN, I consider the random variable X whose values are the protocol numbers and whose probabilities are the number of packets with that protocol number divided by the total number of packets (that is, I crudely estimate the probability that a given protocol occurs on the network).
This gives me something like this:
These values are calculated by using
COUNT([Protocol]) / TOTAL(COUNT([Protocol]))
Now, however, I want to convert these values into information content (which is just applying -LOG(…, 2) to the expression above) and add a line showing the entropy of the "source".
If I were to do this, an easy way would be to just do:
SUM(POWER(2, -x) * x)
where x is the -LOG(…, 2) expression above, so that POWER(2, -x) recovers the estimated probability. However, Tableau complains about this being a double aggregate. Is there any other way to calculate this?
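For reference, the quantity being computed is the Shannon entropy of X (the standard definition, stated here for clarity):

H(X) = -\sum_i p_i \log_2 p_i = \sum_i p_i \, I(p_i), \quad \text{where } I(p_i) = -\log_2 p_i

Since x = I(p), each probability can be recovered from x as p = 2^{-x}, which is what the SUM above relies on.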

Change SUM to WINDOW_SUM to create a table calculation, and set "compute using" for that calculation to your protocol number field
If you want to understand more, study how table calculations work. In a nutshell, the SUM() calculation is performed by the external data source or database in response to the query Tableau sends; the aggregated query result is then returned to Tableau as a summary table. Table calcs such as WINDOW_SUM() operate on that summary table.
The "compute using" directive tells Tableau how to partition the summary table, that is, it defines the scope of the window used to compute WINDOW_SUM().

I find the following sentence very useful for understanding why we are interested in "Entropy in Tableau":
"The basic intuition behind information theory is that learning that an unlikely event has occurred is more informative than learning that a likely even has occurred."
Page 73, Deep Learning, 2016

Related

Row-level average calculation in Tableau

I need to calculate an average in Tableau. One equation was SUM(varA * varB) / SUM(varB); I converted it into the row-level equation AVG((varA * varB) / varB) and it is working fine. But now I have another equation, SUM(varA * varB) / TOTAL(SUM(varB)). Could someone help me convert this one into a row-level average calculation?
This is a classic scenario where you can leverage Level of Detail (LOD) expressions in Tableau.
Read here: https://help.tableau.com/current/pro/desktop/en-us/calculations_calculatedfields_lod_overview.htm
Your expression would look something like: AVG({FIXED [Segment], [Category] : SUM([Sales])})
You will have greater control over row-level operations using LOD expressions.
Please read the link and try it.
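For the second equation specifically, a hedged sketch (assuming the TOTAL() in your formula spans the whole table; [varA] and [varB] are the asker's field names):

// Grand total of varB via an unscoped FIXED LOD; MIN() collapses the
// repeated row-level value into an aggregate so the division is valid.
SUM([varA] * [varB]) / MIN({FIXED : SUM([varB])})

If TOTAL() was actually partitioned by some dimension, list that dimension after FIXED instead of leaving it empty.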

How to pass a vector from Tableau to R

I need to pass a vector of arguments to Rserve from Tableau. Specifically, I am using IRR calculations in R (on Rserve), and I want to pass a vector of cash flows that sit as columns in my table (instead of rows/measures). So I want to collect all those cash flows in a vector and pass it on to Rserve; passing them one at a time slows down IO.
SCRIPT_REAL("r_func(c(.arg1, .arg2, .arg3))", SUM(cf1), SUM(cf2), SUM(cf3))
cf1..cfn are cash flows corresponding to various periods. The code above works well when there are few cash flows, but takes a long time when I have a few hundred. Further, the time is spent not in calculation but in IO when communicating with the remote Rserve. With a local Rserve, this calculation finishes in a few seconds, while on the remote one it takes well over a minute.
I also want to point out that Tableau/Rserve set one argument after another, and that takes time. My expectation is that once I have a vector, there would be just one transfer and one setting of arguments, and therefore this should speed things up.
The first step in understanding how Tableau interacts with R or Python, is understanding how Tableau's table calcs work.
Tableau Script_XXX() functions are table calculations which means that you invoke them on a vector of aggregate query results and the corresponding R or Python code needs to return a vector usually of the same size. (I think you may be able to return a scalar or smaller vector which gets replicated to appear like a vector of the same size as the argument -- but not certain)
You can control how your data is partitioned into vectors, and also the ordering of data in the vectors, by editing the table calc to specify the partitioning and addressing for that calc.
Partitioning determines how your aggregate query results are broken up into vectors for calculation purposes. Addressing determines how the elements of each vector are ordered. You can either do that based on the physical layout of the table structure, or (better) based on the specific dimensions.
See the Tableau online help for table calcs for more info, and look for online training videos from Tableau or blog entries (especially from anyone named Bora).
One way to test your understanding of these concepts is to create a Tableau table (i.e., a viz with a mark type of text) with several dimensions on the row and column shelves. Then create calculated fields for INDEX() and SIZE() and display them on text. Finally, change the partitioning and addressing in different ways by editing those table calcs. Try several different permutations. When you can confidently predict what those functions will produce for different settings, then you're ready to do more complex tasks - such as talking to R.
It is also instructive to experiment with FIRST(), LAST(), LOOKUP(), WINDOW_SUM() etc -- and finally dig into PREVIOUS_VALUE(). Warning, PREVIOUS_VALUE() is a bit odd, and does not behave the way you probably assume it does. Still, it is a useful technique that can implement a recursive calculation, and is about as close to a for loop as Tableau gets.
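To tie this back to the question: if the cash flows are reshaped into rows, say a [Period] dimension and a [CF] measure (both names are assumptions, not from the original post), the whole series reaches R as one vector in a single transfer. A sketch reusing the asker's r_func:

// Set "compute using" to [Period] so .arg1 is the full vector of cash flows.
// rep() pads the scalar IRR back out to the vector size Tableau expects.
SCRIPT_REAL("rep(r_func(.arg1), length(.arg1))", SUM([CF]))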

Tableau calculation: I am trying to calculate the percentage of running sum but am unable to create a calculation

I am trying to calculate the number of customers who represent 80% of the profit, so that I can use it in a calculated field for a reference line.
This is what I wrote
IIF(RUNNING_SUM([Profit])= (0.8*SUM([Profit])),
COUNTD([Customer Name]),0)
but it gives me error saying
"All fields must be constant or aggregate when using table calculation functions"
The logic is to "Count distinct number of customers which represent 80% of running total profits"
This is meant for a pareto chart, so the values are already sorted in descending order for it to work.
How do I create such a calculated field, one that gives me the number of top customers who represent 80% of the profits?
Let me know if more clarifications are needed.
I think you are looking for a Pareto Chart. This might help:
http://www.theinformationlab.co.uk/2014/08/27/pareto-charts-tableau/
I would leverage the power of Table Calculations: first do a running total of profit, then simply calculate percent of total.
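A hedged sketch of those two steps, plus a customer count for the 80% band (note that the asker's error comes from passing the row-level [Profit] to RUNNING_SUM(), which needs an aggregate argument such as SUM([Profit])):

// Cumulative share of profit; compute using [Customer Name],
// with customers sorted descending by SUM([Profit]).
RUNNING_SUM(SUM([Profit])) / TOTAL(SUM([Profit]))

// Number of customers needed to reach 80% of total profit:
// count the marks whose cumulative share is still at or below 0.8.
WINDOW_SUM(IIF(RUNNING_SUM(SUM([Profit])) / TOTAL(SUM([Profit])) <= 0.8, 1, 0))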
Here is the link to step-by-step tutorial in Tableau10 for Pareto Analysis (80/20 rule):
https://www.tableau.com/learn/tutorials/on-demand/pareto-charts
Hope this helps.

Is Matlab's mnrfit incorrect?

It seems Matlab is giving incorrect results for multinomial logistic regression.
In their example documentation using Fisher's Iris dataset [link], they give coefficients for the model which can be used on the same data set itself to get the modeled probabilities.
load fisheriris
sp = categorical(species);
[B,dev,stats] = mnrfit(meas,sp);
PHAT=mnrval(B,meas);
However, none of the expected-value aggregates match the population aggregates, which is a requirement for a MaxEnt classifier (see slide 35 [here], or Eq. 14 [here], or Agresti, "Categorical Data Analysis", pg. 298, etc.).
For example
>> sum(PHAT)
49.9828 49.8715 50.1456
should all equal 50 (population values), likewise for other aggregations
If the parameters
B = [ 36.9450  42.6378;
      12.2641   2.4653;
      14.4401   6.6809;
     -30.5885  -9.4294;
     -39.3232 -18.2862];
were used instead, then all aggregated sufficient statistics match.
Additionally, it seems odd that Matlab is solving it via likelihood maximization, which can produce a warning,
Warning: Maximum likelihood estimation did not converge. Iteration
limit exceeded. You may need to merge categories to increase observed
counts
whereas the only requirement, which follows from the MLE first-order conditions, is that the expected values match; no likelihood evaluation is needed.
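In symbols (my paraphrase of that requirement, for a model that includes per-class intercepts): the score equations for the intercepts force the fitted class totals to match the observed counts,

\sum_{n=1}^{N} \hat{P}(y_n = k \mid x_n) = \sum_{n=1}^{N} \mathbf{1}[y_n = k] = 50 \quad \text{for each class } k,

which is exactly why sum(PHAT) should return three 50s on the iris data.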
It would be a nice feature if, instead of giving true classes, we had an option to supply just the aggregate information.
I submitted a technical error report on the MathWorks website. Their reply:
Hello [----],
I am writing in reference to your Technical Support Case #01820504
regarding 'mnrfit'.
Thanks a lot for your patience and reporting this issue. This appears
to be unexpected behavior. It appears to be related to an existing
issue we have in our records, that "mnrfit" does not give correct
maximum likelihood estimates in certain cases. Since the "mnrfit"
function is not finding the maximum likelihood estimates for the
coefficients, we calculated the actual MLEs. When we use these
estimates, we get the desired result of all 50s in this case.
The issue is that, for this particular dataset in our example, the
classes can be separated perfectly. This means that the logistic
function, in order to get exact zero or one probabilities, needs to
have infinite coefficients. The "mnrfit" function carries out an
iterative procedure with the coefficients getting larger, but it stops
at a point where the results have the issue that you have found. We
certainly agree that "mnrfit" could be made to do better. Our
development team is working on it.
At this stage, I am not able to suggest a workaround other than to
write a custom implementation as my colleague and I had tried. For
now, I will be closing this request as I have already forwarded it to
our records. However, if you have any additional questions related to
this case, please do not hesitate to reach me.
Sincerely,
[----]
MathWorks Technical Support Department

Different Aggregation calculations of a measure using two dimensions in Tableau

It is a Tableau 8.3 Desktop Edition question.
I am trying to aggregate data using two different dimensions. So I want to aggregate twice: first sum over all the rows, and then multiply the results in a cumulative manner (so I can build a graph). How do I do that? OK, too vague; here are some more details:
I have a set of historical data. The columns are the date, the rows are the categories.
Easy part: I would like to sum all the rows.
Hard part: given those summations, I want to build a graph that, for each date, shows the product of all the summations from the earliest date up to that date.
In another words:
Take the sum of all rows, call it x_i, where i is the date.
For each date i find y_i such that y_i = x_0 * x_1 * ... * x_i (if there is missing data, consider it to be one)
Then show a line graph for the y values versus the date.
I have searched for a solution for this and tried to figure it out by myself, but failed.
Thank you very much for your time and help :)
You need n calculated fields (one per column you have), and you must do the calculation manually:
y_i = SUM(field0) * SUM(field1) * ... * SUM(field_i)
Basically, this is because you cannot iterate over columns. For Tableau, each column represents a different dimension or measure, so it won't consider that there is a logical order among them; that is, it won't assume that column A comes before column B. It will assume A and B are different things.
Tableau works better with tables organized as databases. So if you have year columns, you should reorganize your data: eliminate all those columns and create a single field called 'Date' which identifies the value of your measure for that date. Yes, you will have fewer columns but far more rows; Tableau works better this way (for very good reasons).
Tableau 9.0 allows you to do that directly. I have only watched a demo (it was launched yesterday), but I understand that there is now an option to select those columns (in the Data Connection tab) and convert them to a database format.
With that done, you can use the PREVIOUS_VALUE function to help you. I'm not at Tableau right now; as soon as I get to it I'll update this with the final answer. Unless you take the lead and discover it yourself before that ;)
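For reference, a minimal sketch of that cumulative product as a table calculation (assuming the reshaped data has a [Date] dimension and a measure [Value], so x_i = SUM([Value]) per date):

// y_i = x_0 * x_1 * ... * x_i, computed along [Date].
// PREVIOUS_VALUE(1) returns 1 on the first mark, so the product starts cleanly;
// IFNULL() makes a date with no data contribute 1, per the question.
IFNULL(SUM([Value]), 1) * PREVIOUS_VALUE(1)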