Get the distinct count of applications used by users who used application A in Tableau - tableau-api

I have a dataset of users and the applications they used. For each application, I want to compute the total number of users who used only this application, who used this application plus one other application, plus two other applications, and so on. In the end I need a treemap where each square represents the total number of users who used only the application, the application plus one other application, plus two, and so on. I also need to compute the overlap percentage, that is, how many users used only this application versus how many used it together with another application. I can compute these metrics without the application constraint, that is, the number of users who used one application, two applications, three applications, and so on, where the overlap percentage is the percentage of users who used more than one application.
To do so:
I created a calculated field to count the total number of applications per user.
I then used this field to create bins, which gave me the total number of users who used 1 app, 2 apps, ..., N apps.
Finally, I created a treemap from those bins, where the size of each square is COUNTD(users).
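The unconstrained version described above (applications per user, then bins, then distinct users per bin) can be sketched outside Tableau to check the expected numbers. This is a Python illustration under assumed data, not Tableau syntax; the sample rows and function names are hypothetical.

```python
from collections import Counter

# Hypothetical sample rows: (user_id, application), one row per usage.
rows = [
    ("u1", "apl1"), ("u1", "apl2"),
    ("u2", "apl1"),
    ("u3", "apl2"), ("u3", "apl3"), ("u3", "apl1"),
    ("u4", "apl3"),
]

def apps_per_user(rows):
    """Distinct application count per user (the 'apps per user' calculated field)."""
    apps = {}
    for user, app in rows:
        apps.setdefault(user, set()).add(app)
    return {user: len(a) for user, a in apps.items()}

def users_per_bin(rows):
    """Distinct users in each 'N apps' bin (the treemap square sizes)."""
    return Counter(apps_per_user(rows).values())

def overlap_pct(rows):
    """Percentage of users who used more than one application."""
    counts = apps_per_user(rows)
    multi = sum(1 for n in counts.values() if n > 1)
    return 100.0 * multi / len(counts)

print(sorted(users_per_bin(rows).items()))  # [(1, 2), (2, 1), (3, 1)]
print(overlap_pct(rows))                    # 50.0
```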
How can I do the same, but instead of using (All) applications, restrict it to a single application and its companion applications? I need an application selector so that the treemap changes each time I change the application.

As you have not given sample data, I am proposing a solution that assumes:
No rows are duplicated, i.e. each user is associated with a given application at most once.
There are no more than about 15-20 distinct applications (beyond that, the solution may not work because of the numeric range of Tableau calculations).
I used the following sample data:
Step 1: Create a calculated field named CF:
// map every application to a number
CASE [Apl id]
WHEN 'apl1' THEN 1
WHEN 'apl2' THEN 2
WHEN 'apl3' THEN 3
WHEN 'apl4' THEN 4
WHEN 'apl5' THEN 5
END
Add a WHEN clause for every distinct value of [Apl id].
Step 2: Add another calculated field, CF1:
// binary representation of each application used
{FIXED [User id]: sum(10^([CF]-1))}
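Step 3: Create a parameter (Parameter 1) whose values are the application numbers assigned in CF (1 to 5 in this example).
Step 4: Add another calculated field, CF2: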
// flag users who use the selected application
IF
LEFT(RIGHT(RIGHT('00000' + STR([CF1]), 5), [Parameter 1]), 1) = '1'
THEN 'Y' ELSE 'N' END
Now you can build your view as shown below.
With CF2 used as a filter (keeping 'Y'), selecting a parameter value keeps only the records of users who are associated with at least the selected application. A distinct count of applications then gives the total number of applications used by these users, and you can proceed to build your treemap. Let me know if you need any further help or explanation.
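To sanity-check what the treemap should show for one selected application, the end result can be sketched in Python. This swaps the binary trick for plain sets, so it has no limit on the number of applications; the sample rows and function names are hypothetical. Bin 0 means "selected application only", bin 1 means "plus one other application", and so on.

```python
from collections import Counter

# Hypothetical sample rows: (user_id, application), no duplicate pairs.
rows = [
    ("u1", "apl1"), ("u1", "apl2"),
    ("u2", "apl1"),
    ("u3", "apl2"), ("u3", "apl3"), ("u3", "apl1"),
    ("u4", "apl3"),
]

def companion_bins(rows, selected):
    """For users of `selected`, count users by how many OTHER apps they also used."""
    apps = {}
    for user, app in rows:
        apps.setdefault(user, set()).add(app)
    users = [a for a in apps.values() if selected in a]  # the CF2 = 'Y' filter
    return Counter(len(a) - 1 for a in users)            # 0 = selected app only

def overlap_pct(rows, selected):
    """Share of the selected app's users who also used at least one other app."""
    bins = companion_bins(rows, selected)
    total = sum(bins.values())
    return 100.0 * (total - bins.get(0, 0)) / total if total else 0.0

print(sorted(companion_bins(rows, "apl1").items()))  # [(0, 1), (1, 1), (2, 1)]
print(overlap_pct(rows, "apl3"))                     # 50.0
```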
Note 1: This solution relies on binary logic (hence the restriction on the number of applications). The CF1 values give a binary representation of the applications used by each user: the rightmost digit shows whether the user uses apl1 (1) or not (0), the next digit to the left does the same for apl2, and so on from right to left.
Note 2: When calculating CF2, I padded the CF1 values with leading zeros to a width of 5. Change this width (both the '00000' string and the 5 in RIGHT) to match the total number of your applications.
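The encoding in CF1 and the digit test in CF2 can be mirrored in Python to see how they work. This is an illustration, not Tableau code; the mapping and names are hypothetical, and set() stands in for the no-duplicate-rows assumption (in Tableau a duplicated row would add 2 to a digit). Python integers are exact at any size, whereas if Tableau evaluates CF1 as a double-precision float, integers are only exact up to 2^53 (about 9.0e15), which is one plausible source of the 15-20 application limit.

```python
# Hypothetical mapping, equivalent to the CF calculated field.
APP_NUMBER = {"apl1": 1, "apl2": 2, "apl3": 3, "apl4": 4, "apl5": 5}
WIDTH = 5  # number of applications; matches the '00000' padding in CF2

def cf1(apps):
    """Mirror of CF1: sum of 10^(n-1), a decimal number made of 0/1 digits."""
    return sum(10 ** (APP_NUMBER[a] - 1) for a in set(apps))

def cf2(code, selected):
    """Mirror of CF2: 'Y' if the digit for application number `selected` is 1."""
    padded = ("0" * WIDTH + str(code))[-WIDTH:]  # RIGHT('00000' + STR(CF1), 5)
    digit = padded[-selected:][0]                # LEFT(RIGHT(..., p), 1)
    return "Y" if digit == "1" else "N"

code = cf1(["apl1", "apl3", "apl4"])
print(code)          # 1101
print(cf2(code, 3))  # Y
print(cf2(code, 2))  # N
```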

Related

Is it possible to limit the number of rows in the output of a Dataprep flow?

I'm using Dataprep on GCP to wrangle a large file with a billion rows. I would like to limit the number of rows in the output of the flow, as I am prototyping a Machine Learning model.
Let's say I would like to keep one million rows out of the original billion. Is it possible to do this with Dataprep? I have reviewed the documentation on sampling, but that only applies to the input of the Transformer tool, not to the outcome of the process.
You can do this, but it takes a bit of extra work in your recipe. Set up a formula in a new column using something like RANDBETWEEN to give you a random integer between 1 and 1,000 (for this million-out-of-a-billion case). Then filter the rows to keep only one chosen integer between 1 and 1,000, and your output will contain only that randomized subset. Finally, have the last step of the recipe remove this temporary column.
So indeed there are 2 approaches to this.
As Courtney Grimes said, you can use one of the two functions that generate random numbers within a range:
RANDBETWEEN
RAND
These functions can be used to slice an "even" portion of your data. As suggested, use RANDBETWEEN(1,1000), then filter on a single value x between 1 and 1000, which keeps 1/1000 of the data (a million out of a billion).
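The RANDBETWEEN approach can be sketched in Python to show its behavior; this is not Dataprep/Trifacta syntax, and the function name and seed parameter are hypothetical. Note that it yields approximately, not exactly, one row in a thousand, because each row is kept independently at random.

```python
import random

def random_slice(rows, buckets=1000, keep=1, seed=None):
    """Keep rows whose random integer in [1, buckets] equals `keep`.

    Mirrors adding a RANDBETWEEN(1, buckets) helper column, filtering on one
    value, then dropping the helper column. Output size is ~len(rows)/buckets.
    """
    rng = random.Random(seed)
    return [row for row in rows if rng.randint(1, buckets) == keep]

sample = random_slice(range(1_000_000), seed=42)
print(len(sample))  # close to 1,000, but not exactly
```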
Alternatively, if you just want a million records in your output and either:
you don't want to rely on knowing the size of the entire table, or
you just want the first million rows, regardless of how many rows there are,
you can use two of these three row-filtering methods (top rows / range).
P.S.
By using the $sourcerownumber metadata parameter (see the in-product documentation), you can filter/keep a portion of the data (as in the first scenario) in one step, i.e. without creating an additional column.
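The deterministic alternatives (keeping the top rows, or filtering on the source row number) can be sketched in Python. This models the idea of $sourcerownumber with enumerate and is not Dataprep syntax; keeping every k-th row is only one assumed way to slice "a portion" by row number, and the function names are hypothetical.

```python
from itertools import islice

def first_n(rows, n):
    """'Top rows' style: keep the first n rows, whatever the total size."""
    return list(islice(rows, n))

def every_kth(rows, k):
    """Row-number style: keep rows whose 1-based row number is a multiple of k."""
    return [row for i, row in enumerate(rows, start=1) if i % k == 0]

print(len(first_n(range(10_000_000), 1_000_000)))  # 1000000
print(len(every_kth(range(1_000_000), 1000)))      # 1000
```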
BTW, an easy way to discover how-tos in Trifacta is to type what you're looking for in the "Search transformations" pane (opened with Ctrl-K). Searching for "filter" shows most of the options relevant to your problem.
Cheers!

Look up a DB value from another table and change it slightly

I am trying to create a job management system using FileMaker Pro 14 Advanced and FM Starting Point 4.6.
I have only just started looking at it, so I am pretty much a noob. FM Starting Point has Projects and Estimates, and each Project can have multiple Estimates. I have set it up so that you have to assign a Project to an Estimate before it will save. I used the Looked-up value option in Manage Database to fill the Estimate ID with the ID of the assigned Project.
Project numbers consist of an A followed by 6 digits, for example A120000. What I want to do is change the A to a Q when the Project ID is set as the Estimate ID.
Can anyone point me towards the right methods?
Have a look at calculated fields. I would probably have one field for the digits and one for the flag, then concatenate these in a calculated field for display.
Then set up a script that triggers on change in a field, which sets the flag to the other value.
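The two-field design suggested above can be modeled in Python to show the idea: store the letter and the digits separately, display them concatenated (in FileMaker, with the & operator), and let the on-change trigger swap the letter. The function names are hypothetical and this is not FileMaker script syntax.

```python
def display_id(flag, digits):
    """Concatenate a one-letter flag with the digit part, e.g. 'A' + '120000'."""
    return flag + digits

def toggle_flag(flag):
    """What the on-change script trigger would do: swap A <-> Q."""
    return {"A": "Q", "Q": "A"}[flag]

project_flag, project_digits = "A", "120000"
estimate_flag = toggle_flag(project_flag)
print(display_id(project_flag, project_digits))   # A120000
print(display_id(estimate_flag, project_digits))  # Q120000
```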
More info about calculated fields: http://www.filemaker.com/help/12/fmp/html/create_db.8.17.html
More info about script triggers: http://www.filemaker.com/help/13/fmp/en/html/create_layout.9.61.html

Calculated Fields and increments

I am building a system using Starting Point and need help with calculated fields.
Basically, when an estimate is created, it takes the same ID as the project it's linked to, followed by "-1", where 1 is an incrementing value. So if -1 already exists, the next estimate for that project would be -2, and so on.
So for example
Project Id: 120000
First Estimate: 120000-1
Second Estimate: 120000-2
I have found out how to add a hyphen and a number after the project ID (then stored as the estimate ID), like so: id_project & "-" & 1. But I have no idea how to use calculated fields in FileMaker to check whether 120000-2 already exists and, if it does, make it 120000-3.
Any help greatly appreciated
I'm assuming you have at least a relationship between Projects and Estimates. Perhaps something like this (without the Estimates_self table occurrence, which I'll get to in a moment):
If you're in the Projects context (on a layout linked to the Projects table), you can get a count from there using something like Count ( Estimates::id ).
If you want to make this happen from the context of Estimates, create a self-join as shown above, using the project foreign key as the match field. Then you can use Count ( Estimates_self::id ).
Finally, an option without any relationship graph changes would be to use ExecuteSQL:
ExecuteSQL (
"SELECT COUNT(*) FROM Estimates WHERE project_id = ?" ;
"" ; "" ;
Estimates::project_id
)
All of these will give you the number of estimates that a given project has. Add one to that and you have your suffix number for the new estimate.
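The count-plus-one logic can be sketched in Python. The function name and the list of estimate IDs are hypothetical stand-ins for the related Estimates records; in FileMaker the count would come from the relationship or the ExecuteSQL call above.

```python
def next_estimate_id(project_id, existing_estimate_ids):
    """Count the project's estimates (like Count(Estimates::id)) and add one."""
    count = sum(1 for e in existing_estimate_ids if e.split("-")[0] == project_id)
    return f"{project_id}-{count + 1}"

estimates = ["120000-1", "120000-2", "130000-1"]
print(next_estimate_id("120000", estimates))  # 120000-3
print(next_estimate_id("140000", estimates))  # 140000-1
```

One caveat with counting: if an estimate can be deleted, count plus one may repeat an existing suffix (after deleting 120000-1, the count is 1 and the next ID would be 120000-2, which still exists). Taking the highest existing suffix plus one avoids that.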
Count the related estimates based on the parent Project. This will give you the estimate number after the dash.

Making a Histogram in Tableau

I work for a software company, and I am working with a database that tracks certain events that occur in one of our games. Every time one of the tracked events occurs, a text entry in the “Event Type” field specifies what kind of event it is – “User Login,” “Enemy Killed,” “Player Death,” etc. Another field, “Session ID,” assigns a unique ID number to each individual game session. So if a user logs in to the game, kills eight enemies, and then logs out again, each of those Enemy Killed events will have the same Session ID.
I’m trying to make a histogram showing the number of sessions that have x number of Enemy Killed events. How do I go about this? I’m a raw beginner at Tableau, so if you can dumb down your answer to the explain-like-I’m-five level that would be great.
Tableau 9.0 has been launched, and your problem can be solved entirely inside Tableau.
What you need is to understand Level of Detail (LOD) calculations. The calculation will look like this:
{ FIXED [Session ID] : COUNT( IF [Event Type] = 'Enemy Killed'
THEN 1
END )
}
This calculates how many kills each session had. You can create bins from this field and count how many sessions fall into each bin with COUNTD([Session ID]).
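What the FIXED calculation and the histogram compute can be sketched in Python. The event rows are made up for illustration and the function names are hypothetical; each bin here has a width of one kill.

```python
from collections import Counter

# Hypothetical event log: (session_id, event_type) rows.
events = [
    (1, "User Login"), (1, "Enemy Killed"), (1, "Enemy Killed"),
    (2, "User Login"), (2, "Player Death"),
    (3, "User Login"), (3, "Enemy Killed"), (3, "Enemy Killed"),
]

def kills_per_session(events):
    """Like {FIXED [Session ID] : COUNT(IF [Event Type] = 'Enemy Killed' THEN 1 END)}."""
    kills = {}
    for session, event_type in events:
        kills.setdefault(session, 0)  # sessions with no kills still get 0
        if event_type == "Enemy Killed":
            kills[session] += 1
    return kills

def histogram(kills):
    """Number of distinct sessions per kill count (COUNTD([Session ID]) per bin)."""
    return Counter(kills.values())

print(sorted(histogram(kills_per_session(events)).items()))  # [(0, 1), (2, 2)]
```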
Well, my answer will echo many of my previous answers: your database is not ready for that analysis.
Basically what your database should look like is:
SessionId EnemiesKilled
1234 13
So you could create a histogram on EnemiesKilled.
To build the histogram, you can create bins (right-click the field, then Create Bins), but I find that very limited, as it only creates bins of equal width. What I usually do is write a series of IF and ELSEIF statements to create the bins manually, so they better suit my purposes.
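A chain of IF/ELSEIF branches for variable-width bins might look like this, written in Python for illustration; the bin edges and labels are hypothetical and should be chosen to suit your data.

```python
def custom_bin(kills):
    """Variable-width bins, like a chain of IF/ELSEIF in a calculated field."""
    if kills == 0:
        return "0"
    elif kills <= 5:
        return "1-5"
    elif kills <= 20:
        return "6-20"
    else:
        return "21+"

print([custom_bin(k) for k in (0, 3, 5, 6, 25)])  # ['0', '1-5', '1-5', '6-20', '21+']
```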
To convert your database to the format I described, it's better to reshape it outside Tableau and connect to the result directly. If it's SQL, a GROUP BY on Session ID with a COUNT of Enemy Killed events should work (not exactly like this, but that's the idea).
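The SQL reshaping can be tried with an in-memory SQLite database from Python. The table and column names are hypothetical; using SUM(CASE ...) instead of a WHERE filter keeps sessions with zero kills, which belong in the histogram's first bar.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1, "User Login"), (1, "Enemy Killed"), (1, "Enemy Killed"),
     (2, "User Login"), (2, "Player Death"),
     (3, "Enemy Killed")],
)

# One row per session with its kill count; SUM(CASE ...) keeps zero-kill
# sessions, which a WHERE event_type = 'Enemy Killed' filter would drop.
rows = conn.execute(
    """
    SELECT session_id,
           SUM(CASE WHEN event_type = 'Enemy Killed' THEN 1 ELSE 0 END) AS enemies_killed
    FROM events
    GROUP BY session_id
    ORDER BY session_id
    """
).fetchall()
print(rows)  # [(1, 2), (2, 0), (3, 1)]
```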
To do it in Tableau, drag Session ID (to either Marks or Rows; when building a table for this purpose I usually put everything on Marks and choose Bar chart, so Tableau won't waste time plotting anything) and a calculated field like:
SUM(
IF [Event Type] = "Enemy Killed"
THEN 1
ELSE 0
END
)
Then export the data to a CSV or MDB file and connect to it.

Crystal reports - Group total

I have a report that I've written and I understand how to create running totals and such, but need help creating a custom evaluation formula.
I have two levels of groups: the first group is based on a certain user, the next on the transactions that user has been involved in. I have details hidden and am only interested in the totals for a particular activity. This is working great, and the totals are working properly, but the problem is that each activity has a 'line number', which can be the same as in another activity (i.e. two activities can both contain lines 1, 2, 3). So a distinct total over the whole dataset isn't accurate, because I only want it to be distinct within each individual recordset, not globally.
The example is below. If I count every record in this dataset, it comes out to 18 because there are duplicate line numbers in each activity, but if I use a distinct count, it only comes to 9 because of duplicate line numbers across multiple activities.
I guess what I need to know is how to take the totals per detail group and have them total up properly in my second footer. I assume it will involve building a string that combines the activity number and line number, and then comparing those?
Here is an example of the data contained within the total groupings:
I figured this out on my own; it turned out to be pretty simple. I converted my numeric values to text, combined the transaction ID and the line ID into a single test value, and did a distinct count on that. Sometimes it just helps not to stare the problem down.
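The composite-key idea from this solution can be illustrated in Python: a distinct count over line numbers alone collapses lines shared by different transactions, while a distinct count over a combined transaction-line string counts them separately. The data is hypothetical; in Crystal Reports the combined value would be a formula field used in a distinct-count summary.

```python
# Hypothetical detail rows: (transaction_id, line_number); line numbers repeat across transactions.
details = [
    (101, 1), (101, 2), (101, 3), (101, 1),
    (102, 1), (102, 2), (102, 3),
]

distinct_lines = len({line for _, line in details})              # global distinct: 3
distinct_keys = len({f"{txn}-{line}" for txn, line in details})  # per-transaction: 6
print(distinct_lines, distinct_keys)  # 3 6
```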