Making a Histogram in Tableau - tableau-api

I work for a software company, and I am working with a database that tracks certain events that occur in one of our games. Every time one of the tracked events occurs, a text entry in the “Event Type” field specifies what kind of event it is – “User Login,” “Enemy Killed,” “Player Death,” etc. Another field, “Session ID,” assigns a unique ID number to each individual game session. So if a user logs in to the game, kills eight enemies, and then logs out again, each of those Enemy Killed events will have the same Session ID.
I’m trying to make a histogram showing the number of sessions that have x number of Enemy Killed events. How do I go about this? I’m a raw beginner at Tableau, so if you can dumb down your answer to the explain-like-I’m-five level that would be great.

Tableau 9.0 has been launched, and your problem can be solved entirely inside Tableau.
What you need are Level of Detail (LOD) expressions. The calculation looks like this:
{ FIXED [Session ID] : COUNT( IF [Event Type] = 'Enemy Killed'
THEN 1
END )
}
This calculates how many kills each session had. Create bins from this field (right-click it and create bins), put the bin on Columns, and use COUNTD([Session ID]) as the measure to count how many sessions fall into each bin.
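To make clear what the FIXED expression plus COUNTD produces, here is a minimal Python sketch of the same two steps. It assumes the events arrive as hypothetical (session_id, event_type) tuples. Every session gets a kill count (zero included), and then sessions are counted per kill count.

```python
from collections import Counter

def kills_histogram(events):
    """events: iterable of (session_id, event_type) tuples (hypothetical input shape)."""
    kills_per_session = {}
    for session_id, event_type in events:
        # Every session gets an entry, so sessions with zero kills are counted too
        kills_per_session.setdefault(session_id, 0)
        if event_type == "Enemy Killed":
            kills_per_session[session_id] += 1
    # Histogram: number of kills -> number of distinct sessions with that many kills
    return dict(sorted(Counter(kills_per_session.values()).items()))

events = [
    (1, "User Login"), (1, "Enemy Killed"), (1, "Enemy Killed"),
    (2, "User Login"), (2, "Enemy Killed"),
    (3, "User Login"), (3, "Player Death"),
    (4, "User Login"), (4, "Enemy Killed"),
]
print(kills_histogram(events))  # {0: 1, 1: 2, 2: 1}
```

The first step corresponds to { FIXED [Session ID] : ... }. The second corresponds to binning that field and counting distinct sessions per bin.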

Like many of my previous answers, this one starts with the data: your database is not yet in the right shape for this analysis.
Basically, your data should have one row per session, like this:
SessionId   EnemiesKilled
1234        13
Then you can create a histogram on EnemiesKilled.
To build the histogram, you can create bins (right-click the field, Create Bins). However, I find that feature very limited because every bin has the same width. Instead, I usually write a calculated field with a chain of IF and ELSEIF branches to define the bins manually, so they better suit my purposes.
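Here is a rough sketch of the manual-bin idea in Python. The bin edges are purely hypothetical. In Tableau, the same branches would be written as IF [EnemiesKilled] = 0 THEN "0" ELSEIF [EnemiesKilled] <= 5 THEN "1-5" ... END.

```python
from collections import Counter

def kill_bin(enemies_killed):
    # Hypothetical, unequal-width bins; pick edges that suit your analysis
    if enemies_killed == 0:
        return "0"
    elif enemies_killed <= 5:
        return "1-5"
    elif enemies_killed <= 20:
        return "6-20"
    else:
        return "21+"

# Number of sessions per custom bin, given kills per session (hypothetical values)
kills = [0, 3, 5, 6, 13, 40]
bin_counts = Counter(kill_bin(k) for k in kills)
```

Because each branch is checked in order, a value only has to be compared against the upper edge of its bin. This is what lets the widths differ.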
To convert your database to that format, it's better to reshape it outside Tableau and connect to the result directly. If it's a SQL database, a GROUP BY on Session ID with a count of Enemy Killed events should work (the exact query depends on your schema, but that's the idea).
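A minimal sketch of that GROUP BY, run against an in-memory SQLite database so it is self-contained. The table and column names (events, session_id, event_type) are hypothetical; adapt them to your schema and SQL dialect.

```python
import sqlite3

# Hypothetical table and column names; adapt to your schema
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE events (session_id INTEGER, event_type TEXT)")
conn.executemany(
    "INSERT INTO events VALUES (?, ?)",
    [(1234, "User Login"), (1234, "Enemy Killed"), (1234, "Enemy Killed"),
     (5678, "User Login"), (5678, "Player Death")],
)

# One row per session. SUM(CASE ...) keeps sessions with zero kills,
# whereas WHERE event_type = 'Enemy Killed' would drop them entirely.
query = """
    SELECT session_id,
           SUM(CASE WHEN event_type = 'Enemy Killed' THEN 1 ELSE 0 END) AS enemies_killed
    FROM events
    GROUP BY session_id
    ORDER BY session_id
"""
rows = conn.execute(query).fetchall()
print(rows)  # [(1234, 2), (5678, 0)]
```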
To do it in Tableau instead, drag Session ID into the view (to either the Marks card or Rows). When I'm only building a table for export, I usually put everything on Marks and choose Bar as the mark type, so Tableau doesn't waste time plotting anything. Then add a calculated field like:
SUM(
IF [Event Type] = "Enemy Killed"
THEN 1
ELSE 0
END
)
Then export the data to a CSV or MDB file and connect Tableau to that file.

Related

Tableau filter based on multiple parameters?

I have some data like the example below:
[Image: sample data]
I would like to make a dashboard that shows all the related empires based on what you choose (those that existed at the same time AND in one of its regions of influence). For example, if I choose Rome, it should show only Egypt, Greek, and Gaul. It should not show Byzantine, because it is from a later era, or China, because it is in a different region. See below:
[Image: expected result]
The simplest way to achieve this is a self-join.
Join the data to itself with an inner join on Region and Era.
Then, to remove the rows where an empire is matched with itself, create a calculated field
[Empire_Data1] = [Empire_Data2]
and put it on the Filters shelf, keeping only False.
Then, if you drag both Empire fields into the view (filtering the first one to the empire you choose), you will get the output you are looking for.
Since this is only about 20 rows of data, a self-join poses no problem.
But if you have many rows, say hundreds of thousands or more, you might want to prepare your data before connecting to Tableau.
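To show what the self-join and the self-match filter do, here is a small Python sketch. The rows are hypothetical stand-ins for the data in the image, with one (empire, region, era) row per empire.

```python
# Hypothetical sample rows: (empire, region, era)
EMPIRES = [
    ("Rome", "West", "Ancient"),
    ("Egypt", "West", "Ancient"),
    ("Greek", "West", "Ancient"),
    ("Gaul", "West", "Ancient"),
    ("Byzantine", "West", "Medieval"),
    ("China", "East", "Ancient"),
]

def related_empires(selected, rows):
    related = set()
    for empire1, region1, era1 in rows:        # left copy of the data
        if empire1 != selected:                # filter on the first Empire field
            continue
        for empire2, region2, era2 in rows:    # right copy of the data
            # Inner join on Region and Era, then drop self-matches
            # (the [Empire_Data1] = [Empire_Data2] filter kept at False)
            if region1 == region2 and era1 == era2 and empire1 != empire2:
                related.add(empire2)
    return sorted(related)

print(related_empires("Rome", EMPIRES))  # ['Egypt', 'Gaul', 'Greek']
```

The nested loop also shows why a self-join can become expensive on large tables: the work grows with the number of matching pairs. That is the reason to prep big data sets before connecting.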

How to get the sum directly with one number?

I'm a beginner with Tableau. I want a single total for each row, but instead I get separate numbers. How can I achieve this?
I've tried an expression like COUNT("Implemented"), but I don't get the result I want.
For example, for the first row I want: 3 10 10
not: 111 10 112111111
Here is the worksheet.
My code:
EDIT:
Here is the screenshot of the implementation opportunities.
As you can see, the status depends on the date, and I think that may be why the records are counted one by one.
The situation now is this: my calculation depends on the date. If I remove the date from the Marks card, the calculation breaks, but if I leave it there, Tableau always counts the records one by one. My code is not perfect, but I can't find anything to replace it.
EDIT 2:
In short, what I want is the sum of the remaining opportunities: 10.
Remove DAY from the Marks card. That level of detail is what splits the numbers apart.
Attaching a workbook with numbers similar to your real data (but not exact, if the real data is proprietary) is almost always advisable. You will get the right answer much sooner than with screenshots alone.
In any case, it looks as though the measure is being summed correctly, but separately for each date. Try clicking the measure and manually selecting Sum from its drop-down menu. Here is a link for more detail.
Second, you can experiment with table calculations. Click this link and read up on option 3.
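To illustrate why a date on the Marks card splits the totals, here is a small Python sketch. The (opportunity, day, remaining) records are hypothetical. Grouping by opportunity and day yields separate numbers like 1 1 1, while grouping by opportunity alone yields a single sum like 3.

```python
from collections import defaultdict

# Hypothetical records: (opportunity, day, remaining)
records = [
    ("A", "2017-01-02", 1), ("A", "2017-01-03", 1), ("A", "2017-01-05", 1),
    ("B", "2017-01-02", 10),
]

def total_by(records, with_day):
    totals = defaultdict(int)
    for opportunity, day, remaining in records:
        # Including the day in the grouping key mimics DAY on the Marks card
        key = (opportunity, day) if with_day else opportunity
        totals[key] += remaining
    return dict(totals)

print(total_by(records, with_day=False))  # {'A': 3, 'B': 10}
```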

How to get all missing days between two dates

I will try to explain the problem on an abstract level first:
I have X amount of data as input, which always has a DATE field. Previously, the input dates (after some processing) were written to a table as output. Now I am asked to output both the input dates and every date between the minimum date received and one year after it. If there was originally no input for some day between these two dates, all fields must be set to 0 or an equivalent.
Example: I have two inputs, one with '18/03/2017' and the other with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/03/2018'. So I output '19/03/2017' with every field set to 0, and the same for the 20th, the 21st, and so on.
I know how to do this programmatically, but not in PowerCenter. I've been told to do the following (which I have done, but I would like to know of a better method):
Get the minimum date, day0. Then, with an Aggregator, create 365 fields holding day0+1, day0+2, and so on, to build an artificial year.
After that, we apply several transformations, such as sorting the dates and a Union, to get the data ready for a Joiner. The Joiner performs a Full Outer Join between the original data and the zero-filled data generated by the previous Aggregator.
Then a Router sends the rows that had actual dates (fields without nulls) to one group and the rows where all fields are null to another. The null fields are given a 0, and everything is finally written to a table.
I am wondering how this can be achieved more simply, starting with removing the need to add 365 separate days to a date. If I had to do the same process for 10 years instead of one, the task would get ridiculous very quickly.
I was wondering about an XOR type of operation, or some other function that would cut the number of steps needed for what I (maybe wrongly) feel is a simple task. Currently I need five steps just to find which dates are missing between two dates: the minimum and one year after it.
I have tried to be as clear as possible, but if I failed at any point, please let me know!
I'm not sure what the Aggregator is supposed to do.
The same goes for the full outer join: a normal join on a constant port is fine :)
Can you calculate the needed number of duplicates before the Joiner? In that case, a Lookup configured to return all rows, with a less-than-or-equal predicate, can make the mapping much more readable.
In any case, you will need a helper table (or file) with a sequence of numbers from 1 to the number of potential duplicates (or more).
I use the time dimension in our warehouse, which has one row per day starting at 1753-01-01 and running for the next 200,000 days, with an integer primary key column starting at 1.
You've said you know how to do this programmatically, and to be fair, this problem is better suited to that sort of solution. That doesn't exclude PowerCenter by any means, though: just feed the two dates into a Java transformation, apply some code to produce all dates between them, and output a record for each. The Java transformation is ideal for generating records.
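Here is a rough sketch of the record-generation logic such a transformation would run, written in Python for illustration rather than as actual PowerCenter Java transformation code. The input shape (a dict mapping each date to its value) and the 365-day default are assumptions.

```python
from datetime import date, timedelta

def fill_missing_days(rows, days=365):
    """rows: dict mapping date -> value (hypothetical shape).
    Returns one (date, value) pair per day, starting at the minimum date."""
    start = min(rows)
    output = []
    for offset in range(days):
        day = start + timedelta(days=offset)
        # Dates with real input keep their value; missing dates get 0
        output.append((day, rows.get(day, 0)))
    return output

rows = {date(2017, 3, 18): 5, date(2017, 3, 20): 2}
filled = fill_missing_days(rows, days=4)
```

A loop like this does not depend on the length of the range, so covering 10 years just means passing a larger day count instead of adding more fields.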
OK, so you could override your Source Qualifier to achieve this in the selection query itself. I'm giving an Oracle-based example because that's what I'm used to, and I'm assuming your input data comes from a table. I looked up the CONNECT BY syntax here:
SQL to generate a list of numbers from 1 to 100
SELECT mindate.d + levquery.n - 1 AS Port1
FROM (SELECT MIN(DATEFIELD) AS d FROM tablea) mindate,
     (SELECT LEVEL AS n FROM DUAL CONNECT BY LEVEL <= 365) levquery
(Check whether the query works for you; I can't test it at the moment.)
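To show the sequence-of-numbers idea end to end, including the zero fill, here is a sketch that runs in SQLite through Python. It uses a recursive CTE, which is SQLite's counterpart to Oracle's CONNECT BY LEVEL, so the syntax differs from the Oracle query. The amount column and the sample rows are hypothetical, and the demo generates 5 days instead of 365 to keep the output short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical input table: one row per date that actually has data
conn.execute("CREATE TABLE tablea (datefield TEXT, amount INTEGER)")
conn.executemany("INSERT INTO tablea VALUES (?, ?)",
                 [("2017-03-18", 5), ("2017-03-21", 7)])

query = """
WITH RECURSIVE seq(n) AS (          -- numbers 1..N, like CONNECT BY LEVEL <= N
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM seq WHERE n < ?
),
days AS (                           -- minimum date + (n - 1) days
    SELECT date((SELECT MIN(datefield) FROM tablea), '+' || (n - 1) || ' days') AS d
    FROM seq
)
SELECT days.d, COALESCE(tablea.amount, 0) AS amount   -- missing dates get 0
FROM days LEFT JOIN tablea ON tablea.datefield = days.d
ORDER BY days.d
"""
rows = conn.execute(query, (5,)).fetchall()
```

The LEFT JOIN plus COALESCE replaces the Full Outer Join and Router steps: every generated date survives the join, and only the dates without input get zeros.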

Volume of an Incident Queue at a Point in Time

I have an incident queue consisting of a record number (string), an open time (datetime), and a close time (datetime). The records go back a year or so. What I want is a line graph showing the queue volume as it stood at 8 PM each day. So if a ticket was opened before 8 PM on that day, or at any time on a previous day, and was not yet closed as of 8 PM, it should be included in the population.
I tried the calculation below, but it doesn't work because it doesn't account for multiple days.
If DATEPART('hour',[CloseTimeActual])>18 AND DATEPART('minute',[CloseTimeActual])>=0 AND DATEPART('hour',[OpenTimeActual])<=18 THEN 1
ELSE 0
END
Has anyone dealt with this problem before? I am using Tableau 8.2 and cannot use 9 yet due to our company license, so please propose only 8.2 solutions. Thanks in advance.
For tracking the history of state changes, the easiest approach is to reshape your data so that each row represents a change in an incident's state. There would be a row for the creation of each incident, and a row for each other state change, such as assignment, resolution, or cancellation. You probably want columns for the incident number, the date of the state change, and the type of state change.
Then you can write a calculated field that returns +1, -1, or 0 to express how the state change affects the number of currently open incidents. Then you use a running total to see the total number open at a given time.
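Here is a minimal Python sketch of the +1/-1 running total, sampled at 8 PM each day as the question asks. The input shape (a list of (open_time, close_time or None) pairs) is an assumption. Changes count only if they happen strictly before 8 PM.

```python
from datetime import date, datetime, time, timedelta

def queue_at_8pm(incidents, first_day, num_days):
    """incidents: list of (open_time, close_time or None); hypothetical input shape."""
    # Reshape: one row per state change, +1 on open, -1 on close
    changes = []
    for opened, closed in incidents:
        changes.append((opened, 1))
        if closed is not None:
            changes.append((closed, -1))
    changes.sort()

    result, running, i = [], 0, 0
    for k in range(num_days):
        cutoff = datetime.combine(first_day + timedelta(days=k), time(20))
        # Running total of every change strictly before the 8 PM cutoff
        while i < len(changes) and changes[i][0] < cutoff:
            running += changes[i][1]
            i += 1
        result.append((cutoff.date(), running))
    return result

incidents = [
    (datetime(2015, 1, 1, 9), datetime(2015, 1, 2, 10)),
    (datetime(2015, 1, 1, 21), datetime(2015, 1, 3, 8)),
    (datetime(2015, 1, 2, 19), datetime(2015, 1, 2, 19, 30)),
]
volume = queue_at_8pm(incidents, date(2015, 1, 1), 3)
```

A ticket opened and closed between two cutoffs, like the third one, never shows up at 8 PM. That is exactly the behaviour the question asks for.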
You may need to show missing date values or add padding if state changes are rare. For other analytical questions, structuring your data with one record per incident may be more convenient. To avoid duplication, you might want to use database views or custom SQL with UNION ALL clauses to provide both views of the same underlying database tables.
It's always a good idea to be able to fill in the blank for "Each record in my dataset represents exactly one _________".
Tableau 9 has some reshaping capability in the data connection pane, or you can preprocess the data or create a view in the database to reshape it. Alternatively, you can specify a Union in Tableau with some calculated fields (or similarly custom SQL with a UNION ALL clause). Here is a brief illustration:
select open_date as Date,
'OPEN' as Action,
1 as Queue_Change,
<other columns if desired>
from incidents
UNION ALL
select close_date as Date,
'CLOSE' as Action,
-1 as Queue_Change,
<other columns if desired>
from incidents
where close_date is not null
Now you can use a running sum of SUM(Queue_Change) to see the number of open incidents over time. If you have other columns, such as priority, department, or type, you can filter and group as usual in Tableau. This data source can be in addition to your previous one. You don't have to have a single view of the data for every worksheet in your workbook. Sometimes you want a few different connections to the same data, at different levels of detail or for different perspectives.

Tableau Future and Current References

Tough problem I am working on here.
I have a table of CustomerIDs and CallDates. I want to measure whether there is a 'repeat call' within a certain period of time (up to 30 days).
I plan on creating a parameter called RepeatTime, ranging from 0 to 30 days, so the user can slide a scale to see the number or percentage of total repeats.
In Excel, I have this working. I sort by CustomerID and then by CallDate from earliest to latest. I then have formulas like:
=IF(AND(CurrentCustomerID = FutureCustomerID, FutureCallDate - CurrentCallDate <= RepeatTime), 1,0)
CurrentCustomerID is the current row, and FutureCustomerID is the following row (so the formula checks whether the customer ID is the same).
FutureCallDate is the following row, and CurrentCallDate is the current row. The formula subtracts the current call date from the following call date to measure the time in between.
The goal is to see, dynamically, how many customers called in for a specific reason within, say, 4 hours, 1 day, or 5 days, all the way up to 30 days. The 30-day window is our actual metric, but it is useful to see calls that repeat within a shorter time frame so we can investigate.
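For reference, here is a Python sketch of the Excel logic described above. It assumes calls arrive as hypothetical (customer_id, call_time) pairs and uses a timedelta for RepeatTime, so windows in hours work as well as days.

```python
from datetime import datetime, timedelta

def flag_repeats(calls, repeat_time):
    """calls: list of (customer_id, call_time); repeat_time: timedelta.
    Mirrors the Excel formula: 1 if the next row is the same customer within repeat_time."""
    rows = sorted(calls)  # sort by CustomerID, then by CallDate
    flags = []
    for current, future in zip(rows, rows[1:] + [None]):
        is_repeat = (future is not None
                     and current[0] == future[0]
                     and future[1] - current[1] <= repeat_time)
        flags.append((current[0], current[1], 1 if is_repeat else 0))
    return flags

calls = [
    ("C1", datetime(2015, 5, 1, 9)),
    ("C2", datetime(2015, 5, 1, 10)),
    ("C1", datetime(2015, 5, 3, 9)),
    ("C1", datetime(2015, 5, 20, 9)),
]
repeats_5_days = sum(flag for _, _, flag in flag_repeats(calls, timedelta(days=5)))
```

Sliding the window from 5 to 30 days here corresponds to moving the RepeatTime parameter.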
I had a similar problem; see here for a detailed version: Array calculation in Tableau, maxif routine.
Your case is basically the same as mine, so you could apply that solution, but I find the one I'm about to give easier to understand:
1) Create a calculated field called RepeatTime:
DATEDIFF('day', LOOKUP(MAX([CallDates]), -1), MAX([CallDates]))
This calculates how many days have passed between the previous call and the current one. You can wrap it in IFNULL to avoid Null values for the first entry.
2) Drag CustomerID, CallDates, and RepeatTime to the worksheet (they can go on the Marks card; they don't need to be on Rows or Columns).
3) Configure the table calculation for RepeatTime: Compute Using > Advanced..., partitioning by CustomerID and addressing CallDates.
Also set the sort to Field: CallDates, Maximum, Ascending.
This ensures the table calculation works properly.
4) Now you have a base you can use for what you need. You can export it to CSV or MDB and connect to it.
The best approach, actually, is to have this RepeatTime field calculated outside Tableau, on your database, so it's already there when you connect to it. But this is a way to use Tableau to do the calculation for you.
Unfortunately, there's no direct way to do this with your database.