How to get all missing days between two dates - date

I will try to explain the problem on an abstract level first:
I have X amount of data as input, which always has a DATE field. Previously, the dates that came as input (after some processing) were put in a table as output. Now I am asked to output both the input dates and every date between the minimum date received and one year from that date. If there was originally no input for some day between these two dates, all fields must come with 0, or an equivalent.
Example. I have two inputs. One with '18/03/2017' and other with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/04/2017'. So, output '19/03/2017' with every field to 0, and the same for the 20th and 21st and so on.
I know how to do this programmatically, but not in PowerCenter. I've been told to do the following (which I have done, but I would like to know of a better method):
Get the minimum date, day0. Then, with an aggregator, create 365 fields, each holding day0+1, day0+2, and so on, to create an artificial year.
After that we do several transformations, like sorting the dates and a union between them, to get the data ready for a joiner. The idea of the joiner is to do a Full Outer Join between the original data and the data that is going to have all fields set to 0, which we got from the previous aggregator.
Then a router picks, with one of its groups, the data that had actual dates (and fields without nulls), and with another group the rows where all fields are null. Those fields are then given a 0 and finally written to a table.
I am wondering how this can be achieved by, for starters, removing the need to add 365 days to a date. If I were to do this same process for 10 years instead of one, the task gets ridiculous really quickly.
I was wondering about an XOR type of operation, or some other function that would cut the number of steps needed for what I (maybe wrongly) feel is a simple task. Currently I need 5 steps just to know which dates are missing between two dates: a minimum and one year from that point.
I have tried to be as clear as possible, but if I failed at any point please let me know!

I'm not sure what the aggregator is supposed to do.
The same goes for the 'full outer' join. A normal join on a constant port is fine :)
Can you calculate the needed number of 'duplicates' before the 'joiner'? In that case a lookup configured to return 'all rows' with a less-than-or-equal predicate can help make the mapping much more readable.
In any case you will need a helper table (or file) with a sequence of numbers between 1 and the number of potential duplicates (or more).
I use the time dimension in our warehouse, which has one row per day from 1753-01-01 and the next 200000 days, and a primary integer column with values from 1 and up ...

You've said you know how to do this programmatically, and to be fair this problem is better suited to that sort of solution. That doesn't exclude PowerCenter by any means, though: just feed the two dates into a Java transformation and apply some code that produces all the dates between them, outputting one record for each. The Java transformation is ideal for record generation.
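To make the record-generation idea concrete, here is a minimal sketch in Python rather than in a Java transformation. The function name pad_missing_days, the input shape (a dict keyed by date) and the "amount" field are assumptions for illustration only; inputs that fall outside the window are simply dropped in this sketch.

```python
from datetime import date, timedelta


def pad_missing_days(rows, days=365):
    """Return one (date, fields) pair per day from the minimum date onward.

    rows is a hypothetical input shape: {date: {field_name: value}}.
    Days with no input get every known field set to 0.
    """
    if not rows:
        return []
    day0 = min(rows)  # the minimum date received
    field_names = sorted({name for fields in rows.values() for name in fields})
    padded = []
    for offset in range(days + 1):  # day0 itself plus `days` following days
        current = day0 + timedelta(days=offset)
        zero_row = {name: 0 for name in field_names}
        padded.append((current, rows.get(current, zero_row)))
    return padded


sample = {
    date(2017, 3, 18): {"amount": 5},
    date(2017, 3, 21): {"amount": 7},
}
result = pad_missing_days(sample, days=4)
for day, fields in result:
    print(day.isoformat(), fields)
```

The loop replaces the 365 hand-made fields: changing days to 3650 covers ten years without any extra steps.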

Ok... so you could override your source qualifier to achieve this in the selection query itself (I'm giving an Oracle-based example as it's what I'm used to, and I'm assuming your input data comes from a table). I looked up the CONNECT BY syntax here:
SQL to generate a list of numbers from 1 to 100
SELECT (MIN(tablea.DATEFIELD) + levquery.n - 1) AS Port1 FROM tablea, (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365) levquery GROUP BY levquery.n
(Check if the query works for you - haven't access to pc to test it at the minute)

Related

Find count of active users in the last 29 days in Tableau

I need help calculating the total active users from Feb 16 2020 to March 16 2020.
I have tried using calculated fields, but I am not getting the correct results. Please advise.
Thank you,
Nirmal
To find the number of unique values that appear in a field, say [user_code], you can use the COUNT DISTINCT function, COUNTD() as in COUNTD([user_code])
To restrict the data to a particular time range, one way is put your date field on the Filter shelf and choose the settings that include only the data rows you want — say the range from 2/16 to 3/16 as you stated.
Alternatively, you can push the filtering condition into the calculation with an IF function call, as in COUNTD(IF <data is relevant> THEN [user_code] END), effectively combining the two techniques. That works because if there is no ELSE clause and the IF condition is False, the IF expression evaluates to null. Since COUNTD(), like other aggregation functions, silently ignores nulls, the expression acts as if the irrelevant data rows were filtered out.
So, for example,
COUNTD(IF [dates] >= #2/16/2020# AND [dates] <= #3/16/2020# THEN [user_code] END)
will tell you the number of unique user codes during the period between 2/16 and 3/16. The DATEDIFF() function will probably be useful in more elaborate tests.
Finally, what if you want more flexibility? You could easily use Parameters or Filter controls to let the user choose the date range interactively.
If you want this calculation repeated for each possible day, showing the unique users in the preceding 30 day period, as some sort of rolling calculation, then you’ll need to learn about some more advanced features. Either multiple calculations as above for different time ranges, using Table Calculations, or some data prep and/or data padding with Tableau Prep Builder, Python or some other technique — mostly because in that scenario each data row contributes to multiple rolling counts, rather than one count when partitioning the data by some dimension.
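To illustrate why the rolling version is harder, here is a rough sketch in Python (outside Tableau) of a distinct count over the window ending on each day. The function name rolling_distinct_users and the (day, user_code) tuple shape are assumptions for illustration; note how one event is counted in many windows.

```python
from datetime import date, timedelta


def rolling_distinct_users(events, window_days=30):
    """Map each day to the number of distinct users seen in the
    window_days-long window that ends on that day (inclusive).

    events is a hypothetical list of (day, user_code) tuples.
    """
    events = list(events)
    if not events:
        return {}
    first = min(day for day, _ in events)
    last = max(day for day, _ in events)
    counts = {}
    current = first
    while current <= last:
        window_start = current - timedelta(days=window_days - 1)
        # Each event contributes to every window that contains its day.
        users = {user for day, user in events if window_start <= day <= current}
        counts[current] = len(users)
        current += timedelta(days=1)
    return counts


events = [
    (date(2020, 2, 16), "a"),
    (date(2020, 2, 17), "b"),
    (date(2020, 2, 17), "a"),
    (date(2020, 3, 16), "c"),
]
counts = rolling_distinct_users(events, window_days=30)
print(counts[date(2020, 3, 16)])
```

This brute-force scan is fine for small data; for larger volumes the same logic is usually pushed into data prep.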

KDB+/Q query too heavy to handle

I want to grab data from a KDB data base for a list of roughly 200 days within the last two years. The 200 days are in no particular pattern.
I only need the data from 09:29:00.000 to 09:31:00.000 everyday.
My first approach was to query all of the last two years data that have time stamp between 09:29:00.000 and 09:31:00.000, because I didn't see a way to just query the particular 200 days that I need.
However this proved to be too much for my server to handle.
Then I tried to summarize the 2 minute data for each date into an average and just print out the average, so now I will only have 200 rows of data as output. But somehow this still turns out to be too much. I'm not sure if this is because I'm not selecting the data correctly.
My other suspicion is that the query is grabbing all the data first and then averaging each date, which means the averaging is not making it any easier to handle.
Here's the code that I have:
select maxPriceB:max(price), minPriceB:min(price), avgPriceB:avg(price), avgSizeB:avg(qty) by date from dms where date within(2015.01.01, 2016.06.10), time within(09:29:00.000, 09:31:00.000), sym = `ZF6
poms is the table that the data is in
ZFU6 is the symbol that I'm looking for
I tried adding the keyword distinct after select.
I want to know if there's any way to break up the query, or make the query lighter for the server to handle.
Thank you!
If you use 32-bit kdb+ and get the infamous 'wsfull error, then you may try processing one day at a time like this:
raze{select maxPriceB:max(price), minPriceB:min(price), avgPriceB:avg(price), avgSizeB:avg(qty)
  from dms where date=x,sym=`ZF6,time within 09:29:00.000 09:31:00.000}each 2015.01.01+til 1+2016.06.10-2015.01.01
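The same "one day at a time" idea, sketched in Python for readers who do not use q: only one day's rows are held in memory at once, and only the days you actually want are visited. The names summarize_day, summarize_days and load_day are hypothetical; load_day stands in for whatever fetches a single day's ticks as (time, price, qty) tuples.

```python
from datetime import date, time
from statistics import mean


def summarize_day(ticks, start=time(9, 29), end=time(9, 31)):
    """Aggregate one day's ticks inside the [start, end] time window.

    Returns None when the day has no ticks in the window.
    """
    window = [(price, qty) for t, price, qty in ticks if start <= t <= end]
    if not window:
        return None
    prices = [price for price, _ in window]
    sizes = [qty for _, qty in window]
    return {
        "maxPriceB": max(prices),
        "minPriceB": min(prices),
        "avgPriceB": mean(prices),
        "avgSizeB": mean(sizes),
    }


def summarize_days(load_day, wanted_days):
    """Call load_day (a hypothetical per-day loader) once per wanted day."""
    results = {}
    for day in wanted_days:
        summary = summarize_day(load_day(day))
        if summary is not None:
            results[day] = summary
    return results


# Tiny in-memory stand-in for the real data source.
store = {
    date(2016, 6, 9): [
        (time(9, 28), 100.0, 1),
        (time(9, 30), 101.0, 2),
        (time(9, 31), 103.0, 4),
    ],
    date(2016, 6, 10): [(time(10, 0), 99.0, 1)],
}
out = summarize_days(lambda d: store.get(d, []), [date(2016, 6, 9), date(2016, 6, 10)])
print(out)
```

Passing the list of roughly 200 wanted days as wanted_days avoids scanning the full two years.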

Volume of an Incident Queue at a Point in Time

I have an incident queue consisting of a record number (string), an open time (datetime), and a close time (datetime). The records go back a year or so. What I am trying to get is a line graph displaying the queue volume as it was at 8PM each day. So if a ticket was opened before 8PM on that day, or at any time on a previous day, but was not closed as of 8PM, it should be included in the population.
I tried the below, but this won't work because it doesn't really take into account multiple days.
If DATEPART('hour',[CloseTimeActual])>18 AND DATEPART('minute',[CloseTimeActual])>=0 AND DATEPART('hour',[OpenTimeActual])<=18 THEN 1
ELSE 0
END
Has anyone dealt with this problem before? I am using Tableau 8.2, cannot use 9 yet due to company license so please only propose 8.2 solutions. Thanks in advance.
For tracking history of state changes, the easiest approach is to reshape your data so each row represents a change in an incident state. So there would be a row representing the creation of each incident, and a row representing each other state change, say assignment, resolution, cancellation etc. You probably want columns to represent an incident number, date of the state change and type of state change.
Then you can write a calculated field that returns +1, -1 or 0 to express how the state change affects the number of currently open incidents. Then you use a running total to see the total number open at a given time.
You may need to show missing date values or add padding if state changes are rare. For other analytical questions, structuring your data with one record per incident may be more convenient. To avoid duplication, you might want to use database views or custom SQL with UNION ALL clauses to allow both views of the same underlying database tables.
It's always a good idea to be able to fill in the blank for "Each record in my dataset represents exactly one _________"
Tableau 9 has some reshaping capability in the data connection pane, or you can preprocess the data or create a view in the database to reshape it. Alternatively, you can specify a Union in Tableau with some calculated fields (or similarly custom SQL with a UNION ALL clause). Here is a brief illustration:
select open_date as Date,
       'OPEN' as Action,
       1 as Queue_Change,
       <other columns if desired>
from incidents
UNION ALL
select close_date as Date,
       'CLOSE' as Action,
       -1 as Queue_Change,
       <other columns if desired>
from incidents
where close_date is not null
Now you can use a running sum of SUM(Queue_Change) to see the number of open incidents over time. If you have other columns like priority, department, type etc., you can filter and group as usual in Tableau. This data source can be in addition to your previous one. You don't have to have a single view of the data for every worksheet in your workbook. Sometimes you want a few different connections to the same data at different levels of detail or for different perspectives.
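As a rough illustration of the +1/-1 running total outside Tableau, here is a small Python sketch that reshapes incidents into open/close events and samples the running sum at 8PM each day. The function name open_count_at and the (open_dt, close_dt) tuple shape are assumptions for illustration. An incident closed exactly at the cutoff is treated as closed.

```python
from datetime import datetime, date, time, timedelta


def open_count_at(incidents, cutoff=time(20, 0)):
    """Return {day: number of incidents open at `cutoff` on that day}.

    incidents is a hypothetical list of (open_dt, close_dt) tuples,
    where close_dt is None for incidents that are still open.
    """
    # Reshape: one +1 event per open, one -1 event per close.
    events = []
    for opened, closed in incidents:
        events.append((opened, 1))
        if closed is not None:
            events.append((closed, -1))
    if not events:
        return {}
    events.sort()

    counts = {}
    running = 0
    index = 0
    day = events[0][0].date()
    last_day = events[-1][0].date()
    while day <= last_day:
        snapshot = datetime.combine(day, cutoff)
        # Apply every state change up to and including the snapshot time.
        while index < len(events) and events[index][0] <= snapshot:
            running += events[index][1]
            index += 1
        counts[day] = running
        day += timedelta(days=1)
    return counts


incidents = [
    (datetime(2015, 1, 1, 9, 0), datetime(2015, 1, 2, 21, 0)),
    (datetime(2015, 1, 1, 21, 0), None),
    (datetime(2015, 1, 2, 10, 0), datetime(2015, 1, 2, 11, 0)),
]
queue = open_count_at(incidents)
print(queue)
```

Days without any state change still get a value here, which is the "padding" mentioned above.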

Tableau Future and Current References

Tough problem I am working on here.
I have a table of CustomerIDs and CallDates. I want to measure whether there is a 'repeat call' within a certain period of time (up to 30 days).
I plan on creating a parameter called RepeatTime which is a range from 0 - 30 days, so the user can slide a scale to see the number/percentage of total repeats.
In Excel, I have this working. I sort CustomerID in order and then sort CallDate from earliest to latest. I then have formulas like:
=IF(AND(CurrentCustomerID = FutureCustomerID, FutureCallDate - CurrentCallDate <= RepeatTime), 1,0)
CurrentCustomerID = the current row, and the FutureCustomerID = the following row (so it is saying if the customer ID is the same).
FutureCallDate = the following row and the CurrentCallDate = the current row. It is subtracting the future call time from the first call time to measure the time in between.
The goal is to be able to see, dynamically, how many customers called in for a specific reason within maybe 4 hours or 1 day or 5 days, etc. All of the way up until 30 days (this is our actual metric but it is good to see the calls which are repeats within a shorter time frame so we can investigate).
I had a similar problem, see here for detailed version Array calculation in Tableau, maxif routine
Your case is basically the same as mine, so you could apply that solution, but I find the one I'm about to give easier to understand. I would do:
1) Create a calculated field called RepeatTime:
DATEDIFF('day',LOOKUP(MAX(CallDates),-1),MAX(CallDates))
This will calculate how many days have passed between the previous call and the current one. You can wrap it in IFNULL to avoid Null values for the first entry.
2) Drag CustomersID, CallDates and RepeatTime to the worksheet (they can be on the marks card; they don't need to be on rows or columns).
3) Configure the table calculation of RepeatTime: Compute using Advanced..., partitioning by CustomersID, addressing CallDates.
Also sort by Field CallDates, Maximum, Ascending.
This will guarantee the table calculation works properly.
4) Now you have a base that you can use for what you need. You can either export it to csv or mdb and connect to it.
The best approach, actually, is to have this RepeatTime field calculated outside Tableau, on your database, so it's already there when you connect to it. But this is a way to use Tableau to do the calculation for you.
Unfortunately, there's no way to do this directly with your database.
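If you do compute the flag outside Tableau, the logic from the Excel formula in the question can be sketched like this in Python. The function name flag_repeat_calls and the (customer_id, call_datetime) tuple shape are assumptions for illustration; repeat_time plays the role of the RepeatTime parameter.

```python
from datetime import datetime, timedelta


def flag_repeat_calls(calls, repeat_time=timedelta(days=30)):
    """Flag each call with 1 if the same customer calls again within repeat_time.

    calls is a hypothetical list of (customer_id, call_datetime) tuples.
    """
    # Sort by customer, then by call time, like the two sorts done in Excel.
    ordered = sorted(calls)
    flagged = []
    for index, (customer, called_at) in enumerate(ordered):
        is_repeat = 0
        if index + 1 < len(ordered):
            next_customer, next_called_at = ordered[index + 1]
            if next_customer == customer and next_called_at - called_at <= repeat_time:
                is_repeat = 1
        flagged.append((customer, called_at, is_repeat))
    return flagged


calls = [
    ("B", datetime(2020, 1, 1)),
    ("A", datetime(2020, 1, 1)),
    ("A", datetime(2020, 1, 3)),
    ("A", datetime(2020, 3, 1)),
]
short_window = [flag for _, _, flag in flag_repeat_calls(calls, timedelta(days=5))]
long_window = [flag for _, _, flag in flag_repeat_calls(calls, timedelta(days=60))]
print(short_window, long_window)
```

Because repeat_time is a timedelta, the same function covers 4 hours, 1 day, or 30 days.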

iReport query results break-down by week or day

I have used iReport to create a simple JasperReport which I run on a JasperServer. It queries some fields from a number of MySQL tables based on their creation timestamp. I am providing the start and end timestamps of the period to cover in the report as parameters of type java.sql.Timestamp. This works fine.
I was asked to introduce the ability to show a break-down on weeks or days of the report data. I would like to get some ideas on where to start with this. At this point I don't think I can accommodate this 'break-down' in the report query, since this feature seems beyond what SQL is designed to do. I know this sounds like an OLAP drill through, but I would like to avoid OLAP if possible (steep learning curve, tight deadlines).
My first thought was to create a subreport for each week or day . But this would leave me with an arbitrary number of subreports (depending on the overall time period covered by the report, which varies at each execution), and as far as I can tell iReport does not support this.
Here is one way to break the report down.
Create another parameter, groupby, which holds a value that designates the grouping requirement. The value can be numeric, a string, or anything else, as long as it corresponds to one of the available groupings (day, week, month, etc.).
Create a report group, breakdown, which will provide the breakdown. The group expression depends on the groupby parameter: it is the date on the record with any detail finer than the groupby value trimmed off.
Create a variable, total, that sums the data in the records. The variable should be reset on the breakdown group and can be printed in the breakdown trailer band.
Make sure the SQL query sorts by the date so that the group expression works.
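The steps above can be sketched outside JasperReports like this, in Python. The names breakdown_key and breakdown_totals are hypothetical, as are the "day", "week" and "month" values for groupby; weeks are assumed to start on Monday.

```python
from datetime import datetime, date, timedelta


def breakdown_key(timestamp, groupby):
    """Trim a timestamp down to the start of its day, week or month."""
    if groupby == "day":
        return timestamp.date()
    if groupby == "week":
        # Assumption: weeks start on Monday (weekday() == 0).
        return (timestamp - timedelta(days=timestamp.weekday())).date()
    if groupby == "month":
        return timestamp.date().replace(day=1)
    raise ValueError("unsupported groupby value: %r" % (groupby,))


def breakdown_totals(records, groupby):
    """Sum amounts per group; records are (timestamp, amount) tuples."""
    totals = {}
    # Sorting mirrors the requirement that the query orders by date.
    for timestamp, amount in sorted(records):
        key = breakdown_key(timestamp, groupby)
        totals[key] = totals.get(key, 0) + amount
    return totals


records = [
    (datetime(2021, 3, 1, 10, 0), 5),
    (datetime(2021, 3, 3, 12, 0), 7),
    (datetime(2021, 3, 8, 9, 0), 1),
]
weekly = breakdown_totals(records, "week")
print(weekly)
```

In the report itself the same trimming would live in the group expression, and the per-group sum in the total variable.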
Let me know if you have questions.