Zscore with Rolling Window Panel Data - merge

I am trying to calculate the zscore with Rolling window. I need to actually calculate standard deviation for a 3 year rolling window to calculate z-score. A minimal working example is given below:
use http://dss.princeton.edu/training/Panel101.dta
xtset country year
rolling sd_x1=r(sd), step(1) window(3) saving(sd_x1, replace) keep(year): sum x1, detail
Now after this I need to merge it back with the original file. But the variable year does not appear but a column name date appears with all missing values. I am trying to merge it using the following command:
merge 1:1 country year using sd_x1
However, I get the error that variable year is not found and actually this variable is not kept while running the rolling command. Any help will be much appreciated.

I am always surprised that people have interest or faith in standard deviations based on three values.
A more direct approach would be to use rangestat (SSC). The syntax could be something like
use http://dss.princeton.edu/training/Panel101.dta
xtset country year
rangestat (sd) sd=x1, interval(year 0 2) by(country)
except that I cannot test this at the moment.
The key difference here is that rangestat produces new variables in the current dataset. Search the Statalist archives for examples of rangestat use.
Note that in your example the detail option is unnecessary as summarize by itself produces standard deviations.
You can extend this approach to get the mean at the same time.

Related

Anylogic: Time depending stock in-flow with table functions

I would like to simulate a stock (array with six different dimensions) and its changes over the years in Anylogic. The initial values of the stock are given as well as the changes over the years (Excel table format). I think the values of the stock changes can't be describes by a constant mathematical function. So, they need to be implemented in the model as a table.
My approach was to create the stock and also the in-flow as an array with the six dimensions. I entered the initial values in the stock (inital values of stock). After that, I created six table functions with the number of the year as the argument and the change value of the stock in this year in the value column (example of a table funcion). To get the particular in-flow for the current simulation year, I entered the table function in the in-flow and set a variable as an argument (inflow of the stock). For the variable I entered a function, which I found on the Anylogic help website. It is the following: int getYear(Date date) (current year variable).
So as my understanding is right, this variable should give out the current year of the simulation as an integer. Using this value as a argument for the table function, I was thinking the model should work, but it seams that there is a problem the variable for the current year. I hope this was understandable. If not, please don't hesitate to ask.
Do you have any idea what could be the mistake or what I can do better to create a working model?
Thank you very much for your help.
Have a nice day and stay save
David
I've been looking for an answer to the same situation as yours, but wasn't able to find the clear guide for it.
In my case, I added an 'event' variable from the agent-based simulation palette with Trigger type == 'Timeout', Mode == 'Cyclic', and both First occurrence and recurrence time as 1 (in 'years' in your case).
Then, try to set actions for the event variable as what you intended:
Veraenderung_Lkw_Flotte [Nicht_Autom] = Veraenderung Nicht Autom pro Jahr(time());
Veraenderung_Lkw_Flotte [AF_Stufe_1] = Veraenderung Stufe 1 pro Jahr(time());
Veraenderung_Lkw_Flotte [AF_Stufe_2] = Veraenderung Stufe 2 pro Jahr(time());
Here, the time() returns the current model time. You may need to use different specification depending on your time specification.
Hope you find it helpful.

Have Max value of range of dates filter be todays date

I have a "Range of Dates" filter and what I want is for the max (or right most value) to always be the most recent date which should be today's date. What seems to be happening is that if I leave the dashboard open and come back the next day the max value is yesterday's date and I must manually move the slider over to be today's date. How can I accomplish this?
I find a calculated field is the best way to do this as I have run into the same issues using the out of the box max date filter.
Create a calculated field as follows:
[date] = {FIXED: max([date])}
This creates a True False field where only the records that have the max data and carried through.
Now drag this onto the filter pane and select 'TRUE'.
I've generally seen two basic approaches for this problem: Calculated fields and relative dates.
Use a calculated field or parameter or some combination of calculated fields and parameters with filters. This is similar to what smb suggests in their answer to this question. It also seems to be the most popular approach.
If you don't particularly care about being able to set the end-date with the slider, you could try using relative dates, using the approaches detailed in the accepted answer to this Tableau forum question and in this Tableau Knowledge Base article. Jennifer Vonhagel also gives a second answer to the Tableau forum question farther down that uses a parameter plus calculated field approach.
Additionally, this Tableau Knowledge Base article offers another option (Option 1, in the article) if you have Tableau 10.3+: You can use the "Latest Date Preset" (see here for details) check box in the date filter dialog box. I haven't used this, but it looks promising if you're using Tableau Desktop (seems like it wouldn't work for Tableau Web). The article's Options 2-4 are just riffs on calculated fields, in my opinion.
Two more approaches I've heard of – but never personally seen in the wild:
Push the max date down into the view you put Tableau on top of and let the view do the work.
Use a script to modify the Tableau workbook's XML.

How to get the sum directly with one number?

I'm a beginner for tableau. I want to get the direct numbers for each row, but i get the number which are separate, how can i achieve this?
I've tried the sentence like:count("Implemented"), but I don't get the result I want.
For example, for the 1st row I want 3 10 10
not 111 10 112111111
Here is worksheet.
My code:
EDIT :
here is the photo for implementation opportunities
As you can see, the status is related to the date, I think maybe it causes the records which are counted 1by1.
Now the situation is that: i create the code which is related to the date, if i remove this from mark, it will cause the problem (the code is related to the date), but if i leave it, the system will always count it one by one. My code is not perfect but i can't find another one which can replace it.....
EDIT 2:
in short,what i want is the sum of the remaining opportunity:10
capture
Remove DAY from Mark shelf. That detail is producing those separations.
Attaching a workbook with numbers similar to (but not exact due to proprietary issues) is almost always advised. You will get the right answer a lot sooner than just screenshots.
In any case, it seems as if the measure portion of the visualization is properly being summed by the date. Try selecting the measure, and manually selecting "sum" from the menu drop down. Here is a link for more detail.
Secondly, you can play around with table calculations. Click this link and read up on option 3.

How to get all missing days between two dates

I will try to explain the problem on an abstract level first:
I have X amount of data as input, which is always going to have a field DATE. Before, the dates that came as input (after some process) where put in a table as output. Now, I am asked to put both the input dates and any date between the minimun date received and one year from that moment. If there was originally no input for some day between this two dates, all fields must come with 0, or equivalent.
Example. I have two inputs. One with '18/03/2017' and other with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/04/2017'. So, output '19/03/2017' with every field to 0, and the same for the 20th and 21st and so on.
I know to do this programmatically, but on powercenter I do not. I've been told to do the following (which I have done, but I would like to know of a better method):
Get the minimun date, day0. Then, with an aggregator, create 365 fields, each has that "day0"+1, day0+2, and so on, to create an artificial year.
After that we do several transformations like sorting the dates, union between them, to get the data ready for a joiner. The idea of the joiner is to do an Full Outer Join between the original data, and the data that is going to have all fields to 0 and that we got from the previous aggregator.
Then a router picks with one of its groups the data that had actual dates (and fields without nulls) and other group where all fields are null, and then said fields are given a 0 to finally be written to a table.
I am wondering how can this be achieved by, for starters, removing the need to add 365 days to a date. If I were to do this same process for 10 years intead of one, the task gets ridicolous really quick.
I was wondering about an XOR type of operation, or some other function that would cut the number of steps that need to be done for what I (maybe wrongly) feel is a simple task. Currently I now need 5 steps just to know which dates are missing between two dates, a minimun and one year from that point.
I have tried to be as clear as posible but if I failed at any point please let me know!
Im not sure what the aggregator is supposed to do?
The same with the 'full outer' join? A normal join on a constant port is fine :) c
Can you calculate the needed number of 'dublicates' before the 'joiner'? In that case a lookup configured to return 'all rows' and a less-than-or-equal predicate can help make the mapping much more readable.
In any case You will need a helper table (or file) with a sequence of numbers between 1 and the number of potential dublicates (or more)
I use our time-dimension in the warehouse, which have one row per day from 1753-01-01 and 200000 next days, and a primary integer column with values from 1 and up ...
You've identified you know how to do this programmatically and to be fair this problem is more suited to that sort of solution... but that doesn't exclude powercenter by any means, just feed the 2 dates into a java transformation, apply some code to produce all dates between them and for a record to be output for each. Java transformation is ideal for record generation
You've identified you know how to do this programmatically and to be fair this problem is more suited to that sort of solution... but that doesn't exclude powercenter by any means, just feed the 2 dates into a java transformation, apply some code to produce all dates between them and for a record to be output for each. Java transformation is ideal for record generation
Ok... so you could override your source qualifier to achieve this in the selection query itself (am giving Oracle based example as its what I'm used to and I'm assuming your data in is from a table). I looked up the connect syntax here
SQL to generate a list of numbers from 1 to 100
SELECT (MIN(tablea.DATEFIELD) + levquery.n - 1) AS Port1 FROM tablea, (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365) as levquery
(Check if the query works for you - haven't access to pc to test it at the minute)

Tableau Future and Current References

Tough problem I am working on here.
I have a table of CustomerIDs and CallDates. I want to measure whether there is a 'repeat call' within a certain period of time (up to 30 days).
I plan on creating a parameter called RepeatTime which is a range from 0 - 30 days, so the user can slide a scale to see the number/percentage of total repeats.
In Excel, I have this working. I sort CustomerID in order and then sort CallDate from earliest to latest. I then have formulas like:
=IF(AND(CurrentCustomerID = FutureCustomerID, FutureCallDate - CurrentCallDate <= RepeatTime), 1,0)
CurrentCustomerID = the current row, and the FutureCustomerID = the following row (so it is saying if the customer ID is the same).
FutureCallDate = the following row and the CurrentCallDate = the current row. It is subtracting the future call time from the first call time to measure the time in between.
The goal is to be able to see, dynamically, how many customers called in for a specific reason within maybe 4 hours or 1 day or 5 days, etc. All of the way up until 30 days (this is our actual metric but it is good to see the calls which are repeats within a shorter time frame so we can investigate).
I had a similar problem, see here for detailed version Array calculation in Tableau, maxif routine
In your case, that is basically the same thing as mine, so you could apply that solution, but I find it easier to understand the one I'm about to give, I would do:
1) Create a calculated field called RepeatTime:
DATEDIFF('day',MAX(CallDates),LOOKUP(MAX(CallDates),-1))
This will calculated how many days have passed since the last call to the current. You can add a IFNULL not to get Null values for the first entry.
2) Drag CustomersID, CallDates and RepeatTime to the worksheet (can be on the marks tab, don't need to be on rows or column).
3) Configure the table calculation of RepeatTIme, Compute using Advanced..., partitioning CustomersID, Adressing CallDates
Also Sort by Field CallDates, Maximum, Ascending.
This will guarantee the table calculation works properly
4) Now you have a base that you can use for what you need. You can either export it to csv or mdb and connect to it.
The best approach, actually, is to have this RepeatTime field calculated outside Tableau, on your database, so it's already there when you connect to it. But this is a way to use Tableau to do the calculation for you.
Unfortunately there's no direct way to do this directly with your database.