I have the following table:
Id Claim_id Date
4 111 10/08/2017
5 333 27/08/2017
2 111 07/08/2017
3 222 08/08/2017
1 444 03/07/2017
7 333 02/09/2017
6 333 28/08/2017
there are more rows (dates) associated to the same Claim_id; column "Id" is based on column "Date" (more recent dates have a greater Id).
I need to create a Calculated Column given by the date difference over claim_id, with the following output:
Id Claim_id Date Days
3 111 10/08/2017 3
1 333 27/08/2017
2 111 07/08/2017
4 222 08/08/2017
7 444 03/07/2017
6 333 02/09/2017 5
5 333 28/08/2017 1
I have tried to use the code given here: Spotfire date difference using over function but it doesn't work (it produces wrong values).
I think that, maybe, it's because my table is not sorted, but I can't order it because I have no access to the source database.
How can I modify that expression?
Thank you!
Valentina
#V.Ang- One way to do this is by adding a column 'decreasing_count'.
What this does is, it counts the number of instances of ID by date. Meaning - ID with highest date would be counted first and then followed by next instance of the same ID with date lower than the previous date and so on. Advantage of this column is, your data need not be sorted for this solution to work.
Now, using this 'decreasing_count' column calculate the difference of dates.
decreasing_count column expression:
Count([Claim_id]) over (Intersect([Claim_id],AllNext([Date])))
Note: This column works in the background. You need not display it in the table
Days calculated column expression:
Days([Date] - Min([Date]) over (Intersect([Claim_id],Next([decreasing_count]))))
Final Output:
Hope this helps!
Related
I know the wording of my title is vague so I will try to clarify my question.
tasks_table
user_id
completed_date
task_type
task_id
1
11/14/2021
A
34
1
11/13/2021
B
35
1
11/11/2021
A
36
1
11/09/2021
B
37
2
11/12/2021
A
38
2
11/02/2021
A
39
2
11/14/2021
B
40
2
10/14/2021
B
41
The table I am working with has more fields than this, but, these are the ones that are pertinent to the question. The task type can be either A or B.
What I am currently trying to do is get a result set that contains, at max, two tasks per user_id, one of task type A and one of task type B, that have been completed in the past 7 days. For example, the set the query should generate the following result set:
user_id
completed_date
task_type
task_id
1
11/14/2021
A
34
1
11/13/2021
B
35
2
11/12/2021
A
38
2
11/14/2021
B
40
There is a possibility that a user may have only done tasks of one type in that time period, but, it is guaranteed that any given user will have done atleast one task within that time. My question is it possible to create a query that can return such a result or would I have to query for a more generalized result and then trim the set down through logic in my JPA?
To select the most recent task for a given user_id and task_type within the last 7 days from now, if exists, you can try this :
SELECT DISTINCT ON (t.user_id, t.task_type) t.*
FROM tasks_table AS t
WHERE t.completed_date >= current_date - interval '7 days'
ORDER BY t.user_id, t.task_type, t.completed_date DESC
I have a table which looks like this
id login_id trend_type sep oct nov
1 abc#abc.com Billing 10 34 43
1 abc#abc.com Visits 20 43 56
1 abc#abc.com Revenue 30 12 12
1 pqr#pqr.com Billing 40 23 54
1 pqr#pqr.com Visits 50 21 47
1 pqr#pqr.com Revenue 60 98 12
I want to create a dashboard where I can display graphs of all these Trend Types and add a filter for the user so they can select the month for which they want to view the graphs.
I have tried this solution -
https://community.tableau.com/thread/228965
but I wasn't successful.
Tableau really likes data that is taller rather than wider. In this case, you need to do a PIVOT on the month data. A pivot will create a column for the months and another column for the values. Your data will have more rows now but fewer columns.
When you bring the data into Tableau, on the Data Source screen, highlight the three month columns and select pivot.
You can also change the name of the Pivot Field Names (to Month) and Pivot Field Values (to Amount or another appropriate name).
Click on the orange Sheet 1 on the bottom left. Next, create a calculated field to create a full date. (Tableau doesn't know what 'sep' is.)
[Pivot Field Names] + "-01-2019"
This field just creates a string that Tableau can parse (eg 'sep-01-2019'). Now tell Tableau it is a Date field by changing the field type (click on the Abc next to the Dimension name).
At this point, you can create a viz and add filters. Here is an example.
I have three data sets:
First, called education.dta. It contains individuals(students) over many years with their achieved educations from yr 1990-2000. Originally it is in wide format, but I can easily reshape it to long. It is presented as wide under:
id educ_90 educ_91 ... educ_00 cohort
1 0 1 1 87
2 1 1 2 75
3 0 0 2 90
Second, called graduate.dta. It contains information of when individuals(students) have finished high school. However, this data set do not contain several years only a "snapshot" of the individ when they finish high school and characteristics of the individual students such as backgroung (for ex parents occupation).
id schoolid county cohort ...
1 11 123 87
2 11 123 75
3 22 243 90
The third data set is called teachers.dta. It contains informations about all teachers at high school such as their education, if they work full or part time, gender... This data set is long.
id schoolid county year education
22 11 123 2011 1
21 11 123 2001 1
23 22 243 2015 3
Now I want to merge these three data sets.
First, I want to merge education.dta and graduate.dta on id.
Problem when education.dta is wide: I manage to merge education and graduation.dta. Then I make a loop so that all the variables in graduation.dta takes the same over all years, for eksample:
forv j=1990/2000 {
gen county j´=.
replace countyj´=county
}
However, afterwards when reshaping to long stata reposts that variable id does not uniquely identify the observations.
further, I have tried to first reshape education.dta to long, and thereafter merge either 1:m or m:1 with education as master, using graduation.dta.
However stata again reposts that id is not unique. How do I deal with this?
In next step I want to merge the above with teachers.dta on schoolid.
I want my final dataset in long format.
Thanks for your help :)
I am not certain that I have exactly the format of your data, it would be helpful if you gave us a toy dataset to look at using dataex (and could even help you figure out the problem yourself!)
But to start, because you are seeing that id is not unique, you need to figure out why there might be multiple ids in any of the datasets. Can someone in graduate.dta or education.dta appear more than once? help duplicates will probably be useful to explore the data in this way.
Because you want your dataset in long format I suggest reshaping education.dta to long first, then doing something like merge m:1 id using "graduate.dta" (once you figure out why some observations are showing up more than once) and then, finally something like merge 1:1 schoolid year using "teacher.dta" and you will have your final dataset.
I have yearly data in my excel file in such format:
Country \ Years 1980 1981 ... 2010
Abkhazia 234 334 ... 456
Afghanistan 466 789 ... 732
...
Here is picture
And I want my data transform to 3 different tables and load it to postgres database.
Tables should look something like that
First table - country:
id | name
1 | Abkhazia
2 | Afghanistan
Second table dates:
id | date
1 | 1980
2 | 1981
And third is a table where all data is stored depending on country and date:
country_id date_id data
1 1 234
1 2 334
2 1 466
2 2 789
... ... ...
Any ideas how I could achieve my goal?
Assuming the source excel structure is as below (i have custom built this):
There are basically 3 parts to your question. I break down the transformation into part for better understanding:
1. Loading Table - Country
This is pretty straight forward based on the data given in the excel. Simply take an
Excel Input >> Add a sequence step. Give the Sequence name as Country ID >> Select only the Country Name and Country ID >> Load into the Country Table using Table Output.
2. Loading Table - Year:
The idea here is to display the Year ID in Row wise format instead of the columns given the excel source data. PDI version 5 and above provides you with a very useful step called Metadata Structure. This step allows you to get the structure of your table. In this case, we need to have the year columns pulled, ignoring the country column.
Follow the steps as below:
Read the Excel Data >> Get the Metadata structure of your source >> Filter Out the Country Column (which is available in row at position=1) >> Add a Sequence Number. Name it YearID >> Finally Load the Year Table.
3. Loading the Final Table - Country and Year along with Data:
The way to display all the column data values to a row level in PDI is using Row Normalizer step. Use this step to display a normalized output. Now follow the below steps:
Read the Excel source data >> use Row Normalizer Step to normalize the rows based on the Years >> Do a Stream Lookup with the Above Country and Year tables to fetch the CountryID and YearID respectively >> Finally Load the necessary column data into Table Output
Hope it helps :)
I have placed the codes in github repo along with the data file which i have used. Its here.
Also, just realized that i have given the wrong naming conventions as per your question. Consider date_id as YearID and instead of id's i have given countryid and yearid.
I have a sql statement I am using a simple sort such as the following
Select numbers
From theTable
Order By numbers
What I get in return is the following
1
11
12
14
2
21
22
23
3
35
37
etc...
I want it to be ordered in normal order
1
2
3
4
5
6
etc...
The column you are selecting isn't stored as a numeric value. You need to cast it to a some kind of number before orderby will behave the way you want.
It should be as easy as:
select numbers from order orderby cast(numbers as int)
As long as long as all the values in that column cast properly.
what's the datatype of the column storing the numbers? Convert/cast it to an int and you should get what you expect.