How to solve the below scenario using a transformer loop or anything in DataStage

My data is like below, in one column coming from a file.
Source_data (this is the column name)
CUSTOMER 15
METER 8
METERStatement 1
READING 1
METER 56
Meterstatement 14
Reading 5
Reading 6
Reading 7
CUSTOMER 38
METER 24
METERStatement 1
READING 51
CUSTOMER 77
METER 38
READING 9
I want the output data to be like below in one column
CUSTOMER 15 METER 8 METERStatement 1 READING 1
CUSTOMER 15 METER 56 Meterstatement 14 Reading 5
CUSTOMER 15 METER 56 Meterstatement 14 Reading 6
CUSTOMER 15 METER 56 Meterstatement 14 Reading 7
CUSTOMER 38 METER 24 Meterstatement 1 Reading 51
CUSTOMER 77 METER 38 'pad 100 spaces' Reading 9
I have been reading the transformer looping documentation but could not figure out an actual solution. Anything helps, thank you all.

Yes, this can be solved within a Transformer stage.
Concatenation is done with ":".
So use a stage variable to concatenate the input until a new "Meter" or "Customer" row comes up.
Save the "Customer" in a second stage variable in case it does not change.
Use a condition to only output the rows where a "Reading" exists.
Reset the concatenated string once a "Reading" has been processed.
I guess you want the padding for missing fields in general - you could do these checks in separate stage variables. You have to store the previous item in order to know what is missing - and maybe even more if two consecutive items could be missing.
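If it helps, here is a small Python sketch of that stage-variable logic (purely illustrative, not DataStage code; the file names and the 100-space padding are assumptions based on the question):

PAD = " " * 100  # placeholder for a missing field

customer = meter = statement = ""

with open("source_data.txt") as src, open("output.txt", "w") as out:
    for line in src:
        line = line.strip()
        key = line.split()[0].upper() if line else ""

        if key == "CUSTOMER":
            # new customer: forget the previous meter and statement
            customer, meter, statement = line, "", ""
        elif key == "METER":
            # new meter: forget the previous statement
            meter, statement = line, ""
        elif key == "METERSTATEMENT":
            statement = line
        elif key == "READING":
            # emit one output row per reading, padding any missing field
            out.write(" ".join([
                customer or PAD,
                meter or PAD,
                statement or PAD,
                line,
            ]) + "\n")

In the Transformer the same roles would be played by stage variables holding the current customer, meter and statement, plus an output constraint that only fires on "Reading" rows.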

Sorting KDB table while excluding Total row

I have noticed that when using xasc, capital letters take precedence over lower case.
I am trying to exclude Total from being considered when doing the sort, and want to avoid using "lower" and then recapitalizing it again. I have my solution below, but it's rather poor code:
t:flip (`active`price`price2)!(`def`abc`xyz`hij`Total;12j, 44j, 468j, 26j, 550j;49j, 83j, 716j, 25j, 873j)
Thinking there's a better way than this
(`active xasc select from t where not active=`Total),select from t where active=`Total
Although it does not match the sort order of your example answer, if you're looking to sort in true lexicographical order while ignoring case, you could do the following:
q)t:([]active:`def`abc`xyz`hij`Total;price:12 44 468 26 550;price2:49 83 716 25 873)
q)t iasc lower t`active
active price price2
-------------------
abc 44 83
def 12 49
hij 26 25
Total 550 873
xyz 468 716
Otherwise, if you're looking to have the Total row at the bottom following the sort then you will need to append it after doing so - given your example table:
q)(select[<active]from t where active<>`Total),select from t where active=`Total
active price price2
-------------------
abc 44 83
def 12 49
hij 26 25
xyz 468 716
Total 550 873
There isn't really a much cleaner way to do it, but this approach ensures Total is at the bottom without needing two selects (but it needs a group and a sort)
q)raze`active xasc/:t group`Total=t`active
active price price2
-------------------
abc 44 83
def 12 49
hij 26 25
xyz 468 716
Total 550 873
Matthew's is probably the best all-round solution.
If you know Total is always going to end up first after the sort then:
{1_x,1#x}`active xasc t // sort, join the first row to the end, drop first row
is a pretty concise solution - this is obviously not ideal if you don't have control over the active column contents as other uppercase entries would make this unpredictable.

Spotfire data difference: same column

I have the following table:
Id Claim_id Date
4 111 10/08/2017
5 333 27/08/2017
2 111 07/08/2017
3 222 08/08/2017
1 444 03/07/2017
7 333 02/09/2017
6 333 28/08/2017
There are more rows (dates) associated with the same Claim_id; column "Id" is based on column "Date" (more recent dates have a greater Id).
I need to create a calculated column giving the date difference over Claim_id, with the following output:
Id Claim_id Date Days
3 111 10/08/2017 3
1 333 27/08/2017
2 111 07/08/2017
4 222 08/08/2017
7 444 03/07/2017
6 333 02/09/2017 5
5 333 28/08/2017 1
I have tried to use the code given here: Spotfire date difference using over function but it doesn't work (it produces wrong values).
I think that, maybe, it's because my table is not sorted, but I can't order it because I have no access to the source database.
How can I modify that expression?
Thank you!
Valentina
@V.Ang - One way to do this is by adding a column 'decreasing_count'.
What this column does is count the instances of each Claim_id by date: the row with the highest date is counted first, followed by the next instance of the same Claim_id with a date lower than the previous one, and so on. The advantage of this column is that your data does not need to be sorted for this solution to work.
Now, using this 'decreasing_count' column, calculate the difference of dates.
decreasing_count column expression:
Count([Claim_id]) over (Intersect([Claim_id],AllNext([Date])))
Note: This column works in the background. You need not display it in the table
Days calculated column expression:
Days([Date] - Min([Date]) over (Intersect([Claim_id],Next([decreasing_count]))))
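If it helps to see the logic outside Spotfire, here is a small pandas sketch of the same idea (purely illustrative; it uses the sample data from the question):

import pandas as pd

# Toy data copied from the question (dd/mm/yyyy dates written as ISO).
df = pd.DataFrame({
    "Id": [4, 5, 2, 3, 1, 7, 6],
    "Claim_id": [111, 333, 111, 222, 444, 333, 333],
    "Date": pd.to_datetime([
        "2017-08-10", "2017-08-27", "2017-08-07", "2017-08-08",
        "2017-07-03", "2017-09-02", "2017-08-28",
    ]),
})

# decreasing_count: how many rows of the same Claim_id have a Date on or after
# this row's Date (latest date gets 1, the next most recent gets 2, ...).
df["decreasing_count"] = (
    df.sort_values("Date", ascending=False)
      .groupby("Claim_id")
      .cumcount() + 1
)

# Days: difference to the previous date within the same Claim_id
# (empty for the oldest row of each claim).
prev_date = df.sort_values("Date").groupby("Claim_id")["Date"].shift(1)
df["Days"] = (df["Date"] - prev_date).dt.days

print(df)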
Final Output:
Hope this helps!

bulk import 80 lines of data via API

I have a tool that every x hours creates a set of y lines that I would simply like to add to a column in a specific Smartsheet. Then every x hours I would like to overwrite these values with the new ones, which can have a different number of lines.
As I read the API, in order to add or update anything I need to get all the row and column IDs of the Smartsheet in question.
Isn't there an easy way to formulate a JSON with a set of data and a column name so that it just auto-adds the rows as needed?
Data example is:
21
23
43
23
12
23
43
23
12
34
54
23
and then it could be:
23
23
55
4
322
12
3
455
3
AUTO
I really find it hard to believe that I need to read so much information into a script just to be able to add a row of data. Nothing fancy.
I'm looking to stick to just using cURL or Python.
Thanks
If you want to add this data as new rows, this is fairly simple. It's only if you would like to replace existing data in existing rows that you would need to specify the row id.
The Python SDK allows you to specify just a single column id, like so:
row_a = smartsheet.models.Row()
row_a.cells.append({
    'column_id': 642523719853956,
    'value': 'New Status',
    'strict': False
})
For more details, please see the API documentation regarding adding rows.
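For the bulk case you describe, a minimal sketch along these lines could build one row per value and add them all in a single call. This assumes the smartsheet-python-sdk; the access token, sheet id, and column id below are placeholders, not values from your account:

import smartsheet

ACCESS_TOKEN = 'YOUR_ACCESS_TOKEN'   # placeholder token
SHEET_ID = 1234567890123456          # placeholder sheet id
COLUMN_ID = 642523719853956          # placeholder column id

values = [21, 23, 43, 23, 12]  # the lines your tool produced this run

client = smartsheet.Smartsheet(ACCESS_TOKEN)

rows = []
for value in values:
    row = smartsheet.models.Row()
    row.to_bottom = True  # append each new row at the end of the sheet
    row.cells.append({
        'column_id': COLUMN_ID,
        'value': value,
        'strict': False
    })
    rows.append(row)

# A single call adds all of the new rows to the sheet.
client.Sheets.add_rows(SHEET_ID, rows)

To overwrite the previous run you would first delete (or update) the rows added last time, which does require fetching their row ids from the sheet.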

reshape and merge in stata

I have three data sets:
First, education.dta. It contains individuals (students) over many years with their achieved education from 1990 to 2000. Originally it is in wide format, but I can easily reshape it to long. It is presented in wide format below:
id educ_90 educ_91 ... educ_00 cohort
1 0 1 1 87
2 1 1 2 75
3 0 0 2 90
Second, graduate.dta. It contains information on when individuals (students) finished high school. However, this data set does not contain several years, only a "snapshot" of the individual when they finish high school, plus characteristics of the individual students such as background (for example, parents' occupation).
id schoolid county cohort ...
1 11 123 87
2 11 123 75
3 22 243 90
The third data set is called teachers.dta. It contains information about all teachers at the high schools, such as their education, whether they work full or part time, gender... This data set is long.
id schoolid county year education
22 11 123 2011 1
21 11 123 2001 1
23 22 243 2015 3
Now I want to merge these three data sets.
First, I want to merge education.dta and graduate.dta on id.
Problem when education.dta is wide: I manage to merge education.dta and graduate.dta. Then I make a loop so that all the variables from graduate.dta take the same value over all years, for example:
forvalues j = 1990/2000 {
    gen county`j' = .
    replace county`j' = county
}
However, afterwards, when reshaping to long, Stata reports that variable id does not uniquely identify the observations.
Further, I have tried to first reshape education.dta to long, and thereafter merge either 1:m or m:1 with education as master, using graduate.dta.
However, Stata again reports that id is not unique. How do I deal with this?
In next step I want to merge the above with teachers.dta on schoolid.
I want my final dataset in long format.
Thanks for your help :)
I am not certain that I have exactly the format of your data; it would be helpful if you gave us a toy dataset to look at using dataex (and it could even help you figure out the problem yourself!).
But to start, because you are seeing that id is not unique, you need to figure out why there might be multiple ids in any of the datasets. Can someone in graduate.dta or education.dta appear more than once? help duplicates will probably be useful to explore the data in this way.
Because you want your dataset in long format I suggest reshaping education.dta to long first, then doing something like merge m:1 id using "graduate.dta" (once you figure out why some observations are showing up more than once) and then, finally something like merge 1:1 schoolid year using "teacher.dta" and you will have your final dataset.
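In case it helps to see that workflow spelled out, here is a pandas sketch (not Stata) of the reshape-then-merge steps, using toy frames that mirror the small examples in the question:

import pandas as pd

# Toy frames mirroring the examples above.
education = pd.DataFrame({
    "id": [1, 2, 3],
    "educ_90": [0, 1, 0],
    "educ_91": [1, 1, 0],
    "educ_00": [1, 2, 2],
    "cohort": [87, 75, 90],
})
graduate = pd.DataFrame({
    "id": [1, 2, 3],
    "schoolid": [11, 11, 22],
    "county": [123, 123, 243],
    "cohort": [87, 75, 90],
})
teachers = pd.DataFrame({
    "id": [22, 21, 23],
    "schoolid": [11, 11, 22],
    "county": [123, 123, 243],
    "year": [2011, 2001, 2015],
    "education": [1, 1, 3],
})

# 1) reshape education to long: one row per id and year
long_educ = education.melt(id_vars=["id", "cohort"],
                           var_name="year", value_name="educ")
long_educ["year"] = long_educ["year"].str.replace("educ_", "", regex=False)

# 2) merge the time-invariant graduate data many-to-one on id
merged = long_educ.merge(graduate, on="id", how="left", suffixes=("", "_grad"))

# 3) bring in the teacher data on schoolid (many teachers per school,
#    so expect the row count to grow at this step)
final = merged.merge(teachers, on="schoolid", how="left",
                     suffixes=("_student", "_teacher"))
print(final.head())

The Stata equivalents are reshape long, merge m:1 id using "graduate.dta", and a final merge on schoolid (and year, if that is how teachers should be matched), as described above.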

filemaker relationships not displaying

I import races into Excel but it has grown to a large spreadsheet and has its limitations.
I have successfully imported a small test database of results.
So far I have the form in the database with the tables and relationships below, but when I try to make a layout to view the form I get the horse name and one line of form; when I scroll down, the same horse displays again with its next run.
I think it's because I have failed to fill the foreign keys in the horse_Race table, or got the relationships wrong.
I also want to add a Today's Runners table but am not sure how to relate it to the existing tables. Is it possible to achieve these aims in FileMaker, or am I barking up the wrong tree? I am at an impasse, but I'm sure it's to do with the relationships somewhere.
Tables as follows:
Course:
pk_Course_ID, Course
Horse:
pk_Horse_ID, Horse
Races:
pk_Race_ID, Course, Rdate, Rtime, Going, Age, Furs, Class, Ran
Horse_Race:
pk_Run_ID, fk_Course_ID, fk_Horse_ID, fk_Races_ID, Course, RDate, Rtime, Going, Age, Furs, Class, Ran, Pos, Drw, TBtn, Horse, Wgt, MARK, GRD, WA, AA, BHB, BHBAdj, RATING, PPL
Relationships run from the primary key in each table to the corresponding foreign keys in the Horse_Race table.
My aims are as follows.
To view EACH individual horse and its FORM in date order, latest run at the top:
AJCook (IRE)
DATE CRSE Going Furs Class Ran Pos Drw TBtn Wgt MARK GRD RATING
31-Jul-13 REDC GD 6 6 11 11 1 20.8 133 65 63 -1
08-Jul-13 RIPO GF 6 6 11 7 3 8.25 133 65 65 41
21-Jun-13 REDC GF 5 5 5 1 4 0.02 133 60 56 54
28-May-13 REDC GF 6 5 13 5 6 5.35 124 61 70 35
06-May-13 BEVE GF 5 5 12 8 13 6.15 125 65 73 40
To add a Today's Runners table with races and runners from each of the day's races, which would loop through each horse and search the database to display the horses and their last 3 ratings (latest on the right), plus the TOP 9 RATINGS FROM THE LAST 3 RATINGS IN ORDER, like so:
HORSE R1 R2 R3 HORSE RATE
A J Cook (IRE) 54 41 -1 Abadejo 57
Aaranyow (IRE) 45 36 48 Abadejo 56
Aarti (IRE) 44 43 40 A J Cook (IRE) 54
Aazif (IRE) 46 43 23 Abadejo 54
Abadejo 56 54 57 Aaranyow (IRE) 48
How do I add the Today's Runners table, which has the following data:
Date, Time, Course, Furs, HorseNo, Horse
How will it be related to the tables I already have? Many thanks
Davey H
FileMaker can easily do this. But I am a bit confused by your blocks of text up there; can you format them a little better so that I can see which tables hold which fields, please?