How to add a new row as a group total using DataStage

I have input data like this:
CovNo type Price
10 med tot 110
10 med tot 120
10 dent tot 140
20 med tot 110
20 dent tot 130
20 med tot 120
How can I generate output data like this?
CovNo type Price
10 med tot 110
10 med tot 120
10 dent tot 140
10 Group tot 370
20 med tot 110
20 dent tot 130
20 med tot 120
20 Group tot 360
May I know the logic to implement the above scenario in DataStage?
Thanks in advance,
Shanmugam

If your source is a table, one way is to solve it with SQL.
If you want or need to do it in DataStage only, you could fork your data using a Copy stage and feed one output into an Aggregator stage, where you calculate the sums per group. The other output of the Copy stage is then combined with the results of the Aggregator using a Funnel stage (which behaves like a UNION in SQL). That's it.
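The Copy, Aggregator and Funnel flow described above can be sketched outside DataStage to check the logic. This is only an illustration in plain Python, not DataStage code. The function name `add_group_totals` and the helper sort key are assumptions made for the sketch; the explicit sort is there because a plain union does not by itself place each total row after its group.

```python
from collections import defaultdict

# Input rows from the question: (CovNo, type, Price).
rows = [
    (10, "med tot", 110),
    (10, "med tot", 120),
    (10, "dent tot", 140),
    (20, "med tot", 110),
    (20, "dent tot", 130),
    (20, "med tot", 120),
]

def add_group_totals(rows):
    """Hypothetical helper mimicking Copy -> Aggregator -> Funnel."""
    # "Copy": the same rows feed both branches.
    detail_branch = list(rows)

    # "Aggregator": sum Price per CovNo.
    sums = defaultdict(int)
    for cov_no, _type, price in rows:
        sums[cov_no] += price
    total_branch = [(cov_no, "Group tot", total) for cov_no, total in sums.items()]

    # "Funnel": union of both branches. Each row is tagged with
    # (CovNo, flag, position), where flag 0 = detail and 1 = total,
    # so sorting puts every total directly after its own group.
    tagged = [(r[0], 0, i, r) for i, r in enumerate(detail_branch)]
    tagged += [(r[0], 1, 0, r) for r in total_branch]
    tagged.sort(key=lambda t: t[:3])
    return [t[3] for t in tagged]

result = add_group_totals(rows)
for row in result:
    print(*row)
```

In a real job the ordering step would probably be a sort on CovNo plus a helper column, but that detail is an assumption and depends on how the job is built.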

For the above example, we could achieve it in a Transformer stage using the SaveInputRecord(), GetSavedInputRecord() and LastRowInGroup() functions.
IBM has provided a perfect example for this type of scenario.
Note: Performance will suffer if the data is huge, because this approach uses cache memory. If your data is huge, it is better to split the source and perform a join after the aggregation, or to do it in SQL if your source is a database.

Related

Octave/Matlab: Arranging Space-Time Data in a Matrix

This is a question about common coding practice, not about a specific error or malfunction.
I have a matrix of values of a variable that changes in space and time. What is the common practice: should the columns correspond to time values or to space values?
That is, if there is a definite common practice in the first place.
Update: Here is an example of the data in tabular form. The time vector is much longer than the space vector.
t y(x1) y(x2)
1 100 50
2 100 50
3 100 50
4 99 49
5 99 49
6 99 49
7 98 49
8 98 48
9 98 48
10 97 48
It depends on your goal and ultimately doesn't matter that much. It is mostly a question of convenience.
If you do care about performance, there is a slight difference. Code achieves maximum cache efficiency when it traverses monotonically increasing memory locations. MATLAB stores data column-wise (column-major order), so processing data column by column gives the best cache efficiency. Thus, if you frequently access all the data at a given time layer, store space along the columns (one column per time step). If you frequently access all the data at a given spatial point, store time along the columns (one column per spatial point).
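To make the column-wise storage argument concrete, here is a small sketch in plain Python that computes where each element sits in memory under column-major order. The function name `column_major_offset` is made up for the illustration, and the 10-by-2 shape is taken from the example table in the question (rows are time steps, columns are spatial points).

```python
def column_major_offset(row, col, n_rows):
    """0-based linear offset of element (row, col) in column-major storage."""
    return col * n_rows + row

n_time, n_space = 10, 2  # rows = time steps, columns = spatial points

# Walking down one column (all time steps at one spatial point)
# touches consecutive memory locations.
down_column = [column_major_offset(t, 0, n_time) for t in range(n_time)]

# Walking along one row (all spatial points at one time step)
# jumps by n_rows elements at every step.
along_row = [column_major_offset(0, x, n_time) for x in range(n_space)]

print(down_column)
print(along_row)
```

With this layout, reading a full time series at one spatial point is the contiguous access pattern, which matches the table shown in the question.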

Count values by groups

I currently send events to Graphite like
server.my_value with a count that varies between 1 and 1000.
I'd like to plot a graph that shows how many of these my_value events I received, in groups of 100 (so 1, 4, 7 and 89 would all be counted toward the 1-100 group, and so on).
To illustrate, let's say I sent these values at a specific time:
1
3
45
13
299
455
74
924
The groups would be:
1-100: 5
100-200: 0
200-300: 1
300-400: 0
400-500: 1
500-600: 0
600-700: 0
700-800: 0
800-900: 0
900-1000: 1
So I would have 10 lines, each representing a group.
This could be shown in a histogram, but that won't show me the changes over time.
Is it possible to aggregate values by ranges?
If you avoid using downsampled data by only querying for small time intervals, and do not cross the aggregation boundary defined in your Graphite storage schema, then you can just use the new histogram feature in the Graph panel in Grafana or the new Heatmap panel to get a histogram over time. (For example, if you have the storage schema 15s:7d,1m:21d,15m:5y and you stay within the 7 day boundary.) Here is an example from the Grafana demo site.
I don't think there is a good way to aggregate values by ranges in a Graphite query if you want to aggregate for larger time intervals. However, it seems to be possible to do it by aggregating the data in statsd:
http://dieter.plaetinck.be/post/histogram-statsd-graphing-over-time-with-graphite/
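The idea of aggregating before the data reaches Graphite can be sketched as a simple bucketing step. The following is a plain Python illustration, not statsd or Graphite code. The function name `bucket_counts` and the non-overlapping labels (1-100, 101-200, and so on) are assumptions made for the sketch. Each bucket count would then be sent as its own metric, so that each range becomes one line on the graph.

```python
from collections import Counter

def bucket_counts(values, width=100, max_value=1000):
    """Count values per range (1-100, 101-200, ...), keeping empty ranges."""
    counts = Counter((v - 1) // width for v in values)
    result = {}
    for i in range(max_value // width):
        label = f"{i * width + 1}-{(i + 1) * width}"
        result[label] = counts.get(i, 0)
    return result

# Values from the question.
values = [1, 3, 45, 13, 299, 455, 74, 924]
buckets = bucket_counts(values)
for label, n in buckets.items():
    print(f"{label}: {n}")
```

Running this once per flush interval and emitting the ten counts would give ten series over time, one per range.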

Monthly average from daily data

I have a dataset of energy efficient smart AC units. Each one has an ID, and each unit has daily data that represents the cost saved (in dollars) for each day.
I want to create a bar graph that shows the average savings, per month, per unit. I'm really struggling, however. AVG([Elecsavingscost]) only gets me the average daily savings in a given month. SUM([Elecsavingscost]) * 30 gets me pretty close to what I want, but of course, not all months have 30 days.
Is there a more intelligent way to do this? I'm presuming it's possible...
This can be done very easily in R. The following code converts daily data to monthly totals:
install.packages(c("zoo", "hydroTSM"))
library(zoo)
library(hydroTSM)
data <- read.csv("data.csv")
# data should contain 2 or more columns: the 1st column should be the date in
# English (U.K.) format (dd-mm-yyyy), the 2nd and subsequent columns your daily data
data
date Elecsavingscost
01-01-1984 18.8
02-01-1984 20.2
03-01-1984 19
04-01-1984 19.6
05-01-1984 21.8
06-01-1984 21.5
.
.
.
25-12-2014 13.6
26-12-2014 13.6
27-12-2014 16.2
28-12-2014 18.2
29-12-2014 16.7
30-12-2014 19.4
31-12-2014 18.5
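# Build a regular daily series starting at the first date in the file
x1 <- zooreg(data$Elecsavingscost, start = as.Date("1984-01-01"))
# Daily to monthly conversion: sum the daily values within each month
Elecsavingscost_monthly <- daily2monthly(x1, FUN = sum, na.rm = TRUE)
write.csv(Elecsavingscost_monthly, "Elecsavingscost_monthly.csv")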
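The same daily-to-monthly idea can be written with nothing but the standard library, which may help when checking the numbers by hand. This is a plain Python sketch, not part of the R answer. The function name `monthly_totals` is made up, and the input is a handful of rows taken from the sample data above. Summing per calendar month avoids the "multiply by 30" approximation, and the average monthly saving is then the mean of those totals.

```python
from collections import defaultdict
from datetime import datetime

def monthly_totals(daily):
    """Sum daily values per (year, month); dates are dd-mm-yyyy strings."""
    totals = defaultdict(float)
    for date_str, value in daily:
        d = datetime.strptime(date_str, "%d-%m-%Y")
        totals[(d.year, d.month)] += value
    return dict(totals)

# A few rows from the sample data shown above.
sample = [
    ("01-01-1984", 18.8),
    ("02-01-1984", 20.2),
    ("03-01-1984", 19.0),
    ("30-12-2014", 19.4),
    ("31-12-2014", 18.5),
]

totals = monthly_totals(sample)
# Average of the monthly totals over the months present in the data.
average_monthly = sum(totals.values()) / len(totals)
print(totals)
print(average_monthly)
```

For several units, the grouping key would also need the unit ID, for example (id, year, month).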

How to include Percentage in ToolTip in Tableau Public

I created the view below, which shows the number of students who got PASS/FAIL in each subject. The tooltip gives me some default options.
But I would like to have a percentage in the tooltip.
Basically, I need a Percentage field in the tooltip that says 50% for the screen below.
PASS Percentage 50%
FAIL Percentage 50%
This Percentage field needs to vary for each subject and its grade among students.
Could somebody help me with the steps to include a percentage in the tooltip?
sample dataset
id name age gender subject grade
100 Steve 14 MALE ENGLISH PASS
100 Steve 14 MALE PHYSICS PASS
100 Steve 14 MALE CHEMISTRY PASS
101 Edward 15 MALE ENGLISH FAIL
101 Edward 15 MALE PHYSICS FAIL
101 Edward 15 MALE CHEMISTRY FAIL
102 Andy 15 FEMALE ENGLISH PASS
102 Andy 15 FEMALE PHYSICS PASS
102 Andy 15 FEMALE CHEMISTRY FAIL
103 Kim 16 FEMALE ENGLISH FAIL
103 Kim 16 FEMALE PHYSICS FAIL
103 Kim 16 FEMALE CHEMISTRY PASS
Table calcs let you calculate percent of totals without creating new calculated fields.
Put SUM(Number of Records) on the Tooltip shelf. Then click on it and choose Quick Table Calc->Percent of Total. You will see a triangle icon next to the field, indicating it is now a table calculation.
Experiment with changing the Compute Using setting for the field. I believe Compute Using Grade is probably the one you want.
I did this with a few calcs. First, get the PASS count.
IF [Grade] = 'PASS' THEN 1 END
Then create a Pass % calc.
sum([Pass count]) / total(countd([Id]))
Now you can place this field in the Tooltip. Repeat for FAIL as well and place that in the Tooltip.
Then I updated the tooltip as follows:
Number of students <CNTD(Id)> (<AGG(pass %)> <AGG(fail %)>) who got a <Grade> in <Subject>
See the sample workbook here for details: https://dl.dropboxusercontent.com/u/60455118/160326%20stack%20question.twbx
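To check what the tooltip should show, the percent of total per subject and grade can be computed by hand from the sample dataset. This is a plain Python sketch, not Tableau syntax. The function name `grade_percentages` is made up, and only the id, subject and grade columns from the question are used.

```python
from collections import defaultdict

# (id, subject, grade) taken from the sample dataset in the question.
records = [
    (100, "ENGLISH", "PASS"), (100, "PHYSICS", "PASS"), (100, "CHEMISTRY", "PASS"),
    (101, "ENGLISH", "FAIL"), (101, "PHYSICS", "FAIL"), (101, "CHEMISTRY", "FAIL"),
    (102, "ENGLISH", "PASS"), (102, "PHYSICS", "PASS"), (102, "CHEMISTRY", "FAIL"),
    (103, "ENGLISH", "FAIL"), (103, "PHYSICS", "FAIL"), (103, "CHEMISTRY", "PASS"),
]

def grade_percentages(records):
    """Percent of students per grade within each subject."""
    counts = defaultdict(lambda: defaultdict(int))
    for _id, subject, grade in records:
        counts[subject][grade] += 1
    return {
        subject: {g: 100.0 * n / sum(by_grade.values()) for g, n in by_grade.items()}
        for subject, by_grade in counts.items()
    }

percentages = grade_percentages(records)
for subject, by_grade in percentages.items():
    print(subject, by_grade)
```

The percentage is taken within each subject, which corresponds to computing the table calculation along Grade.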

Data duplication in Detail band

I have a Group band with Detail band between the Group header and Group footer. The report is supposed to print multiple student examination records at once. I have grouped them according to the registration numbers. It works except for the fact that query results are shown twice.
As shown:
Student 1
Subject score
Maths 50
English 60
kiswahili 70
Maths 50
English 60
kiswahili 70
Student 2
Subject score
Maths 90
English 60
kiswahili 60
Maths 60
English 60
kiswahili 60
instead of
Student 1
Subject score
Maths 50
English 60
kiswahili 70
Student 2
Subject score
Maths 90
English 60
kiswahili 60
Add an ORDER BY to the end of your query:
ORDER BY registration_number, subject
Your data should be sent to the report template in the order in which it will be displayed. Order by student and then by subject.
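The reason ordering matters can be shown with a small sketch. It assumes the report engine starts a new group whenever the grouping value changes between consecutive rows, which is how banded report tools typically behave. The example uses Python's `itertools.groupby`, which has the same property. The rows and the function name `count_groups` are made up for the illustration.

```python
from itertools import groupby

# Hypothetical rows: (registration_number, subject, score), deliberately unsorted.
rows = [
    (1, "Maths", 50), (2, "Maths", 90),
    (1, "English", 60), (2, "English", 60),
]

def count_groups(rows):
    """Count group breaks the way a group band would: on every key change."""
    return sum(1 for _key, _group in groupby(rows, key=lambda r: r[0]))

unsorted_groups = count_groups(rows)
sorted_groups = count_groups(sorted(rows, key=lambda r: (r[0], r[1])))

print(unsorted_groups)
print(sorted_groups)
```

With unsorted input each student's group is opened more than once. After sorting by registration number and subject there is exactly one group per student.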