IBM DataStage: Creating a column that is a calculation

I have a table whose columns are location and credit. The location column contains string rows, mainly location_name and npl_of_location_name, and the credit column contains integer rows, mainly credit_of_location_name and credit_npl_of_location_name. I need to create a column that calculates ((odd rows of credit - even rows of credit) * 0.1). How do I do this?

When you specify "odd rows" and "even rows", are you referring to row numbers? Because, unless your query sorts the data, you have no control over row order; the database server returns rows however they are physically stored.
Once you are sure that your rows are properly sorted, you can use a technique such as Mod(#INROWNUM,2) = 1 to identify "odd" rows (a result of zero means the row is even). This works best if the Transformer is executing in sequential mode; if it is executed in parallel mode then you need to use a partitioning algorithm that ensures the odd and even rows for a particular location end up on the same node.

Related

How to multiply a variable by each element of a column in a database

I am trying to add a column to a collection by multiplying 0.9 by the existing database column recycling, but I get a runtime error.
I tried to multiply by 0.9 directly in the function, but it showed an error, so I created a class and multiplied it there, yet to no avail. What could be the problem?
Your error message is telling you what the problem is: your database query is using GROUP BY in an invalid way.
It doesn't make sense to group by one column and then select other columns (you've selected all columns in your case); what values would they contain, given that you haven't grouped by them as well and only one row is returned per group? You either have to group by all the columns you're selecting, and/or use aggregates such as SUM for the non-grouped columns.
Perhaps you meant to ORDER BY that column (orderBy(dt.recycling.asc()) in QueryDSL syntax, for ascending order), or to select all rows with a particular value of that column (where(dt.recycling.eq(55)), for example)?
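To make the distinction concrete, here is a minimal plain-SQL sketch of the same point; the table name dt (borrowed from the QueryDSL snippet) and the measure column amount are assumptions for illustration, only recycling comes from the question:
-- Invalid: grouping by one column while selecting every column
-- SELECT * FROM dt GROUP BY recycling;
-- Valid: group by the column and aggregate the non-grouped columns
SELECT recycling, SUM(amount) AS total_amount
FROM dt
GROUP BY recycling;
-- Or, if grouping was never the intent, order or filter instead
SELECT * FROM dt ORDER BY recycling ASC;
SELECT * FROM dt WHERE recycling = 55;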

Tableau: show categories from a calculation even when a category is not visible

I have a calculation and it outputs multiple values. Then I am creating a table on those values. For example, with the data below my formula is:
if data is 1 then calculation is `one`
if data is 2 then calculation is `two`
if data is 3 then calculation is `three`
As `three` doesn't actually appear in the data, when I create a table, three is not displayed. Is there any way to display it?
I tried Table Layout >> Show Empty Rows and Columns and it didn't work.
data calculation
1 one
2 two
Tableau discovers the possible values for a dimension field dynamically from the query results.
If ‘three’ does not appear in your data, how do you expect Tableau to know to make a column header for that non-existent, but potential, value? It can’t read your mind.
This situation does occur often though - perhaps you want row or column headers to remain stable, even when you change filters in a way that causes some to no longer appear in the query results.
There are a few ways you can force Tableau to pad, or complete, a domain:
One solution is to pad your data to make sure each value for your dimension field appears in at least one data row.
You can often do this easily by using a union to append a few extra rows to your original data. Padding rows that leave all your Measure columns null won't impact any results, since nulls are ignored by aggregation functions.
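As a rough illustration of the padding idea, the union might look something like the following in SQL; the table name source_data and the measure column sales are assumptions, only the data field comes from the question:
-- original rows, plus one padding row per missing dimension value;
-- the NULL measure is ignored by SUM, AVG and the other aggregates
SELECT data, sales FROM source_data
UNION ALL
SELECT 3, NULL;
With a row for 3 present, the calculated field can produce 'three' and Tableau will show a header for it.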
Another common solution, which takes a bit more effort, is to make what is known as a scaffolding data source, which is little more than a list of your dimension members. You can then use that data source as the primary data source with data blending, making your original data source secondary.
There are two situations where Tableau can detect the absence of data and leave space for it in the visualization automatically:
For numeric types, you can create a bin field that will automatically pad for missing bins.
Similarly, date fields can show missing values because, like bins, Tableau can tell when a month doesn’t appear in the data and leave room for it in the view.

Executing a query on a table having over a billion rows

I have a table, say 'T', in kdb which has over 6 billion rows. When I try to execute a query like this:
select from T where i < 10
it throws a wsfull exception. Is there any way I can execute queries like this on a table having a large amount of data?
10#T
The expression as you wrote it first makes a bitmap containing all of the elements where i (the row number) < 10, which is as tall as one of your columns. It then applies where (which just yields til 10) and uses those indices to pull the rows from each column. You can save the last step with:
T[til 10]
but 10#T is shorter.
Assuming you have a partitioned table here, it is normally beneficial to have the partitioning column (date, int, etc.) as the first item in the where clause of your query; otherwise, as mentioned previously, you are reading a six-billion-item list into memory, which will result in a 'wsfull signal on any machine with less than the requisite amount of RAM.
Bear in mind that row index starts at 0 for each partition, and is not reflective of position in the overall table. The query that you gave as an example in your question would return the first ten rows of each partition of table T in your database.
In order to do this without reaching your memory limit, you can try running the following (if your database is date-partitioned):
raze{10#select from T where date=x}each date

Running total using window function in SQL has same result for same data

Every reference I found on how to do a cumulative sum / running total said it's better to use a window function, so I did:
select grandtotal,sum(grandtotal)over(order by agentname) from call
but I realized that the results are only okay as long as the values of the rows are different. Here is the result:
Is there any way to fix this?
You might want to review the documentation on window specifications (which is here). The default frame is "range between", which defines the frame by the values in the row, so rows with equal values in the ORDER BY key are treated as peers and get the same running total. You want "rows between":
select grandtotal,
       sum(grandtotal) over (order by agentname rows between unbounded preceding and current row)
from call;
Alternatively, you could include an id column in the sort to guarantee uniqueness and not have to deal with the issue of equal key values.
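To see why equal sort keys give identical running totals, here is a small illustrative example; the three rows in the comment are invented, but the column names match the query above:
-- suppose call contains (agentname, grandtotal):
--   ('alice', 10), ('alice', 10), ('bob', 5)
select grandtotal,
       -- default RANGE frame: the two 'alice' rows are peers, so the totals are 20, 20, 25
       sum(grandtotal) over (order by agentname) as range_total,
       -- ROWS frame: advances one physical row at a time, giving 10, 20, 25
       sum(grandtotal) over (order by agentname
                             rows between unbounded preceding and current row) as rows_total
from call;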

What is the difference between CHECKSUM_AGG() and CHECKSUM()?

What is the difference between CHECKSUM_AGG() and CHECKSUM()?
CHECKSUM calculates a hash for one or more values in a single row and returns an integer.
CHECKSUM_AGG is an aggregate function that takes a single integer value from multiple rows and calculates an aggregated checksum for each group.
They can be used together to checksum multiple columns in a group:
SELECT category, CHECKSUM_AGG(CHECKSUM(*)) AS checksum_for_category
FROM yourtable
GROUP BY category
CHECKSUM_AGG will perform a checksum across all the values that are being aggregated, producing a single value.
It's typically used to see if a collection of values (in the group) has generally changed.
CHECKSUM is intended to build a hash index based on an expression or column list.
One example of using a CHECKSUM is to store the unique value for the entire row in a column for later comparison.
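A minimal sketch of that pattern in SQL Server, assuming an id key column and a snapshot table named yourtable_snapshot (both names are illustrative):
-- take a one-off snapshot of each row's checksum
SELECT id, CHECKSUM(*) AS row_checksum
INTO yourtable_snapshot
FROM yourtable;
-- later: recompute the checksums and report rows whose contents have changed
SELECT cur.id
FROM (SELECT id, CHECKSUM(*) AS row_checksum FROM yourtable) AS cur
JOIN yourtable_snapshot AS s ON s.id = cur.id
WHERE cur.row_checksum <> s.row_checksum;
Bear in mind that CHECKSUM can collide, so an unchanged checksum does not absolutely guarantee the row is unchanged.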