Kdb find first not null value - group-by

While doing a group-by in kdb, I have to find the first non-null value in each group for a column.
For example:
t:([]a:1 1 1 2;b:0n 1 3 4 )
select first b by a from t
One way I found to achieve this (since first alone would just return the 0n sitting at the head of group a=1):
select first b except 0n by a from t
I am not sure if this is the correct way to do it. Please provide suggestions.

It seems like a good way to do it to me.
Two alternatives would include:
select first b where not null b by a from t
The benefit is that it doesn't rely on a particular column type and arguably states your intent more clearly, though it is slightly longer. Or:
select b:last fills reverse b by a from t
Which on some test runs was the quickest.
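A rough way to compare the three yourself (a sketch only; timings depend heavily on data shape and q version):
t2:([]a:1000000?100;b:1000000?0n,1 2 3f)  / larger table with some nulls
\t select first b except 0n by a from t2
\t select first b where not null b by a from t2
\t select b:last fills reverse b by a from t2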
In kdb there are always multiple ways to do things, and rarely a strict right or wrong answer.

Related

How to get all missing days between two dates

I will try to explain the problem on an abstract level first:
I have X amount of data as input, which will always have a field DATE. Previously, the dates that came as input (after some processing) were put in a table as output. Now I am asked to output both the input dates and every date between the minimum date received and one year from that moment. If there was originally no input for some day between these two dates, all fields must come out as 0, or equivalent.
Example: I have two inputs, one with '18/03/2017' and another with '18/03/2018'. I now need to create output data for all the missing dates between '18/03/2017' and '18/04/2017'. So, output '19/03/2017' with every field set to 0, and the same for the 20th and the 21st and so on.
I know how to do this programmatically, but not in PowerCenter. I've been told to do the following (which I have done, but I would like to know of a better method):
Get the minimum date, day0. Then, with an aggregator, create 365 fields, each holding day0+1, day0+2, and so on, to create an artificial year.
After that we do several transformations, like sorting the dates and a union between them, to get the data ready for a joiner. The idea of the joiner is to do a Full Outer Join between the original data and the all-zero data we got from the previous aggregator.
Then a router picks, with one of its groups, the data that had actual dates (and fields without nulls), and with another group the rows where all fields are null; those fields are then given a 0 before finally being written to a table.
I am wondering how this can be achieved while, for starters, removing the need to add 365 days to a date. If I were to do this same process for 10 years instead of one, the task gets ridiculous really quickly.
I was wondering about an XOR type of operation, or some other function, that would cut the number of steps needed for what I (maybe wrongly) feel is a simple task. Currently I need 5 steps just to know which dates are missing between two dates, a minimum and one year from that point.
I have tried to be as clear as possible, but if I failed at any point please let me know!
I'm not sure what the aggregator is supposed to do.
The same with the Full Outer Join: a normal join on a constant port is fine :)
Can you calculate the needed number of 'duplicates' before the joiner? In that case a lookup configured to return 'all rows', combined with a less-than-or-equal predicate, can help make the mapping much more readable.
In any case you will need a helper table (or file) with a sequence of numbers between 1 and the number of potential duplicates (or more).
I use our time dimension in the warehouse, which has one row per day from 1753-01-01 for the next 200,000 days, and a primary integer column with values from 1 and up...
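A rough SQL sketch of that helper-table idea (hypothetical tables numbers(n) and input(datefield); untested):
SELECT d.min_date + t.n - 1 AS gen_date
FROM (SELECT MIN(datefield) AS min_date FROM input) d
JOIN numbers t ON t.n <= 365;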
You've identified that you know how to do this programmatically, and to be fair this problem is more suited to that sort of solution... but that doesn't exclude PowerCenter by any means: just feed the 2 dates into a Java transformation and apply some code to produce all dates between them, outputting a record for each. The Java transformation is ideal for record generation; a sketch follows.
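A minimal sketch of what that Java transformation logic might look like (plain Java with hypothetical dates; in the real transformation you would read the two input ports and emit a row per date instead of printing):
import java.time.LocalDate;

// Emit one record per date between start and end (inclusive).
public class DateRangeGenerator {
    public static void main(String[] args) {
        LocalDate start = LocalDate.of(2017, 3, 18);
        LocalDate end   = LocalDate.of(2018, 3, 18);
        for (LocalDate d = start; !d.isAfter(end); d = d.plusDays(1)) {
            System.out.println(d); // in PowerCenter: set output ports and generate a row here
        }
    }
}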
OK... so you could override your source qualifier to achieve this in the selection query itself (I'm giving an Oracle-based example as it's what I'm used to, and I'm assuming your input data comes from a table). I looked up the CONNECT BY syntax here:
SQL to generate a list of numbers from 1 to 100
SELECT MIN(tablea.DATEFIELD) + levquery.n - 1 AS Port1
FROM tablea,
     (SELECT LEVEL n FROM DUAL CONNECT BY LEVEL <= 365) levquery
GROUP BY levquery.n
(Check whether the query works for you - I haven't got access to a PC to test it at the minute.)

Concatenate Multiple Returned Rows Into One Row (standard methods don't work for some reason)

I have a relatively noobish question. I say this because I feel I am just missing the obvious here; I am simply doing what many have done and asked about previously, but the typical methods I have used before are not working. Hopefully it's just me missing something simple.
Below is part of a bigger query I am working on, but I am simply trying to combine two rows, where only one column of data differs, into one row with that column's values separated by a delimiter. Easy enough with CONCAT or STRING_AGG, right? ...Well, it doesn't work for me and I don't know why.
SELECT array_to_string(array_agg(ls_number), ',') AS "ls_number",
       -- also tried CONCAT(ls_number, ',') and string_agg(ls_number, ',')
       -- and they don't work
       shipitem_shiphead_id,
       shipitem_orderitem_id,
       shiphead_number
FROM shipitem
LEFT JOIN invhist ON (shipitem_invhist_id = invhist_id)
LEFT JOIN invdetail ON (invhist_id = invdetail_invhist_id)
LEFT JOIN ls ON (invdetail_ls_id = ls_id)
LEFT JOIN shiphead ON (shiphead_id = shipitem_shiphead_id)
WHERE shiphead_number = '72211'
GROUP BY ls_number,
         shiphead_number,
         shipitem_shiphead_id,
         shipitem_orderitem_id
When the above query is run, the results window (screenshot not reproduced here) shows the Lot Numbers split into 2 rows. I need them on one row, with the Lot Numbers separated by the delimiter ','. Can someone explain what I am missing here? Thanks a bunch in advance!
You have ls_number in your GROUP BY clause, meaning you'll get a separate row for every distinct value of it in your result. Remove it from the GROUP BY clause and you should be OK.
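In other words, against the same (assumed) schema, something along these lines should collapse the lot numbers into one row (string_agg expects text, so cast ls_number if it is not already text):
SELECT string_agg(ls_number, ',') AS ls_number,
       shipitem_shiphead_id,
       shipitem_orderitem_id,
       shiphead_number
FROM shipitem
LEFT JOIN invhist ON (shipitem_invhist_id = invhist_id)
LEFT JOIN invdetail ON (invhist_id = invdetail_invhist_id)
LEFT JOIN ls ON (invdetail_ls_id = ls_id)
LEFT JOIN shiphead ON (shiphead_id = shipitem_shiphead_id)
WHERE shiphead_number = '72211'
GROUP BY shiphead_number,
         shipitem_shiphead_id,
         shipitem_orderitem_id;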

SUM the NUMC field in SELECT

I need to group a table by the sum of a NUMC column, which unfortunately seems not to be possible in ABAP / OpenSQL.
My code looks like this:
SELECT z~anln1
FROM zzanla AS z
INTO TABLE gt_
GROUP BY z~anln1 z~anln2
HAVING SUM( z~percent ) <> 100 " percent unfortunately is a NUMC -> summing up not possible
What would be the best / easiest practice here, as I cannot alter the table itself?
Unfortunately the NUMC type is described as numeric text, so it ends up in the database as a VARCHAR, and that is why functions like SUM or AVG cannot be used on it.
It all depends on how big your table is. If it is rather small, you could fetch the group fields and the values to be summed into an internal table, then sum them using the COLLECT statement and finally remove the rows for which the sum equals 100; a sketch is shown below.
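A minimal sketch of that COLLECT approach (hypothetical internal table and field names, assuming the raw rows were already selected into gt_raw):
TYPES: BEGIN OF ty_sum,
         anln1   TYPE zzanla-anln1,
         anln2   TYPE zzanla-anln2,
         percent TYPE p LENGTH 8 DECIMALS 2, " packed field, so COLLECT can sum it
       END OF ty_sum.

DATA: lt_sum TYPE STANDARD TABLE OF ty_sum,
      ls_sum TYPE ty_sum.

LOOP AT gt_raw INTO DATA(ls_raw).
  ls_sum-anln1   = ls_raw-anln1.
  ls_sum-anln2   = ls_raw-anln2.
  ls_sum-percent = ls_raw-percent.   " implicit NUMC -> packed conversion
  COLLECT ls_sum INTO lt_sum.        " sums percent per anln1/anln2 key
ENDLOOP.

DELETE lt_sum WHERE percent = 100.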
One solution is to define the field in the table using a more appropriate type.
NUMC is often used for key fields, like document numbers, which there would never be a reason to add together.
I didn't find a smooth solution.
What I did was copy everything into an internal table and loop over it, converting the NUMC values to DEC values. Grouping and summing up worked at that point.
At the end, I converted the DEC values back to NUMC.
It's been a while. I came back to this post because someone voted up my original answer. I was thinking about editing my old answer, but I decided to post a new one. When this question was asked in 2017 there were some restrictions, but now it can be done by using the CAST function in the new OpenSQL:
SELECT z~anln1
  FROM zzanla AS z
  GROUP BY z~anln1, z~anln2
  HAVING SUM( CAST( z~percent AS INT4 ) ) <> 100
  INTO TABLE @gt_.

Postgres - Order by 2 fields simultaneously

I've got a problem with an order by.
I have this table:
desc | phone | calls | group | priority
-----|-------|-------|-------|---------
cccc | 12347 |   700 | 13247 |        0
aaaa | 12345 |   900 | 12345 |        0
bbbb | 12346 |   500 | 12345 |        1
I need to order this table by calls while respecting the group, so my result should be:
desc | phone | calls | group | priority
-----|-------|-------|-------|---------
aaaa | 12345 |   900 | 12345 |        0
bbbb | 12346 |   500 | 12345 |        1
cccc | 12347 |   700 | 13247 |        0
because bbbb is in the same group as aaaa.
How can I do this?
Thanks
EDIT:
Hi all, sorry if my question was unclear; next time I'll be more specific.
Yes, I need to order this table by calls while respecting the group, as if bbbb didn't exist.
Since your question is a bit unclear: an ORDER BY is enough.
select * from sample order by "group"
sqlfiddle-demo
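If the intent is to keep group members together while ranking whole groups by their highest calls value, a window function in the ORDER BY is one possibility (assuming the table is named sample; "desc" and "group" need quoting because they are reserved words):
SELECT "desc", phone, calls, "group", priority
FROM sample
ORDER BY MAX(calls) OVER (PARTITION BY "group") DESC,
         "group",
         calls DESC;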

n-th row in PostgreSQL for p-quantile

I'm trying to fetch the n-th row of a query result. Other posts suggested the use of OFFSET or LIMIT, but those forbid the use of variables (ERROR: argument of OFFSET must not contain variables). I also read about the use of cursors, but I'm not quite sure how to use them even after reading their PostgreSQL man page. Any other suggestions or examples for how to use cursors?
My main goal is to calculate the p-quantile of a column, and since PostgreSQL doesn't provide this function by default I have to write it on my own.
Cheers
The following returns the 5th row of a result set:
select *
from (
  select <column_list>,
         row_number() over (order by some_sort_column) as rn
  from <your_table>
) t
where rn = 5;
You have to include an ORDER BY, because otherwise the concept of a "5th row" doesn't make sense.
You mention "use of variables", so I'm not sure what you are actually trying to achieve. But you should be able to supply the value 5 as a variable for this query (or even as a sub-select).
You might also want to dig further into window functions, because with them you could e.g. do a sum() over the 3 rows before the current row (or similar constructs), which could also be useful for you; see the sketch below.
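For instance, using the same placeholder convention with a hypothetical value column:
select some_sort_column,
       sum(some_value) over (order by some_sort_column
                             rows between 3 preceding and 1 preceding) as prev3_sum
from <your_table>;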
If you would like to get the 10th record, the query below also works fine:
select * from table_name order by sort_column limit 1 offset 9
OFFSET simply skips that many rows before beginning to return the number of rows given in the LIMIT clause.
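As a side note for the original p-quantile goal: newer PostgreSQL versions (9.4 and later) provide ordered-set aggregates, so the row-picking may not be needed at all. A sketch with hypothetical table and column names:
SELECT percentile_cont(0.5) WITHIN GROUP (ORDER BY some_column) AS median
FROM some_table;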