Replace infinity with nulls throughout entire table KDB - kdb

Example table:
table:([]col1:20 40 30 0w;col2:4?4;col3: 100 200 0w 300)
My solution:
{.[table;(where 0w=table[x];x);:;0n]}'[exec c from meta table where t="f"]
I'm sure there is a way I am not seeing. This just returns a list of tables, one for each change, which I don't want. I just want the original table returned with the infinities replaced by nulls.
Thanks in advance!

It would be good to flesh out your question a bit more. Are you always expecting it to be float columns? Will the table have many columns? Will there be string/sym columns mixed in that might complicate things?
If your table has a small number of columns you could just do an update
q)show t
col1 col2 col3
--------------
20   1    100
40   2    200
30   2    0w
0w   1    300
q)inftonull:{(x where x=0w):0n;x}
q)update inftonull col1, inftonull col3 from t
col1 col2 col3
--------------
20   2    100
40   1    200
30   0
     3    300
If you think the column names might change or have a very large number of columns you could try a functional update (where you can pass the column names as parameters)
q){![t;();0b;x!inftonull,/:x,:()]}`col1`col3
col1 col2 col3
--------------
20   1    100
40   2    200
30   2
     1    300
If your table is comprised of only numeric data something like
q)flip{(x where x=.Q.t[type x]$0w):x 0N;x}each flip t
col1 col2 col3
--------------
20   2    100
40   1    200
30   0
     3    300
might work; it tries to account for the fact that the numeric columns can have different types.
If your data is going to contain string/sym columns, the last example won't work.
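For anyone comparing notes outside q, the column-by-column idea can be sketched in plain Python. This is only an illustration under assumptions: the table is modelled as a dict of lists, `inf_to_null` is a made-up helper name, `None` stands in for a q null, and the `sym` column is invented to show a mixed-type table. Only float values are inspected, so integer and string columns pass through unchanged.

```python
import math

def inf_to_null(table):
    """Return a copy of a column-oriented table with infinite floats set to None.

    Hypothetical helper: `table` maps column names to lists of values.
    """
    cleaned = {}
    for name, column in table.items():
        cleaned[name] = [
            # math.isinf matches both +inf and -inf, which is broader
            # than the q comparison x=0w (positive infinity only).
            None if isinstance(v, float) and math.isinf(v) else v
            for v in column
        ]
    return cleaned

table = {
    "col1": [20.0, 40.0, 30.0, math.inf],
    "col2": [1, 2, 2, 1],
    "col3": [100.0, 200.0, math.inf, 300.0],
    "sym": ["a", "b", "c", "d"],  # assumed extra column, not in the question
}
result = inf_to_null(table)
```

Because the check is per value rather than per column type, there is no need to list the float columns up front, at the cost of inspecting every cell.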

Related

PostgreSQL: Removing duplicate column

I am working on the output of a postgres subquery and have the table with 20 columns(generated using WITH clause).
The table looks something like this
col1 col2 col3 --- col20
4 4 24
6 6 45
5 5 66
5 5 12
I want to write a query that removes the duplicated column. I tried selecting all the columns except the 2nd, but I could not find a better way to do that.
The expected output is:
col1 col3 ------ col20
4 24
6 45
5 66
5 12
Thanks

Randomly select observations separately for each column in SQL

I am interested in generating a completely randomized (deliberately damaged) dataset where observations are selected randomly (with replacement) for each field and then combined. I will need to generate a new dummy id to replace the old id, as I don't want the original data to be reconstructable. My goal is to create a simulated column-wise random dataset.
Here is a sample data:
Id Col1 Col2 Col3
11 A 0.01 David
12 B 0.04 Max
13 C 0.05 Tom
14 E 0.06 West
15 C 0.02 Mike
What I am interested in is something like this:
Id2 Col1 Col2 Col3
1 E 0.04 Mike
2 C 0.06 David
3 B 0.02 West
4 A 0.04 Tom
5 C 0.05 Max
I am looking for an organized way of doing this. Here is what I attempted so far but am not interested in doing many times over since I have a lot of columns in the real data.
proc sql;
create table newtable1 as
select monotonic() as id2, col1 from
(select col1 from Table1 order by ranuni(0));
quit;
Using the above code you generate separate random columns and then combine them using the new monotonic key.
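To avoid repeating that step by hand for every column, the same idea can be looped over a column list. Below is a rough Python sketch rather than SAS, assuming the data fits in memory as a list of dicts; `shuffle_columns` and the `seed` parameter are made-up names for illustration.

```python
import random

def shuffle_columns(rows, columns, seed=None):
    """Resample each column independently (with replacement) and add a new id."""
    rng = random.Random(seed)
    n = len(rows)
    # Draw n values per column, independently of every other column.
    sampled = {c: rng.choices([r[c] for r in rows], k=n) for c in columns}
    # Attach a fresh 1-based Id2 so the original Id is not carried over.
    return [
        {"Id2": i + 1, **{c: sampled[c][i] for c in columns}}
        for i in range(n)
    ]

rows = [
    {"Id": 11, "Col1": "A", "Col2": 0.01, "Col3": "David"},
    {"Id": 12, "Col1": "B", "Col2": 0.04, "Col3": "Max"},
    {"Id": 13, "Col1": "C", "Col2": 0.05, "Col3": "Tom"},
    {"Id": 14, "Col1": "E", "Col2": 0.06, "Col3": "West"},
    {"Id": 15, "Col1": "C", "Col2": 0.02, "Col3": "Mike"},
]
simulated = shuffle_columns(rows, ["Col1", "Col2", "Col3"], seed=0)
```

Since sampling is with replacement, a value can appear more than once in a simulated column, as `0.04` does in the desired output above.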

TSQL, Pivot rows into single columns

Before, I had to solve something similar:
Here was my pivot and flatten for another solution:
I want to do the same thing on the example below but it is slightly different because there are no ranks.
In my previous example, the table looked like this:
LocationID Code Rank
1 123 1
1 124 2
1 138 3
2 999 1
2 888 2
2 938 3
And I was able to use this function to properly get my rows in a single column.
-- Check if tables exist, delete if they do so that you can start fresh.
IF OBJECT_ID('tempdb.dbo.#tbl_Location_Taxonomy_Pivot_Table', 'U') IS NOT NULL
DROP TABLE #tbl_Location_Taxonomy_Pivot_Table;
IF OBJECT_ID('tbl_Location_Taxonomy_NPPES_Flattened', 'U') IS NOT NULL
DROP TABLE tbl_Location_Taxonomy_NPPES_Flattened;
-- Pivot the original table so that you have
SELECT *
INTO #tbl_Location_Taxonomy_Pivot_Table
FROM [MOAD].[dbo].[tbl_Location_Taxonomy_NPPES] tax
PIVOT (MAX(tax.tbl_lkp_Taxonomy_Seq)
FOR tax.Taxonomy_Rank in ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],[11],[12],[13],[14],[15])) AS pvt
-- ORDER BY Location_ID
-- Flatten the tables.
SELECT Location_ID
,max(piv.[1]) as Tax_Seq_1
,max(piv.[2]) as Tax_Seq_2
,max(piv.[3]) as Tax_Seq_3
,max(piv.[4]) as Tax_Seq_4
,max(piv.[5]) as Tax_Seq_5
,max(piv.[6]) as Tax_Seq_6
,max(piv.[7]) as Tax_Seq_7
,max(piv.[8]) as Tax_Seq_8
,max(piv.[9]) as Tax_Seq_9
,max(piv.[10]) as Tax_Seq_10
,max(piv.[11]) as Tax_Seq_11
,max(piv.[12]) as Tax_Seq_12
,max(piv.[13]) as Tax_Seq_13
,max(piv.[14]) as Tax_Seq_14
,max(piv.[15]) as Tax_Seq_15
-- JOIN HERE
INTO tbl_Location_Taxonomy_NPPES_Flattened
FROM #tbl_Location_Taxonomy_Pivot_Table piv
GROUP BY Location_ID
So, then here is the data I would like to work with in this example.
LocationID Foreign Key
2 2
2 670
2 2902
2 5389
3 3
3 722
3 2905
3 5561
So I have some data that is formatted like this:
I have used pivot on data like this before--But the difference was it had a rank also. Is there a way to get my foreign keys to show up in this format using a pivot?
locationID FK1 FK2 FK3 FK4
2 2 670 2902 5389
3 3 722 2905 5561
Another way I could look at solving this: I also have the values in this form:
LocationID Address_Seq
2 670, 5389, 2902, 2,
3 722, 5561, 2905, 3
etc
Is there any way I can get this into the same format?
ID Col1 Col2 Col3 Col4
2 670 5389, 2902, 2
Adding a rank column and reversing the order should give you what you require:
SELECT locationid, [4] col1, [3] col2, [2] col3, [1] col4
FROM
(
SELECT locationid, foreignkey,rank from #Pivot_Table ----- temp table with a rank column
) x
PIVOT (MAX(x.foreignkey)
FOR x.rank in ([4],[3],[2],[1]) ) pvt
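The query above assumes the temp table already has a rank column. One way to derive it is `ROW_NUMBER()` partitioned by location. The sketch below is not T-SQL: it runs against SQLite through Python's `sqlite3` module so it can be executed anywhere, and because SQLite has no `PIVOT` it swaps in conditional aggregation (`MAX(CASE ...)`). The table name `loc_fk` and the choice to order by `foreignkey` are assumptions, and window functions need SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE loc_fk (locationid INTEGER, foreignkey INTEGER)")
conn.executemany(
    "INSERT INTO loc_fk VALUES (?, ?)",
    [(2, 2), (2, 670), (2, 2902), (2, 5389),
     (3, 3), (3, 722), (3, 2905), (3, 5561)],
)

query = """
WITH ranked AS (
    -- Number the keys within each location; this plays the role of the rank column.
    SELECT locationid,
           foreignkey,
           ROW_NUMBER() OVER (PARTITION BY locationid ORDER BY foreignkey) AS rn
    FROM loc_fk
)
SELECT locationid,
       MAX(CASE WHEN rn = 1 THEN foreignkey END) AS FK1,
       MAX(CASE WHEN rn = 2 THEN foreignkey END) AS FK2,
       MAX(CASE WHEN rn = 3 THEN foreignkey END) AS FK3,
       MAX(CASE WHEN rn = 4 THEN foreignkey END) AS FK4
FROM ranked
GROUP BY locationid
ORDER BY locationid
"""
rows = conn.execute(query).fetchall()
```

With the sample data this yields one row per location with the four keys spread across `FK1` to `FK4`. In SQL Server the `ranked` CTE could feed the `PIVOT` shown above instead of the `CASE` expressions.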

kdb+: group by and sum over multiple columns

Consider the following data:
table:
time colA colB colC
-----------------------------------
11:30:04.194 31 250 a
11:30:04.441 31 280 a
11:30:14.761 31.6 100 a
11:30:21.324 34 100 a
11:30:38.991 32 100 b
11:31:20.968 32 100 b
11:31:56.922 32.2 1000 b
11:31:57.035 32.6 5000 c
11:32:05.810 33 100 c
11:32:05.810 33 100 a
11:32:14.461 32 300 b
Now how can I sum colB over each consecutive run of rows where colC is the same, without losing the time order?
So the output would be:
first time avgA sumB colC
-----------------------------------
11:30:04.194 31.2 730 a
11:30:38.991 32.07 1200 b
11:31:57.035 32.8 5100 c
11:32:05.810 33 100 a
11:32:14.461 32 300 b
What I have so far:
select by time from (select first time, avg colA, sum colB by colC, time from table)
But the output is not grouped by colC. What should the query look like?
How about this?
get select first time, avg colA, sum colB, first colC by sums colC<>prev colC from table
A slightly different way to achieve this using differ :
value select first time, avg colA, sum colB , first colC by g:(sums differ colC) from table
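Both answers rely on the same trick: `sums differ colC` (or `sums colC<>prev colC`) produces a counter that increases every time colC changes, so each consecutive run gets its own group id. As a cross-check, here is the same run-grouping sketched in Python, where `itertools.groupby` also groups only adjacent equal keys. The tuple layout and the `summary` name are illustrative assumptions.

```python
from itertools import groupby

# (time, colA, colB, colC) rows from the question, already in time order.
rows = [
    ("11:30:04.194", 31.0, 250, "a"),
    ("11:30:04.441", 31.0, 280, "a"),
    ("11:30:14.761", 31.6, 100, "a"),
    ("11:30:21.324", 34.0, 100, "a"),
    ("11:30:38.991", 32.0, 100, "b"),
    ("11:31:20.968", 32.0, 100, "b"),
    ("11:31:56.922", 32.2, 1000, "b"),
    ("11:31:57.035", 32.6, 5000, "c"),
    ("11:32:05.810", 33.0, 100, "c"),
    ("11:32:05.810", 33.0, 100, "a"),
    ("11:32:14.461", 32.0, 300, "b"),
]

summary = []
# groupby starts a new group whenever the key changes, like sums differ colC.
for col_c, run in groupby(rows, key=lambda r: r[3]):
    run = list(run)
    summary.append({
        "time": run[0][0],                            # first time in the run
        "avgA": sum(r[1] for r in run) / len(run),    # avg colA
        "sumB": sum(r[2] for r in run),               # sum colB
        "colC": col_c,
    })
```

The later `a` and `b` rows form their own groups rather than merging with the earlier ones, which is what keeps the time order intact.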

How to get sum of each column from the same select query

I have a table as follows
Col1 Col2 Col3 Col4
------------------------------
100 400 400 300
200 600 400 700
800 600 500 900
300 100 700 500
--------------------------------
Total 1700 2000 2400
As you can see, I want the total of each column (excluding the 1st column).
I am not sure whether we can fetch the total of each column with same select query which I am using to fetch this data.
If not please suggest any alternative.
Use the UNION ALL operator to get this done, like so:
SELECT col1, col2, col3, col4
FROM <table>
UNION ALL
SELECT NULL, sum(col2), sum(col3), sum(col4)
FROM <table>;
Why can't you achieve this with a simple select statement?
select sum(col2),sum(col3),sum(col4) from table;
Hope this works !!
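To see the UNION ALL approach end to end, here is a small sketch run against an in-memory SQLite database from Python. The table name `t` and the extra `is_total` column are assumptions added for the example; `is_total` is there because a UNION ALL result has no guaranteed row order without an ORDER BY, and sorting on it keeps the totals row last.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE t (col1 INTEGER, col2 INTEGER, col3 INTEGER, col4 INTEGER)"
)
conn.executemany(
    "INSERT INTO t VALUES (?, ?, ?, ?)",
    [(100, 400, 400, 300),
     (200, 600, 400, 700),
     (800, 600, 500, 900),
     (300, 100, 700, 500)],
)

rows = conn.execute("""
    SELECT col1, col2, col3, col4, 0 AS is_total FROM t
    UNION ALL
    SELECT NULL, SUM(col2), SUM(col3), SUM(col4), 1 FROM t
    ORDER BY is_total
""").fetchall()

total_row = rows[-1]  # the appended totals row; col1 is NULL (None in Python)
```

The detail rows come back first and the final row carries the column sums, matching the Total line in the question.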