KDB: how to compare strings?

I have a column of type C (strings). How do I compare each value with the previous value in the same column? I tried col1 like prev col1, but it returns a 'length error. I also created another column, newCol: prev col1, but still cannot perform the comparison. I also tried = with no luck. How can I do this?
Sample data:
col1
Paris
London
London
New York
Singapore
Ha Noi

Could you use the prior keyword?
q)t
col1
-----------
"Paris"
"London"
"London"
"New York"
"Singapore"
"Ha Noi"
q)select (~) prior col1 from t
col1
----
0
0
1
0
0
0
When you compare two strings with =, q compares them character by character. If they are the same length, you get a list of booleans showing the positions where the characters match; if they have different lengths, you get a 'length error. To test whether two strings are exactly the same, use ~ (match), which works regardless of the strings' lengths and returns a single boolean.
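A short sketch of the difference, runnable in a q session or script. "Landon" is just a made-up string that differs from "London" in one character, and the 'length error is caught with protected evaluation (@) so the script keeps running:

```q
/ = is atomic: it compares two strings character by character
"London"="London"                  / 111111b
"London"="Landon"                  / 101111b
/ strings of different lengths cannot be paired up, so = signals 'length
/ protected evaluation traps the error and returns its message "length"
@[{x=y}["Paris"];"London";{x}]
/ ~ (match) compares whole strings and returns a single boolean
"London"~"London"                  / 1b
"Paris"~"London"                   / 0b
```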

Use each prior (https://code.kx.com/q/ref/maps/#each-prior) with match (https://code.kx.com/q/basics/comparison/#match):
q)tab:([]col1:("Paris";"London";"London";"New York"))
q)select col1,compare:(~':)col1 from tab
col1       compare
------------------
"Paris"    0
"London"   0
"London"   1
"New York" 0

Use like' (like each) instead of like: the right argument of like is a single pattern, but prev col1 is a list of strings, so each string has to be paired with its predecessor.
update comparison:col1 like' prev col1 from ([]col1:("Paris";"London";"London";"New York";"Singapore";"Ha Noi"))
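One caveat with like' is that its right argument is treated as a pattern, so *, ? and [] act as wildcards rather than literal characters. The hypothetical lists a and b below show like' reporting a match between two different strings where ~' (match each) does not; if your data may contain those characters, each prior with match, (~':)col1, is the stricter test.

```q
a:("Paris";"London";"New York")
b:("Paris";"Londo?";"New*")
/ like': each string of a is tested against the pattern at the same index of b
a like' b                          / 111b: ? and * are wildcards
/ ~' (match each): each pair is compared literally
a ~' b                             / 100b
```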

Although this is essentially the same as Matthew's and jomahony's answers, the differ keyword can arguably make it easier to read and understand:
q)select not differ col1 from ([]col1:("Paris";"London";"London";"New York"))
col1
----
0
0
1
0
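The same flags can drive a where clause. A small sketch on the sample data (the column name same is made up for illustration): differ marks the first row of each run of equal strings, so filtering on it drops consecutive duplicates.

```q
t:([] col1:("Paris";"London";"London";"New York"))
/ add a flag that is 1b when a row repeats the previous one
update same:not differ col1 from t
/ keep only the first row of each run of equal values
select from t where differ col1
```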

Related

Type error of getting average by id in KDB

I am trying to make a function for the aggregate consumption by mid in a kdb+ table (aggregate value by mid). The table is imported from a csv file like this:
table: ("JJP";enlist",")0:`:data.csv
The column types are:
mid is long (j), value is long (j) and ts is timestamp (p).
Here is my function:
agg: {select avg value by mid from table}
but I get:
'type
[0] get select avg value by mid from table
But value is of type long (j), so I am not sure why I can't get the avg. I also tried this with type int.
value can't be used as a column name because it is a keyword in kdb+: inside the select, value is parsed as the keyword rather than as your column, which is why avg signals 'type. Renaming the column should fix the issue.
value is a keyword and should not be used as a column name.
https://code.kx.com/q/ref/value/
You can rename it with .Q.id, which sanitizes column names:
https://code.kx.com/q/ref/dotq/#qid-sanitize
q)t:flip`value`price!(1 2;1 2)
q)t
value price
-----------
1     1
2     2
q)t:.Q.id t
q)t
value1 price
------------
1      1
2      2
Or, starting from the original t, rename with xcol:
https://code.kx.com/q/ref/cols/#xcol
q)(enlist[`value]!enlist[`val]) xcol t
val price
---------
1   1
2   2
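You can rename the value column as you read it. Without enlist, the first line of the file is parsed as data rather than as column names, so drop the header row first:
flip`mid`val`ts!("JJP";",")0:1_read0`:data.csv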
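Once the column is renamed, the aggregation works. A minimal sketch, assuming a small in-memory table tbl with made-up values in place of data.csv; the function agg takes the table as an argument instead of reading a global:

```q
/ hypothetical stand-in for the table loaded from data.csv, column already renamed to val
tbl:([] mid:1 1 2; val:10 20 30; ts:3#.z.p)
/ average val per mid; the result is a table keyed by mid
agg:{[x] select avg val by mid from x}
agg tbl
```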

Select non-empty string rows in KDB

q)tab
items sales prices detail
------------------------------
nut   6     10     "blah"
bolt  8     20     ""
cam   0     15     "some text"
cog   3     20     ""
nut   6     10     ""
bolt  8     20     ""
I would like to select only the rows where "detail" is non-empty. It seems fairly straightforward, but I can't get it to work.
q)select from tab where count[detail] > 0
This gives all the rows still.
Alternatively, I tried
q)select from tab where not null detail
This gives me a 'type error.
How can one query for non-empty string fields in KDB?
Rather than using adverbs, you can simplify this with like:
q)select from tab where not detail like ""
items sales prices detail
------------------------------
nut   6     10     "blah"
cam   0     15     "some text"
As you need to perform the check row-wise, use each:
select from tab where 0 < count each detail
This yields the following table:
items sales prices detail
------------------------------
nut   6     10     "blah"
cam   0     15     "some text"
Use the adverb each:
q)select from ([]detail:("blah";"";"some text")) where 0<count each detail
detail
-----------
"blah"
"some text"
I would use the following approach:
select from tab where not detail~\:""
Here every detail is compared with the empty string. The not null detail approach does not work because q treats a string as a list of characters, so null checks each character: null "abc" returns 000b, whereas the where clause expects a single boolean per row.
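A quick way to see why null misbehaves here while the per-row tests work, using a hypothetical detail list that mirrors the column:

```q
detail:("blah";"";"some text")
/ null is atomic, so on a string it returns one flag per character
null "abc"                         / 000b
/ these return exactly one boolean per string, which is what where needs
not detail~\:""                    / 101b
0<count each detail                / 101b
```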
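If your table is not big, another option is to cast the strings to symbols in the where clause (every distinct string cast this way is added to the symbol pool, which is never freed, so avoid this on large tables):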
q)select from ([]detail:("blah";"";"some text")) where `<>`$detail
detail
-----------
"blah"
"some text"
Or simply
q)select from ([]detail:("blah";"";"some text")) where not null `$detail
detail
-----------
"blah"
"some text"

kdb q - apply each-left for each atom in list and reduce

I would like to apply each-left between a column of a table and each atom in a list. I cannot use each-both because the table column and the list are not of same length.
I have seen this done in one line somewhere already, but I can't find it anymore.
Example:
t:([] name:("jim";"john";"john";"julia");c1: til 4);
searchNames:("jim";"john");
f:{[name;nameCol] nameCol like\:name}; / each-left between name (e.g. "jim") and column
g:f[;t[`name]];
r:g each searchNames; / result: (1000b;0110b)
filter:|/[r]; / result: 1110b
select from t where filter
How can I do this in a more q-like way?
If you wish to use like, combine it with each-right (/:):
q)select from t where any name like/:searchNames
name   c1
---------
"jim"  0
"john" 1
"john" 2
In this case you can simply use in as you are not using any wildcards:
q)select from t where name in searchNames
name   c1
---------
"jim"  0
"john" 1
"john" 2
Below is a generic function you could use, given two lists of different sizes.
q)f:{(|) over x like/:y}
q)
q)select from t where f[name;searchNames]
name   c1
---------
"jim"  0
"john" 1
"john" 2
Or, wrapping it up in a single function (assuming always searching a table column):
q)f2:{x where (|) over (0!x)[y] like/:z}
q)
q)f2[t;`name;searchNames]
name   c1
---------
"jim"  0
"john" 1
"john" 2
But in the scenario you describe, Thomas' solution seems the most natural.
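The like/: and in answers differ once patterns are involved: like treats *, ? and [] as wildcards, while in compares whole strings literally. A sketch using the question's table and a made-up pattern "j*a":

```q
t:([] name:("jim";"john";"john";"julia"); c1:til 4)
/ like: "j*a" matches "julia", and "jim" matches itself
select from t where any name like/:("j*a";"jim")
/ in: "j*a" is compared literally, so only "jim" matches
select from t where name in ("j*a";"jim")
```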

Get substring into a new column

I have a table that contains a column with data in the following format (let's call the column "title" and the table "s"):
title
ab.123
ab.321
cde.456
cde.654
fghi.789
fghi.987
I am trying to get a unique list of the characters that come before the "." so that I end up with this:
ab
cde
fghi
I have tried selecting the initial column into a table then trying to do an update to create a new column that is the position of the dot using "ss".
Something like this:
t: select title from s
update thedot: (title ss `.)[0] from t
I was then going to add a third column holding the first N characters of "title", where N is the value stored in the "thedot" column.
All I get when I try the update is a 'type error.
Any ideas? I am very new to kdb so no doubt doing something simple in a very silly way.
The reason you get the type error is that ss only works on strings, not symbols. Also, ss is not a vector-based function, so you need to combine it with each ('). Note that ss returns a list of match positions, so each cell of thedot holds a one-item list:
q)update thedot:string[title] ss' "." from t
title    thedot
---------------
ab.123   ,2
ab.321   ,2
cde.456  ,3
cde.654  ,3
fghi.789 ,4
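The plan from the question (find the position of the dot, then take that many characters) can also be done without ss. A sketch using ? (find), which returns only the first position as a plain long, together with each-both take (#'); strs is a hypothetical helper variable:

```q
t:([] title:`$("ab.123";"ab.321";"cde.456";"cde.654";"fghi.789";"fghi.987"))
/ position of the first "." in each title, one long per row
update thedot:(string title)?\:"." from t
/ take that many leading characters from each string, then deduplicate
strs:string t`title
distinct `$(strs?\:".")#'strs      / `ab`cde`fghi
```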
There are a few ways to solve your problem:
q)select distinct(`$"." vs' string title)[;0] from t
x
----
ab
cde
fghi
q)select distinct(` vs' title)[;0] from t
x
----
ab
cde
fghi
You can read here for more info: http://code.kx.com/q/ref/casting/#vs
An alternative is to use the 0: operator to parse around the "." delimiter. This operator is especially useful when there is a fixed number of 'columns', as in a csv file. Here there is a fixed number of columns and we only want the first, so the distinct prefixes before the "." can be returned with:
exec distinct raze("S ";".")0:string title from t
`ab`cde`fghi
OR:
distinct raze("S ";".")0:string t`title
`ab`cde`fghi
Here "S " gives the type of each column (S parses the first field as a symbol, and the space skips the second) and "." is the field delimiter. For records with a varying number of columns, it would be better to use the vs operator.
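To see what the type string controls, a sketch on a few hypothetical strings: each character of the type string describes one field, and a space skips that field.

```q
strs:("ab.123";"cde.456";"fghi.789")
/ S parses the first field as a symbol, J parses the second as a long
("SJ";".")0:strs
/ a space skips the second field, leaving a one-item list of columns
("S ";".")0:strs
```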
A variation of WooiKent's answer using each-right (/:) :
q)exec distinct (` vs/:title)[;0] from t
`ab`cde`fghi

Adding leading zeros if length is not equal to 10 digits using SQL

I am trying to join 2 tables, but one of the tables has a 10-digit number and the other may have a number with 10 or fewer digits. Because of this I am losing some data, so I would like to check the length first: if it is less than 10 digits, I want to add leading zeros to make it a 10-digit number. I want to do this while joining, so I am not sure if this is possible. For example, if I have 251458 in TABLE_WITHOUT_LEADING_ZERO, then I want to change it to 0000251458. Here is what I have so far:
select ACCT_NUM, H.CODE
FROM TABLE_WITH_LEEDING_ZERO D, TABLE_WITHOUT_LEADING_ZERO H
WHERE substring(D.ACCT_NUM from position('.' in D.ACCT_NUM) + 2) = cast (H.CODE as varchar (10))
thanks
Another alternative:
SELECT TO_CHAR(12345,'fm0000000000');
to_char
------------
0000012345
In Netezza you can use LPAD:
select lpad(s.sample,10,'0') as result
from (select 12345 as sample) s
result
-------
0000012345
However, it can be more efficient to strip the leading zeros from the padded values instead and compare integers, as in the example below:
select cast(trim(Leading '0' from s.sample) as integer) as result
from (select '0000012345' as sample) s
result
-------
12345