KDB+: get count of all substrings in a table of strings - select

I'm new to KDB and struggling to write a query. I would appreciate any help.
I have a table of strings and need to get the count of all specific substrings across all strings in the table.
So, let's assume that I have strings:
[
string1: Apple is green, cherry is red,
string2: Ququmber is green, banana is yellow
]
and I want to get a count of "Apple" and "green" across all strings. My desired result is a grouping like this:
{
Apple: 1,
green: 2
}
But, unfortunately, I have no idea how to make such a grouping. I have already figured out how to get strings that contain at least one of the needed substrings:
"select count(text) from data where any text like/: (\"*$Apple*\";\"*$green*\")"
but that returns the cumulative count of all strings matching either Apple or green, without any grouping:
{
text: 3
}
which does not allow differentiating the amount of each particular substring.
I will be really thankful for any help.

Instead of using a where clause with any, you can put the like/: in the select phrase to get a nested list of booleans, one list per search string. Summing each list then gives the number of matching rows for each search string. I've used exec here rather than select, as I suspect that output will be more useful:
q)t:([] text:("Apple is green, cherry is red,";"Ququmber is green, banana is yellow"))
q)exec sum each text like/:("*Apple*";"*green*") from t
1 2i
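To get the keyed result the question asks for, the counts can be paired with the search terms in a dictionary. This is a minimal sketch, assuming the terms are held in a hypothetical variable terms and the patterns in a hypothetical variable pats (neither is in the original answer). Note that like only reports whether a row matches, so this counts matching rows, not repeated occurrences within one row:

```q
q)t:([] text:("Apple is green, cherry is red,";"Ququmber is green, banana is yellow"))
q)terms:`Apple`green                  / hypothetical list of search terms
q)pats:{"*",x,"*"} each string terms  / wildcard patterns: ("*Apple*";"*green*")
q)terms!exec sum each text like/:pats from t
Apple| 1
green| 2
```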

You could use -4! (the internal q tokenizer) to split each string into tokens and count the frequency of every token:
q)t:([] text:("Apple is green, cherry is red,";"Ququmber is green, banana is yellow"))
q)count each group exec raze -4!'text from t
"Apple" | 1
," " | 10
"is" | 4
"green" | 2
,"," | 3
"cherry" | 1
"red" | 1
"Ququmber"| 1
"banana" | 1
"yellow" | 1

Related

KDB: how to compare strings?

I have a column of type C. How do I compare each value to the previous value in the same column? I tried col1 like prev col1, but it returns a length error. I also created another column, newCol: prev col1, but still cannot perform the comparison. I also tried = with no luck. How can I do this?
a sample data:
col1
Paris
London
London
New York
Singapore
Ha Noi
Could you use the prior keyword?
q)t
col1
-----------
"Paris"
"London"
"London"
"New York"
"Singapore"
"Ha Noi"
q)select (~) prior col1 from t
col1
----
0
0
1
0
0
0
When you compare strings with =, q compares them character by character: if the strings have the same length, you get a list of booleans showing the positions where they agree; if the lengths differ, you get a length error. To test whether two strings are exactly the same, use ~ (match), which works regardless of length and returns a single boolean.
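The difference between = and ~ on strings can be seen directly at the console. This is an illustrative sketch using values from the sample data, not part of the original answer:

```q
q)"London"="London"   / same length: item-wise comparison
111111b
q)"London"~"London"   / match: a single boolean
1b
q)"Paris"~"London"    / different lengths are fine with match
0b
q)"Paris"="London"    / different lengths: = signals an error
'length
```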
Use each prior: https://code.kx.com/q/ref/maps/#each-prior
with match: https://code.kx.com/q/basics/comparison/#match
q)tab:([]col1:("Paris";"London";"London";"New York"))
q)select col1,compare:(~':)col1 from tab
col1 compare
------------------
"Paris" 0
"London" 0
"London" 1
"New York" 0
You should use like' instead of like, because you are comparing against a list rather than a single value.
update comparison: col1 like' prev col1 from ([]col1:("Paris";"London";"London";"New York";"Singapore";"Ha Noi"))
Although this is essentially the same as Matthews and jomahony's answers, the differ keyword can arguably make it easier to read/understand:
q)select not differ col1 from ([]col1:("Paris";"London";"London";"New York"))
col1
----
0
0
1
0

How to compare values of two columns in Postgresql

I need to compare two columns with the values below and get a consolidated value in PostgreSQL. For a given ID, if all rows are GREEN or all are RED, I want GREEN or RED returned respectively; if even one row is RED, I want RED returned. Can someone please help? Thanks
ID STATUS
1 GREEN
1 GREEN
1 RED
2 GREEN
2 GREEN
2 GREEN
You seem to want string aggregation:
select id, string_agg(distinct status, ' or ' order by status) statuses
from mytable
group by id
If you want a single value returned, with priority given to "RED", then:
select id, max(status)
from mytable
group by id
This works because, string-wise, "RED" is greater than "GREEN". You don't specify what to do if there are more than two status values, so this answer does not address that.

Select non-empty string rows in KDB

q)tab
items sales prices detail
-------------------------
nut 6 10 "blah"
bolt 8 20 ""
cam 0 15 "some text"
cog 3 20 ""
nut 6 10 ""
bolt 8 20 ""
I would like to select only the rows where "detail" is non-empty. It seems fairly straightforward, but I can't get it to work.
q) select from tab where count[detail] > 0
This gives all the rows still.
Alternatively i tried
q) select from tab where not null detail
This gives me a type error.
How can one query for non-empty string fields in KDB?
Rather than use adverbs, you can simplify this with the use of like.
q)select from tab where not detail like ""
items sales prices detail
------------------------------
nut 6 10 "blah"
cam 0 15 "some text"
As you need to perform the check row-wise, use each:
select from tab where 0 < count each detail
This yields the following table:
items sales prices detail
------------------------------
nut 6 10 "blah"
cam 0 15 "some text"
Use the adverb each:
q)select from ([]detail:("blah";"";"some text")) where 0<count each detail
detail
-----------
"blah"
"some text"
I would use following approach
select from tab where not detail~\:""
Here each item of detail is compared with the empty string. The not null detail approach does not work because q treats a string as a character list and checks each character for null. For example, null "abc" returns the boolean list 000b, whereas the where clause expects a single boolean for each row.
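The point about null versus match can be checked on a small list. This is a sketch using a hypothetical variable detail that holds the same kind of values as the column:

```q
q)null "abc"                       / one boolean per character, not one per string
000b
q)detail:("blah";"";"some text")   / hypothetical stand-in for the detail column
q)detail~\:""                      / each-left match: one boolean per string
010b
q)not detail~\:""                  / 1b marks the non-empty strings
101b
```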
If your table is not big, another way you can check is by converting it to a symbol in the where clause.
q)select from ([]detail:("blah";"";"some text")) where `<>`$detail
detail
-----------
"blah"
"some text"
Or simply
q)select from ([]detail:("blah";"";"some text")) where not null `$detail
detail
-----------
"blah"
"some text"

Get substring into a new column

I have a table with a column that holds data in the following format. Let's call the column "title" and the table "s".
title
ab.123
ab.321
cde.456
cde.654
fghi.789
fghi.987
I am trying to get a unique list of the characters that come before the "." so that I end up with this:
ab
cde
fghi
I have tried selecting the initial column into a table and then doing an update to create a new column holding the position of the dot, using "ss".
Something like this:
t: select title from s
update thedot: (title ss `.)[0] from t
I was then going to add a third column holding the first N characters of "title", where N is the value stored in the "thedot" column.
All I get when I try the update is a "type" error.
Any ideas? I am very new to kdb, so I am no doubt doing something simple in a very silly way.
You get the type error because ss only works on strings, not symbols. Also, ss is not a vector function, so you need to combine it with each (').
q)update thedot:string[title] ss' "." from t
title thedot
---------------
ab.123 2
ab.321 2
cde.456 3
cde.654 3
fghi.789 4
There are a few ways to solve your problem:
q)select distinct(`$"." vs' string title)[;0] from t
x
----
ab
cde
fghi
q)select distinct(` vs' title)[;0] from t
x
----
ab
cde
fghi
You can read here for more info: http://code.kx.com/q/ref/casting/#vs
An alternative is to use the 0: operator to parse around the "." delimiter. This operator is especially useful when you have a fixed number of 'columns', as in a csv file. Here the number of columns is fixed and we only want the first, so a distinct list of the characters before the "." can be returned with:
exec distinct raze("S ";".")0:string title from t
`ab`cde`fghi
OR:
distinct raze("S ";".")0:string t`title
`ab`cde`fghi
Here "S " defines the type of each column (a symbol column followed by a skipped column) and "." is the field delimiter. For records with differing numbers of columns it would be better to use the vs operator.
A variation of WooiKent's answer using each-right (/:) :
q)exec distinct (` vs/:title)[;0] from t
`ab`cde`fghi
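The approach the question describes (find the dot, then take that many leading characters) can also be made to work. This is a minimal sketch, assuming title is a symbol column as in the answers above; it uses find (?) in place of ss, and the column name prefix is hypothetical:

```q
q)t:([] title:`ab.123`ab.321`cde.456`cde.654`fghi.789`fghi.987)
q)t:update thedot:string[title]?\:"." from t     / index of the first "." in each title
q)t:update prefix:`$thedot#'string title from t  / take that many leading characters
q)exec distinct prefix from t
`ab`cde`fghi
```

If a title contains no ".", find returns the length of the string, so the whole title is kept as its prefix.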

Check if field value is in a list of strings in SSRS report

I'm using SSRS (VS2008) and creating a report of work orders. In the detail line of the report table, I have the following columns (with some fake data)
WONUM | A | B | Hours
ABC123 | 3 | 0 | 3
SPECIAL| 0 | 6 | 6
DEF456 | 5 | 0 | 5
GHI789 | 4 | 0 | 4
OTHER | 0 | 2 | 2
As you can kind of see, all work orders have a work order number (WONUM) as well as a total # of hours (HOURS). I need to put the hours into either column A or column B based on WONUM. I have a list of specifically named work orders (in the example, they would be "SPECIAL" and "OTHER") which would cause the HOURS value to be put in column B. If the WONUM is NOT a special named one, then it goes in column A. Here's what I WANTED to put as the expression for column A and column B:
Column A: =IIF(Fields!WONUM.Value IN ("SPECIAL","OTHER"), 0, Fields!Hours.Value)
Column B: =IIF(Fields!WONUM.Value IN ("SPECIAL","OTHER"), Fields!Hours.Value, 0)
But as you're probably aware, Fields!WONUM.Value IN ("SPECIAL","OTHER") is not a valid method of doing this! What is the best way to make this work? I cannot flag it in the SQL query in any other way for other reasons so it must be done in the table.
Thanks in advance for any and all help!
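Try this, using the InStr() function. Note that InStr matches substrings, so a WONUM that merely contains "SPECIAL" or "OTHER" will also match:
Column A: =IIF(InStr(Fields!WONUM.Value,"SPECIAL")>0 OR InStr(Fields!WONUM.Value,"OTHER")>0, 0, Fields!Hours.Value)
Column B: =IIF(InStr(Fields!WONUM.Value,"SPECIAL")>0 OR InStr(Fields!WONUM.Value,"OTHER")>0, Fields!Hours.Value, 0)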
If it's just the two WONUMs then you can do this:
Column A:
=IIF((Fields!WONUM.Value <> "SPECIAL") AND (Fields!WONUM.Value <> "OTHER"), Fields!Hours.Value, 0)
Column B:
=IIF((Fields!WONUM.Value = "SPECIAL") OR (Fields!WONUM.Value = "OTHER"), Fields!Hours.Value, 0)
Alternatively, use the same condition in both columns for consistency and just swap the field and the 0 at the end.