Function updating multiple columns in kdb

I have a table I want to update:
q)show table:([]letter:`a`b`c`a;fruit:`apple`banana`pear`strawberry;family:`mom`dad`brother`sister)
letter fruit family
-------------------------
a apple mom
b banana dad
c pear brother
a strawberry sister
I want to replace all entries with the name of their respective column.
This seems to work:
q){![table;();0b;(enlist x)!(enlist `x)]}`letter
letter fruit family
-------------------------
letter apple mom
letter banana dad
letter pear brother
letter strawberry sister
...but not this:
q){![table;();0b;(enlist x)!(enlist `x)]}`letter`fruit
'type
[1] {![table;();0b;(enlist x)!(enlist `x)]}
^
q))
The purpose is to create a function that creates dummy variables for categorical variables, so I need a general function. Any suggestions?

This can be achieved with @ (amend) instead of a functional update, like so:
q){@[`table;x;:;x]}`letter`fruit
`table
q)table
letter fruit family
--------------------
letter fruit mom
letter fruit dad
letter fruit brother
letter fruit sister
Edit - For all cols:
{@[`table;x;:;x]} each cols table
or
{@[x;y;:;y]}/[table;cols table]
q)table
letter fruit family
-------------------
letter fruit family
letter fruit family
letter fruit family
letter fruit family

Your original functional update could work like this:
q){![table;();0b;x!enlist'[x:(),x]]}cols table
letter fruit family
-------------------
letter fruit family
letter fruit family
letter fruit family
letter fruit family
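For readers less familiar with q, the effect of the update can be sketched in plain Python. This is only an illustration: the table is modelled as a dict of column lists, and `fill_with_column_names` is a hypothetical helper, not part of kdb+.

```python
def fill_with_column_names(table, columns=None):
    """Return a copy of `table` where each chosen column holds its own name.

    `table` is a dict mapping column name -> list of values (an assumption
    made for this sketch). With columns=None every column is replaced.
    """
    chosen = set(table if columns is None else columns)
    return {
        name: [name] * len(values) if name in chosen else list(values)
        for name, values in table.items()
    }


table = {
    "letter": ["a", "b", "c", "a"],
    "fruit": ["apple", "banana", "pear", "strawberry"],
    "family": ["mom", "dad", "brother", "sister"],
}

# Replace only two columns, then all of them.
print(fill_with_column_names(table, ["letter", "fruit"]))
print(fill_with_column_names(table))
```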

Related

KDB+/Q: How to categorize data

My column has categorical data. E.g. cat and dog are animals, ant,bee and wasp are insects.
t:([] creature:`cat`dog`ant`bee`wasp`crocodile; cnt:til 6)
And I need to add a column 'category' to show creature type.
I know how to do dict mapping, but this looks ugly:
update category:creature ^ ((`cat`dog`ant`bee`wasp)!(`animal`animal`insect`insect`insect)) creature from t
I could use it if I knew how to create such mapping dictionary from simple lists like:
mapping: (???) ((`cat`dog;`animal);(`ant`bee`wasp;`insect))
In the format provided you can use flip each to create all the pairs for the dictionary:
q)flip each ((`cat`dog;`animal);(`ant`bee`wasp;`insect))
(`cat`animal;`dog`animal)
(`ant`insect;`bee`insect;`wasp`insect)
Which can then be turned into a dictionary by razing that into a single list first:
(!). flip raze flip each ((`cat`dog;`animal);(`ant`bee`wasp;`insect))
cat | animal
dog | animal
ant | insect
bee | insect
wasp| insect
Or you could make a table (here L is the list of pairs from the question) and run whatever kind of join suits the case.
q)ungroup flip`creature`category!flip L
creature category
-----------------
cat animal
dog animal
ant insect
bee insect
wasp insect
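The same idea (expand each group into member/category pairs, then flatten them into one lookup) can be sketched in Python. This is an illustration under assumptions, not q: `build_mapping` is a hypothetical helper, and the fallback to the creature's own name mirrors what the fill (`^`) in the question's update does for unmapped values.

```python
def build_mapping(groups):
    """Flatten (members, category) pairs into a member -> category dict."""
    return {
        member: category
        for members, category in groups
        for member in members
    }


groups = [(["cat", "dog"], "animal"), (["ant", "bee", "wasp"], "insect")]
mapping = build_mapping(groups)

creatures = ["cat", "dog", "ant", "bee", "wasp", "crocodile"]
# Creatures without a mapping keep their own name as the category.
categories = [mapping.get(c, c) for c in creatures]
print(categories)
```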

String match in Postgresql

I am trying to make separate columns in my query result for values stored in a single column. It is a string field that contains a variety of similar values stored like this:
["john"] or ["john", "jane"] or ["john", "john smith", "jane"], etc., where each of the values in quotes is a distinct value. I cannot seem to isolate just ["john"] in a way that will return john and not john smith. The john smith value would be in a separate column. Essentially a column for each value in quotes. Note, I would like the results to not contain the quotes or the brackets.
I started with:
Select name
From namestbl
Where name like '%["john"]%';
I think this is heading in the wrong direction. I think this should be in select instead of where.
Sorry about the format, I give up trying to figure out the completely useless error message when i try to save this with table markdown.
Your data examples are valid JSON arrays, so cast them to jsonb and access individual elements by position (JSON arrays are zero-based). The t CTE mimics the real data. In the illustration below the number of columns is limited to 6.
with t(s) as
(
values
('["john", "john smith", "jane"]'),
('["john", "jane"]'),
('["john"]')
)
select s::jsonb->>0 name1, s::jsonb->>1 name2, s::jsonb->>2 name3,
s::jsonb->>3 name4, s::jsonb->>4 name5, s::jsonb->>5 name6
from t;
Here is the result.
name1 | name2      | name3 | name4 | name5 | name6
------+------------+-------+-------+-------+------
john  | john smith | jane  |       |       |
john  | jane       |       |       |       |
john  |            |       |       |       |
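If the splitting is done client-side rather than in SQL, the same positional access can be sketched in Python. This is only an illustration: `split_names` and its `width` parameter are hypothetical, and missing positions are padded with None in the same way the `->>` operator yields NULL for an index past the end of the array.

```python
import json


def split_names(raw, width=6):
    """Parse a JSON array string and pad or truncate it to `width` columns."""
    names = json.loads(raw)
    return (names + [None] * width)[:width]


rows = ['["john", "john smith", "jane"]', '["john", "jane"]', '["john"]']
for raw in rows:
    print(split_names(raw))
```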

Pattern matching Postgres to replace misspelled words of a column

I have a table A :
name        | renamed_name
------------+-------------
HON/A       |
HONDA TRUCK |
GMC         |
and I have a renaming rules table B:
rule        | correct_name
------------+-------------
HON/A       | HONDA
HONDA TRUCK | HONDA
^GMC.+      | GMC
I need to update table A and set the column A.renamed_name to the B.correct_name of table B where A.name matches any of B.rule.
When I use the following update query:
Update A set renamed_name = B.correct_name from B where name ~* any(array[B.rule]);
it gives me this result:
name        | renamed_name
------------+-------------
HON/A       | HONDA
HONDA TRUCK | HONDA
GMC         | NULL
The last row is not updated even though my condition uses a regex match. Please let me know where I might be going wrong, or if there is an alternative solution.
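One likely cause is the rule itself: `.+` requires at least one character after GMC, so the bare value GMC cannot match `^GMC.+`, whereas `^GMC.*` (or simply `^GMC`) would. The sketch below reproduces this with Python's `re` module. That is an assumption of convenience: Python's regex flavour is not the one Postgres uses, but `^`, `.`, `+` and `*` behave the same way for these patterns. The `rename` helper is hypothetical.

```python
import re


def rename(name, rules):
    """Return the correct_name of the first rule matching `name`, else None."""
    for pattern, correct_name in rules:
        # IGNORECASE mirrors the case-insensitive ~* operator.
        if re.search(pattern, name, flags=re.IGNORECASE):
            return correct_name
    return None


rules = [("HON/A", "HONDA"), ("HONDA TRUCK", "HONDA"), ("^GMC.+", "GMC")]
fixed_rules = rules[:2] + [("^GMC.*", "GMC")]

for name in ["HON/A", "HONDA TRUCK", "GMC"]:
    print(name, rename(name, rules), rename(name, fixed_rules))
```

With the original rule the GMC row stays unmatched; with `.*` it is renamed, while a longer value such as GMC TRUCK matches either way.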

using patindex to replace characters

I have a table with a name column in it that contains names like:
A & A Turf
C & D Railways
D & B Railways
I have the following query that will get me the correct columns I want
select name from table where patindex('_ & _ %', name) > 0
What I need to accomplish is making anything with that type of pattern collapsed. Like this
A&A Turf
C&D Railways
D&B Railways
I'm also looking to do the same thing with a single letter followed by a space, then another single letter followed by a space, then words with more than one letter, like this:
A F Consulting -> AF Consulting
D B Catering -> DB Catering
but only if the single letter stuff is at the beginning of the value.
Example would be if the name has the pattern mentioned above anywhere in the name then don't do anything unless it's at the beginning
ALBERS, J K -> ALBERS, J K This would not change because it's a name and it's not at the beginning.
So something like this would be the desired result:
Original Name      New Name         Rule
________________   ______________   _______________________________________________
A & K Consulting   A&K Consulting   Spaces around & removed between single characters
C B Finance        CB Finance       Space removed only if at the beginning
Albert J K         Albert J K       Not at the beginning, so left alone
This can be done without PATINDEX, because what needs to be replaced is at the start and follows fixed patterns, so you already know the positions.
Example snippet:
DECLARE @Table TABLE (ID INT IDENTITY(1,1) PRIMARY KEY, name VARCHAR(30));
INSERT INTO @Table (name) VALUES
('A & K Consulting'),
('C B Finance'),
('Albert J K'),
('Foo B & A & R');
SELECT
name AS OldName,
(CASE
WHEN name LIKE '[A-Z] [A-Z] %' THEN STUFF(name,2,1,'')
WHEN name LIKE '[A-Z] & [A-Z] %' THEN STUFF(name,2,3,'&')
ELSE name
END) AS NewName
FROM @Table;
The first one is straightforward: replace " & " with "&". The second I'll have to take more time.
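Both rules can also be written as regular expressions anchored at the start of the string. The sketch below is in Python rather than T-SQL and is only an illustration: `collapse_prefix` is a hypothetical helper, and `[A-Z]` here is strictly upper-case, whereas the behaviour of a LIKE range in SQL Server depends on the collation.

```python
import re


def collapse_prefix(name):
    """Collapse 'A & K ...' to 'A&K ...' and 'C B ...' to 'CB ...'.

    Both patterns start with ^, so only the beginning of the name is touched.
    """
    name = re.sub(r"^([A-Z]) & ([A-Z]) ", r"\1&\2 ", name)
    return re.sub(r"^([A-Z]) ([A-Z]) ", r"\1\2 ", name)


for name in ["A & K Consulting", "C B Finance", "Albert J K", "Foo B & A & R"]:
    print(name, "->", collapse_prefix(name))
```

Because of the anchor, a plain replace of " & " everywhere is avoided, so a value like Foo B & A & R is left alone.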

SAS: merging on first match only

I'm looking to do a one-to-many merge in SAS, where I would like to only keep the first match.
Example data below:
data one;
input id $ fruit $;
datalines;
a apple
b apple
c banana
d coconut
;
data two;
input id $ color $;
datalines;
a amber
b brown
c cream
c cocoa
c carmel
;
data both;
merge one two;
by id;
run;
proc print data=both;
run;
As you can see, this is a one-to-many merge.
Is there a way to make it keep only the first match? i.e. the output would be as below:
a apple amber
b apple brown
c banana cream
d coconut .
The background here is that the first dataset contains properties, and the second contains leases, and I am looking to find only the first lease on a property. I've only just started learning SAS, so it might be that there is a function better suited to this?
Many thanks!
Mike
Check this out:-
/*Creating Datasets*/
data one;
input id $ fruit $;
datalines;
a apple
b apple
c banana
d coconut
;
data two;
input id $ color $;
datalines;
a amber
b brown
c cream
c cocoa
c carmel
;
/*Just insert first.Id=1 in your code, it should do the job*/
data both;
merge one two;
by id;
if first.id =1;
run;
proc print data=both;
run;
Hope this helps:-)
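To make the logic explicit, here is the same first-match idea sketched in Python. It is an illustration under assumptions, not SAS: `merge_first` is a hypothetical helper, missing colors are shown as None rather than a SAS missing value, and it only keeps ids present in the first dataset, whereas the SAS merge would also keep ids found only in the second.

```python
def merge_first(left, right):
    """Attach to each left row the first right-hand value with the same id."""
    first_match = {}
    for key, value in right:
        # setdefault keeps the first value seen for each id.
        first_match.setdefault(key, value)
    return [(key, fruit, first_match.get(key)) for key, fruit in left]


one = [("a", "apple"), ("b", "apple"), ("c", "banana"), ("d", "coconut")]
two = [("a", "amber"), ("b", "brown"), ("c", "cream"), ("c", "cocoa"), ("c", "carmel")]

for row in merge_first(one, two):
    print(row)
```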