Insert string with special characters into KDB+

What type should I use to create a table in KDB+ and insert a string with special characters (spaces, #, -, etc.)? It looks like KDB+ treats these and similar characters specially, because when I create a table like this:
t: ([] str: ())
and insert the string "abc # efgf - ABC.FS #.... TEST TEST" (a long string with different characters, including spaces, - and #) like this:
`t insert "abc # efgf - ABC.FS #.... TEST TEST"
KDB+ returns a type exception.

Your problem here doesn't come from the special characters, it comes from the fact that a string is a list of characters. You need to use enlist to insert the string as a single element into the table.
In fact, this case is a bit atypical because you only have one column in the table, so you actually need to use enlist twice, as kdb expects a list of column data as the second argument in insert. So for this table use
`t insert enlist enlist "blah blah # # #"
If you had a table with more than one column, then you only need one enlist for the string, e.g.
t:([]id:(); str:())
`t insert (1; enlist "blah blah # # #")
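To see why the enlist matters, it can help to compare the counts directly. This is only a minimal sketch; the names s and t2 are illustrative and not from the question:

```q
/ a string is a char vector, so on its own it counts as many items
s:"blah blah # # #"
type s               / 10h, i.e. a list of chars
count s              / 15 separate chars
count enlist s       / 1: a one-item list holding the whole string

/ single-column table: one enlist for the string, one for the list of columns
t2:([] str:())
`t2 insert enlist enlist s
count t2             / 1 row
```

Without the enlists, insert sees 15 separate items rather than one string value.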

I'm not sure why you get a 'type error, though; I get a 'length error.
Anyway, insert expects the right-hand side to be a list with one item per column, i.e.
q)t:([]a:();b:())
q)`t insert (1;2) / single record matching 2 columns
,0
To insert multiple records, the right-hand side must be a nested list in which each item holds the values for one column.
q)`t insert (2 3;4 5)
1 2
So in your case, to insert a single record of string, you need a singleton list that contains an enlisted string:
q)t: ([] str: ())
q)`t insert enlist enlist "abc # efgf - ABC.FS #.... TEST TEST"
,0
q)t
str
-------------------------------------
"abc # efgf - ABC.FS #.... TEST TEST"

Related

kdb+: Save table into a csv file

I have the table "dates" below. It has a sym column of symbols and a d column holding lists of strings, and I would like to save it to a regular CSV file. I couldn't find a good way to do it. Any suggestions?
q)dates
sym d
----------------------------------------------------------------------------
6AH0 "1970.03.16" "1980.03.17" "1990.03.19" "2010.03.15"
6AH6 "1976.03.15" "1986.03.17" "1996.03.18" "2016.03.14"
6AH7 "1977.03.14" "1987.03.16" "1997.03.17" "2017.03.13"
6AH8 "1978.03.13" "1988.03.14" "1998.03.16" "2018.03.19"
6AH9 "1979.03.19" "1989.03.13" "1999.03.15" "2019.03.18"
When I try a regular save, I get the error below:
q)save `:dates.csv
k){$[t&77>t:#y;$y;x;-14!'y;y]}
'type
q))
The internal table-to-CSV conversion in kdb+ cannot handle columns nested more than one level deep. The d column in your table is a list of lists of strings, i.e. each cell holds several char vectors. The conversion can, however, handle a simply nested column (depth of 1), such as a column of strings.
Therefore, you can collapse each cell of the d column into a single string and then save to CSV using the internal function:
/ generate a table of dummy data
q)show dates:flip `sym`d!(`6AH0`6AH6`6AH7;string (3;0N)#12?.z.d)
sym d
--------------------------------------------------------
6AH0 "2008.02.04" "2015.01.02" "2003.07.05" "2005.02.25"
6AH6 "2012.10.25" "2008.08.28" "2017.01.25" "2007.12.27"
6AH7 "2004.02.01" "2005.06.06" "2013.02.11" "2010.12.20"
/ convert 'd' column to simple list - the (" " sv') is the conversion func here
q)@[`dates;`d;" " sv']
`dates
/ review what was done
q)show dates
sym d
--------------------------------------------------
6AH0 "2008.02.04 2015.01.02 2003.07.05 2005.02.25"
6AH6 "2012.10.25 2008.08.28 2017.01.25 2007.12.27"
6AH7 "2004.02.01 2005.06.06 2013.02.11 2010.12.20"
/ save to csv
q)save `:dates.csv
`:dates.csv
/ review saved csv
q)\cat dates.csv
"sym,d"
"6AH0,2008.02.04 2015.01.02 2003.07.05 2005.02.25"
"6AH6,2012.10.25 2008.08.28 2017.01.25 2007.12.27"
"6AH7,2004.02.01 2005.06.06 2013.02.11 2010.12.20"
As per the CSV specification, you'll want to flatten the list, separate its items with commas and double-quote the resulting field.
'save' is limited in that the file must have the same name as the global variable you are saving.
If I were tasked with your question, I'd do it like so:
`:myFileNamedWhatever.csv 0: csv 0: select sym,csv sv'd from dates
Explanation:
csv 0: table /csv is a variable, literally defined as "," - it's good for readability. csv 0: table converts the table to a list of comma-separated strings
`:file 0: listOfStrings /this takes a LIST of strings and pushes them to the file handle. Each element of the list is a new line in the file
I prefer this approach as it is general and allows saving various types. You can also use it within a function, etc.
If at a later date I decided that I wanted it saved as a pipe-separated (or anything-separated) file:
`:myNewFile.psv 0: "|" 0: select sym,"|"sv'd from table
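The two uses of 0: above can be tried step by step. The following is a rough sketch on made-up data; tab, flat, lines and example.csv are illustrative names, not anything from the question:

```q
/ made-up table with a nested column of strings, shaped like dates
tab:([] sym:`a`b; d:(("2020.01.01";"2020.01.02");("2021.01.01";"2021.01.02")))

/ collapse each cell of d into one string so the column is only simply nested
flat:select sym, d:" " sv' d from tab

/ string on the left of 0: prepares text: a list of strings, header line first
lines:csv 0: flat

/ file handle on the left of 0: writes one line per string
`:example.csv 0: lines
```

Splitting the call like this makes it easy to inspect lines in the console before anything is written to disk.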

In KDB, how to bulk insert into a table containing a column of type String

In a table with a column of type string (not symbol) I am able to insert a row one by one. But bulk insert is not working!
t:([id:`int$()] str:()) /create table
`t insert(0, enlist enlist "test") /insert first row. This seems to need two enlist
`t insert(1, enlist "test1") /insert one more, this time with one enlist
`t insert (2 3; enlist "test2" enlist "test3") /trying to bulk insert fails
`t insert flip (2 3; enlist "test2" enlist "test3") /trying to bulk insert with flip also fails
I would prefer a semicolon to a comma in the insert command. It makes the syntax simpler and easier to understand. Because of the comma syntax, two enlists are required in your first insert.
The syntax you are using for bulk insert is not correct. It is as simple as this:
q) t:([id:`int$()] str:()) / create table
q) `t insert (0;enlist "test") / insert first row
q) `t insert (1 2;("test1";"test2")) / bulk insert 2 rows
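One caveat, offered as an assumption about your version rather than something shown in the question: from kdb+ 3.0 onwards a bare 0 is a long, not an int, so the inserts above may raise 'type against the int-typed id column. A sketch with explicit int literals:

```q
/ assumes kdb+ 3.0 or later, where 0 is a long and 0i is an int
t:([id:`int$()] str:())
`t insert (0i; enlist "test")           / one row: int atom plus an enlisted string
`t insert (1 2i; ("test1";"test2"))     / two rows: int vector plus a list of strings
count t                                 / 3
```

The shape rule is the same either way: one item per column, and for a bulk insert each item is a list with one value per row.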

pgSQL: select first occurrence of the number inside a string

The string contains numeric and alphabetic data. How can I pick out only the number? For example:
for the string "abc-123a-66" the select should return "123"
You could use regexp_matches
CREATE table foo (
test VARCHAR);
INSERT INTO foo VALUES('abc-123a-66');
SELECT (regexp_matches(test, '\d+'))[1] FROM foo;
Example at SQLFiddle
In PostgreSQL this can be done with:
SELECT regexp_matches(regexp_replace(whatever_columnn,'\D*',''),'\d+') FROM whatever_table;
The inner function (regexp_replace) deletes the non-digit characters at the beginning of the string; the outer function (regexp_matches) then extracts the first run of one or more digits from that output.

remove non-numeric characters in a column (character varying), postgresql (9.3.5)

I need to remove non-numeric characters in a column (character varying) and keep numeric values in postgresql 9.3.5.
Examples:
1) "ggg" => ""
2) "3,0 kg" => "3,0"
3) "15 kg." => "15"
4) ...
There are a few problems, some values are like:
1) "2x3,25"
2) "96+109"
3) ...
These need to remain as is (i.e when containing non-numeric characters between numeric characters - do nothing).
Using regexp_replace is simpler:
# select regexp_replace('test1234test45abc', '[^0-9]+', '', 'g');
regexp_replace
----------------
123445
(1 row)
Inside the brackets, ^ means "not", so any run of characters outside the range 0-9 is replaced with an empty string, ''.
The 'g' is a flag that means all matches will be replaced, not just the first match.
For modifying strings in PostgreSQL, take a look at the "String Functions and Operators" section of the documentation. The function substring(string from pattern) uses POSIX regular expressions for pattern matching and works well for removing unwanted characters from your string.
(Note that the VALUES clause inside the parentheses is just there to provide the example material; you can replace it with any SELECT statement or table that provides the data):
SELECT substring(column1 from '(([0-9]+.*)*[0-9]+)'), column1 FROM
(VALUES
('ggg'),
('3,0 kg'),
('15 kg.'),
('2x3,25'),
('96+109')
) strings
The regular expression explained in parts:
[0-9]+ - the string has at least one digit, example: '789'
[0-9]+.* - at least one digit followed by anything, example: '12smth'
([0-9]+.*)* - the previous pattern repeated zero or more times, example: '12smth22smth'
(([0-9]+.*)*[0-9]+) - the previous pattern followed by at least one digit at the end, example: '12smth22smth345'

Get substring into a new column

I have a table that contains a column with data in the following format. Let's call the column "title" and the table "s".
title
ab.123
ab.321
cde.456
cde.654
fghi.789
fghi.987
I am trying to get a unique list of the characters that come before the "." so that I end up with this:
ab
cde
fghi
I have tried selecting the initial column into a table and then doing an update to create a new column holding the position of the dot, using "ss".
Something like this:
t: select title from s
update thedot: (title ss `.)[0] from t
I was then going to add a third column holding the first N characters of "title", where N is the value stored in the "thedot" column.
All I get when I try the update is a "type" error.
Any ideas? I am very new to kdb, so no doubt I am doing something simple in a very silly way.
The reason you get the type error is that ss only works on strings, not symbols. Also, ss is not a vector function, so you need to combine it with each (').
q)update thedot:string[title] ss' "." from t
title thedot
---------------
ab.123 2
ab.321 2
cde.456 3
cde.654 3
fghi.789 4
There are a few ways to solve your problem:
q)select distinct(`$"." vs' string title)[;0] from t
x
----
ab
cde
fghi
q)select distinct(` vs' title)[;0] from t
x
----
ab
cde
fghi
You can read here for more info: http://code.kx.com/q/ref/casting/#vs
An alternative is to make use of the 0: operator, to parse around the "." delimiter. This operator is especially useful if you have a fixed number of 'columns' like in a csv file. In this case where there is a fixed number of columns and we only want the first, a list of distinct characters before the "." can be returned with:
exec distinct raze("S ";".")0:string title from t
`ab`cde`fghi
OR:
distinct raze("S ";".")0:string t`title
`ab`cde`fghi
Here "S " defines the type of each column ("S" reads the first field as a symbol and the blank skips the second) and "." is the field delimiter. For records with differing numbers of columns it would be better to use the vs operator.
A variation of WooiKent's answer using each-right (/:) :
q)exec distinct (` vs/:title)[;0] from t
`ab`cde`fghi
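Since the question title asks for the substring in a new column, here is a small sketch that puts the pieces together. It assumes title is a symbol column (which the 'type error from ss suggests); prefix and grp are hypothetical names chosen for illustration:

```q
/ rebuild the example table from the question
s:([] title:`ab.123`ab.321`cde.456`cde.654`fghi.789`fghi.987)

/ hypothetical helper: the text before the first "." as a symbol
prefix:{`$first "." vs string x}

distinct prefix each s`title            / `ab`cde`fghi
update grp:prefix each title from s     / keeps the prefix as a new column
```

Keeping the prefix as a column means the distinct list is then just exec distinct grp from the updated table.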