kdb: check whether a symbol starts with a particular prefix

Given a symbol, how to check whether it has a particular prefix?
I had below code. It checks if a symbol begins with aaaaa but returns 1b for aaa which is wrong. I can add a length check but that seems verbose. Is there a cleaner way?
{"aaaaa"~-5#string x}[`$"aaa"]

Could you use like?
q)`aaa like "aaa*"
1b
q)`aaa like "aaaaa*"
0b

It seems the issue is with "take" (#). Because "aaa" is shorter than 5 characters, take cycles back through the string to pad the result up to the requested length, so -5#"aaa" yields "aaaaa".
You could modify your function so you have the following:
q){"aaaaa"~(x) til 5}["aaa"]
0b
q){"aaaaa"~(x) til 5}["aaaaaaaa"]
1b
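To make the wrap-around concrete, here is a rough Python emulation of q's take for strings. It is not q itself, and q_take is a hypothetical helper written only for illustration. It shows that a negative count takes from the end and cycles through the input when it is too short.

```python
def q_take(n: int, s: str) -> str:
    """Rough emulation of q's n#s for a non-empty string s (illustrative only)."""
    if not s:
        raise ValueError("this sketch only handles non-empty input")
    size = abs(n)
    if size == 0:
        return ""
    reps = -(-size // len(s))  # ceiling division: copies needed to cover size
    repeated = s * reps
    # Positive n takes from the front, negative n takes from the end.
    return repeated[:size] if n > 0 else repeated[-size:]


print(q_take(-5, "aaa"))  # aaaaa
print(q_take(-5, "abc"))  # bcabc
print(q_take(5, "abc"))   # abcab
```

With a distinct-character input such as "abc" the cycling is visible, which is why the original check only appears to work when the symbol is made of one repeated letter.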

Expanding on Matthew's answer if you want to make a function out of it do the following:
q)f:{x like "aaaaa*"}
q)f[`aaa]
0b
q)f[`aaaaa]
1b
q)f[`aaaaabcde]
1b
And if you want to make it more dynamic, you could add a second argument for the prefix to match.
q)f2:{x like y,"*"}
q)f2[`aaa;"aaaaa"]
0b
q)f2[`aaa;"aaa"]
1b
Let me know if you see any issues.

Related

How can I run a list of functions on an input?

q) ({2*x};{3*x})
How can I apply the list of functions to an input, e.g. 4, something like:
({2*x};{3*x})[4]
8 12
You should be able to use apply (@) with each-left (\:):
({2*x};{3*x})@\:4
An alternative approach to apply each-left:
q)({2*x};{3*x})[;4]
8 12
Just to generalise Michael's answer: if your functions take more than one argument, you need to use dot-apply (.) rather than @. Dot-apply works in both cases:
q)({2*x};{3*x}).\:(),4
8 12
q)({y+2*x};{y+3*x}).\:(),4 100
108 112
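For readers who do not use q, the same idea can be sketched in Python. This is only an analogy, not q semantics: a list comprehension plays the role of each-left, and argument unpacking plays the role of dot-apply. The names funcs, funcs2 and args are made up for this example.

```python
# One-argument functions: call each function on the same input
# (roughly what @\: does in the q examples above).
funcs = [lambda x: 2 * x, lambda x: 3 * x]
single = [f(4) for f in funcs]
print(single)  # [8, 12]

# Two-argument functions: unpack a shared argument list into each call
# (roughly what .\: does in the q examples above).
funcs2 = [lambda x, y: y + 2 * x, lambda x, y: y + 3 * x]
args = (4, 100)
multi = [f(*args) for f in funcs2]
print(multi)  # [108, 112]
```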

Can raku avoid this Malformed UTF-8 error?

When I run this raku script...
my $proc = run( 'tree', '--du', :out);
$proc.out.slurp(:close).say;
I get this error on MacOS...
Malformed UTF-8 near bytes ef b9 5c
... instead of something like this tree output from zsh which is what I want...
.
├── 00158825_20210222_0844.csv
├── 1970-Article\ Text-1971-1-2-20210118.docx
├── 1976-Article\ Text-1985-1-2-20210127.docx
├── 2042-Article\ Text-2074-1-10-20210208.pdf
├── 2045-Article\ Text-2076-1-10-20210208.pdf
├── 6.\ Guarantor\ Form\ (A).pdf
I have tried slurp(:close, enc=>'utf8-c8') and the error is the same.
I have also tried...
shell( "tree --du >> .temp.txt" );
my @lines = open(".temp.txt").lines;
dd @lines;
... and the error is the same.
Opening .temp.txt reveals this...
.
â<94><9c>â<94><80>â<94><80> [ 1016739] True
â<94><9c>â<94><80>â<94><80> [ 9459042241] dir-name
â<94><82>   â<94><9c>â<94><80>â<94><80> [ 188142] Business
â<94><82>   â<94><82>   â<94><9c>â<94><80>â<94><80> [ 9117] KeyDates.xlsx
â<94><82>   â<94><82>   â<94><9c>â<94><80>â<94><80> [ 13807] MondayNotes.docx
file -I gives this...
.temp.txt: text/plain; charset=unknown-8bit
Any advice?
[this is Catalina 10.15.17, Terminal encoding Unicode(UTF-8)
Welcome to 𝐑𝐚𝐤𝐮𝐝𝐨™ v2020.10.
Implementing the 𝐑𝐚𝐤𝐮™ programming language v6.d.
Built on MoarVM version 2020.10.]
It seems like you have a codepage/locale that is not UTF-8. (Or tree is ignoring the codepage and using something different.)
A quick way to get something, anything, out of it is to use an 8-bit single-byte encoding.
run( 'tree', '--du', :out, :enc<latin1> );
That is generally enough to see where decoding as UTF-8 starts to go wrong.
That said, let's look at your expected output, and the file output.
say '├──'.encode; # utf8:0x<E2 94 9C E2 94 80 E2 94 80>
In your file you have
â<94><9c>â<94><80>â<94><80> [ 1016739] True
Wait …
say 'â'.encode('latin1'); # Blob[uint8]:0x<E2>
<E2><94><9c><E2><94><80><E2><94><80>
<E2 94 9c E2 94 80 E2 94 80>
utf8:0x<E2 94 9C E2 94 80 E2 94 80>
Yeah, those look an awful lot alike.
In that they are exactly the same.
So it does appear to be producing the expected output to some extent.
Which seems to confirm that, yes, there is an encoding problem between tree and your code. That indicates that the codepage/locale is set incorrectly.
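The byte comparison above can be reproduced outside Raku. The following is a small Python sketch, under the assumption that the file viewer rendered each byte as a Latin-1 character. It encodes the expected tree prefix as UTF-8 and then decodes those bytes as Latin-1, which produces the same "â" plus two stray bytes pattern seen in .temp.txt.

```python
expected = "├──"

# UTF-8 encoding of the box-drawing prefix that tree prints.
utf8_bytes = expected.encode("utf-8")
print(utf8_bytes.hex(" "))  # e2 94 9c e2 94 80 e2 94 80

# Misreading those bytes as Latin-1 gives "â" followed by two control
# characters, three times over.
mojibake = utf8_bytes.decode("latin-1")
print(repr(mojibake))

# The damage is reversible as long as no byte was lost along the way.
roundtrip = mojibake.encode("latin-1").decode("utf-8")
print(roundtrip == expected)  # True
```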
You haven't really provided enough information to figure out exactly what's going wrong where.
You should have used run in binary mode to give us the exact output.
say run('echo', 'hello', :out, :bin).out.slurp;
# Buf[uint8]:0x<68 65 6C 6C 6F 0A>
You also didn't say if <9c> is literally in the file as four text characters, or if it is a feature of whatever you used to open the file turning binary data into text.
It also would be nice if all of the example data was of the same thing.
On a slightly related note…
Since tree gives filenames, and filenames are not Unicode, using utf8-c8 is appropriate here.
(Same generally goes for usernames and passwords.)
Here's some code that I ran on my computer to hopefully show why.
say dir(:test(/^ r.+sum.+ $/)).map: *.relative.encode('utf8-c8').decode
# (résumé résumé résumé résumé)
dir(:test(/^ r.+sum.+ $/)).map: *.relative.encode('utf8-c8').say
# Blob[uint8]:0x<72 65 CC 81 73 75 6D 65 CC 81>
# Blob[uint8]:0x<72 C3 A9 73 75 6D 65 CC 81>
# Blob[uint8]:0x<72 C3 A9 73 75 6D C3 A9>
# Blob[uint8]:0x<72 65 CC 81 73 75 6D C3 A9>
say 'é'.NFC;
# NFC:0x<00e9>
say 'é'.NFD;
# NFD:0x<0065 0301>
sub to-Utf8 ( Uni:D $_ ) {
    .map: *.chr.encode
}
say to-Utf8 'é'.NFC;
# (utf8:0x<C3 A9>)
say to-Utf8 'é'.NFD;
# (utf8:0x<65> utf8:0x<CC 81>)
So é is either encoded as one composed codepoint <C3 A9> or two decomposed codepoints <65> <CC 81>.
Did I really create 4 files with the “same name” just for this purpose?
Yes. Yes I did.
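The composed and decomposed forms described above are easy to check in other languages too. Here is a minimal Python sketch using the standard unicodedata module. The variable names are illustrative only.

```python
import unicodedata

# Start from the single precomposed codepoint U+00E9.
composed = unicodedata.normalize("NFC", "\u00e9")
decomposed = unicodedata.normalize("NFD", "\u00e9")

print(composed.encode("utf-8").hex(" "))    # c3 a9
print(decomposed.encode("utf-8").hex(" "))  # 65 cc 81

# The two strings render the same but compare unequal byte for byte,
# which is how four distinct filenames can all look like the same word.
print(composed == decomposed)                                # False
print(unicodedata.normalize("NFC", decomposed) == composed)  # True
```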
Update I had deleted this nanswer because Brad's excellent answer and Valle Lukas's spot on comment seemed to render it moot. Then @p6steve confirmed both Brad's answer and Valle Lukas's solutions worked for them, so all the more reason to keep it deleted. But too late! A mistake in my nanswer had misled @p6steve who made a similar mistake in a follow up SO. Mea Culpa. To atone for my sins, I'm now permanently undeleting and leaving my shameful past for all to see.
This is a nanswer. I don't know Mac, but do love investigation, and what I've got to say won't fit in the comments.
Update The 'find .' in the following should be 'find', '.'. See run doc.
What do you get with this?:
say .out.lines given run 'find .', :out
If find . works, the problem is presumably tree.
If find . doesn't work, then try something really simple, that's built into MacOS, something that really should work. If it doesn't work, then the problem isn't tree but something more basic.
Malformed UTF-8 near bytes ef b9 5c
That means Raku was expecting UTF-8 but the input wasn't UTF-8.
Translating the message from computerese into English:
The supposedly English string "[Linux] xshell远程登陆CentOS时中文乱码解决_Cindy的博客 ..." is Malformed near 远程登.
In other words, the tree command is not generating UTF-8.
(Therefore using utf8-c8 will almost certainly be useless in the first instance. Its purpose is to cheat. It's for when text is almost all UTF-8 except for a handful of rogue bytes and you can't be bothered to sort out the input, or when you have absolutely no choice but to accept the input as it is and still want to muddle through. But in this case you surely ought either to sort the problem out by getting to the bottom of things, or to find some alternative to tree.)
Terminal encoding Unicode(UTF-8)
A google for "Terminal encoding Unicode(UTF-8)" yields just 7 matches. None appeared to be exact matches for "Terminal encoding Unicode(UTF-8)". All but one look to me like ... ef b9 5c looks to Rakudo. :)
If you copy/pasted that string, where did you copy it from?
If you yourself wrote that string, why were you so sure MacOS really was encoding tree's output as UTF-8 when run via the kernel (not a shell) that you wrote that it was?
run doesn't use a shell.
The current doc claims shell uses /bin/sh -c on MacOS.
What's the output of this?:
readlink -e $(which sh)
Is the output zsh?
If so sh -c should be using it.
If not, that may be the problem.
When one uses shell, one has to ensure the passed string is appropriately quoted and escaped. What do you get when you try these?:
say .out.lines given shell "'find .'", :out;
say .out.lines given shell "'tree --du'", :out;
What exactly is tree invoking? Is it a shell alias in zsh? If it's a binary, where did you install it from and how did you configure it, especially in terms of influencing zsh's handling of encodings?

Would REGEXP or TRANSLATE in DB2 be another choice for REPLACE?

What would be the best way to shorten the SQL code below?
REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(REPLACE(TRIM(MYFIELD),'-R1',''),'-R2',''),'-R3',''),'-R4',''),'-R5',''),'-R6',''),'-R7',''),'-R8',''),'-R9',''),'-RA',''),'-RB',''),'-RC',''),'-RD',''),'-RE',''),'-RF','') AS TESTFIELD
Here is what I have tried:
REGEXP_REPLACE(MYFIELD,'-R[0-100][a-fA-F]','')
Original Data
N-RX ABCD
GROUP OPTION -01
ADVANTAGE 65 SELECT B-R11
ADVANTAGE 65 SELECT B-RA
ADVANTAGE 65 SELECT B-R09
ADVANTAGE 65 SELECT B-RB
ADVANTAGE 65 SELECT B/2A
Result Needed:
N-RX ABCD
GROUP OPTION -01
ADVANTAGE 65 SELECT B
Solution:
REGEXP_REPLACE(Trim(MyField), '[-|/]R[0-9a-zA-Z*][0-9a-zA-Z*]*$', '')
Your regular expression is your current issue. Try something like:
REGEXP_REPLACE(DACL_PDLV_5_DE, '-R[0-9a-fA-F][0-9]*$', '')
This matches '-R' followed by one character that is a digit, a-f or A-F, optionally followed by any number of further digits, but only at the end of the string.
If you could have a two-digit hex value you will want to adjust accordingly.
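To sanity-check that pattern against the sample rows from the question, here is a quick Python sketch. Python's re module is not DB2's regex engine, so treat this only as a check of the pattern logic; the constructs used here (character classes, * and $) are common to both. The names rows and cleaned are illustrative.

```python
import re

# Same pattern as in the REGEXP_REPLACE call above.
pattern = re.compile(r"-R[0-9a-fA-F][0-9]*$")

rows = [
    "N-RX ABCD",
    "GROUP OPTION -01",
    "ADVANTAGE 65 SELECT B-R11",
    "ADVANTAGE 65 SELECT B-RA",
    "ADVANTAGE 65 SELECT B-R09",
    "ADVANTAGE 65 SELECT B-RB",
    "ADVANTAGE 65 SELECT B/2A",
]

# strip() stands in for TRIM; only a trailing -R suffix is removed.
cleaned = [pattern.sub("", row.strip()) for row in rows]
for before, after in zip(rows, cleaned):
    print(f"{before!r} -> {after!r}")
```

Rows such as "N-RX ABCD" and "ADVANTAGE 65 SELECT B/2A" are left unchanged, because the suffix is either not at the end of the string or does not start with -R followed by a hex digit.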

Can't get the written file content in q?

I've copied the exact example from Q for Mortals as follows:
q)h:hopen `:D:/q4m/raw
q)h[42]
548i
q)h 10 20 30
548i
q)hclose h
q)get `:D:/q4m/raw
'D:/q4m/raw
[0] get `:D:/q4m/raw
Looking in the directory, I can see the file was created there. Why can't I get it?
Instead, if I do:
q)h:hopen `:D:/q4m/L
q)h[42]
628i
q)h[10 20 30]
628i
q)hclose h
q)get `:D:/q4m/L
0 1 2 3 4 42 10 20 30
everything works as expected. Why?
After testing the given code I believe your issue may be in how you initialise the file.
I assume in the code that works that you use some variation of
`:D:/q4m/L set til 5
before.
However this is not done for
`:D:/q4m/raw
If you were to use
`:D:/q4m/raw set til 5
or alternatively
.[`:D:/q4m/raw;();:;()]
beforehand then the first set of code will work.
Additionally, if we look at the binary using
read1 `:D:/q4m/raw
and
read1 `:D:/q4m/L
and the output does not include 07 near the beginning then it is not being recognised as a proper kdb list. That is, hopen simply appends to the binary file instead of amending it. (If you notice the 05 byte that indicates length of the list, this doesn't increase when you add via the handle).
eg.
The first method you get
q)read1 `:D:/q4m/raw
0x2a000000000000000a0000000000000014000000000000001e00000000000000
which doesn't really mean anything in q.
The second method gives
q)read1 `:D:/q4m/L
0xfe2007000000000005000000000000000000000000000000010000000000000002000000000..
which is a proper kdb list (notice the 07 which indicates type).
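To illustrate the difference between the two dumps, here is a small Python sketch that decodes the bytes shown above. It assumes little-endian 8-byte integers, and it reads the header fields (type byte at offset 2, element count at offset 8) purely from the description in this answer rather than from any official file format specification.

```python
import struct

# The raw file: nothing but four 8-byte little-endian integers, no header.
raw = bytes.fromhex(
    "2a 00 00 00 00 00 00 00 "
    "0a 00 00 00 00 00 00 00 "
    "14 00 00 00 00 00 00 00 "
    "1e 00 00 00 00 00 00 00"
)
values = struct.unpack("<4q", raw)
print(values)  # (42, 10, 20, 30)

# First 16 bytes of the file written with set, as shown in the dump above.
header = bytes.fromhex("fe 20 07 00 00 00 00 00 05 00 00 00 00 00 00 00")
type_byte = header[2]                          # assumed position of the type
count = struct.unpack("<q", header[8:16])[0]   # assumed position of the length
print(type_byte, count)  # 7 5
```

The appended values are all present in the raw file. What is missing is the header that tells get what kind of list the bytes represent.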
If you wish to instead just read in /q4m/raw then I suggest setting an empty list, hopen to that list and pass it `:D:/q4m/raw as follows
q)`:empty set 0#0
`:empty
q)h:hopen `:empty
q)h read1 `:D:/q4m/raw
3i
q)get `:empty
42 10 20 30
This will only work if all entries are the same type.

Perl print $(^b)

When I found Perl's $^O, I was curious whether there are more variables like this, because ^ reminded me of a regular expression. When I enter
print "$(^b)";
it comes up with some numbers:
1000 81 90 91 92 93 100 150 1000
What do these mean? Is this some kind of 0xdeadbeef?
I think you are just printing out the value of $(.
The real gid of this process. If you are on a machine that supports membership in multiple groups simultaneously, gives a space separated list of groups you are in. The first number is the one returned by getgid() , and the subsequent ones by getgroups() , one of which may be the same as the first number.
However, a value assigned to $( must be a single number used to set the real gid. So the value given by $( should not be assigned back to $( without being forced numeric, such as by adding zero. Note that this is different to the effective gid ($) ) which does take a list.
You can change both the real gid and the effective gid at the same time by using POSIX::setgid() . Changes to $( require a check to $! to detect any possible errors after an attempted change.
Here is the comparison:
diff <(perl -le 'print "$(";') <(perl -le 'print "$(^b)";')
1c1
< 20 20 402 12 33 61 79 80 81 98 100 204 401
---
> 20 20 402 12 33 61 79 80 81 98 100 204 401^b)
See the documentation on perldoc perlvar for a list of all the various built-in variables (along with their use English; equivalent names).
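As a cross-check from another language, the following Python sketch builds the same kind of list on a Unix-like system: the real gid followed by the supplementary groups. It then appends the literal text that Perl leaves uninterpolated. This is only an approximation of what Perl prints (group order may differ), and os.getgid and os.getgroups are not available on Windows.

```python
import os

# Real gid first, then the supplementary groups, space separated,
# mirroring the documented shape of Perl's $( variable.
real_gid = os.getgid()
groups = os.getgroups()
gid_list = " ".join(str(g) for g in [real_gid, *groups])
print(gid_list)

# In "$(^b)", only $( is a variable; the rest is literal text.
print(gid_list + "^b)")
```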