KDB: Loop over 2 lists and filter/replace - kdb

In KDB+ I have 2 lists of equal size, and I want to replace substrings in list1 (probably using ssr) depending on the values in list2.
list1:("Boy";"Toy";"Coy";"Poy")
list2:("A","B","A","B")
If list2[i]=="B" replace "oy" by "ab"
So we should finally get
list1:("Boy";"Tab";"Coy";"Pab")

q)list1:("Boy";"Toy";"Coy";"Poy");
q)list2:"ABAB";
q)@[list1;where list2="B";ssr[;"oy";"ab"]]
"Boy"
"Tab"
"Coy"
"Pab"

The code below should achieve what you want:
{@[`list1;x;{(-2_x),"ab"}]} where list2="B"
Output:
q){@[`list1;x;{(-2_x),"ab"}]} where list2="B"
`list1
q)list1
"Boy"
"Tab"
"Coy"
"Pab"

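For readers more familiar with Python, the index-based amend used in both answers can be sketched roughly as follows. This is only an analogue of the q code, not a translation of how kdb+ evaluates it; the names idx and result are made up for the illustration.

```python
# Plain-Python analogue of the q amend: only the selected positions change.
list1 = ["Boy", "Toy", "Coy", "Poy"]
list2 = "ABAB"

# Positions where list2 is "B" (the counterpart of `where list2="B"`).
idx = [i for i, flag in enumerate(list2) if flag == "B"]

# Replace "oy" with "ab" only at those positions; other items are untouched.
result = list(list1)
for i in idx:
    result[i] = result[i].replace("oy", "ab")

print(result)  # ['Boy', 'Tab', 'Coy', 'Pab']
```

The point of the pattern is that the condition is turned into a list of indices first, and the replacement function is then applied only at those indices.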

how do we iterate and store the results in a variable in kdb

I have a string, for example "https://www.google.com", and a page count, say 5.
How do I append page= followed by an incrementing page number to the URL and store the results in a variable as a list?
"https://www.google.com/page=1"
"https://www.google.com/page=2"
"https://www.google.com/page=3"
"https://www.google.com/page=4"
"https://www.google.com/page=5"
So the end result will be a variable query_var holding a list of strings, as in the example below:
query_var:("https://www.google.com/page=1";"https://www.google.com/page=2";"https://www.google.com/page=3";"https://www.google.com/page=4";"https://www.google.com/page=5");
count query_var / 5
You can use the join operator , with the each-right adverb /: to do this:
query_var: "https://www.google.com/page=" ,/: string 1_til 6
Make it a function to support a varied count of pages:
f:{"https://www.google.com/page=" ,/: string 1_til x+1}
q)10=count f[10]
1b
Q1
You don’t even need a lambda to make this a function. It’s a sequence of unary functions, so you can compose them.
q)f: "https://www.google.com/page=",/: string ::
q)f 1+til 6
"https://www.google.com/page=1"
"https://www.google.com/page=2"
"https://www.google.com/page=3"
"https://www.google.com/page=4"
"https://www.google.com/page=5"
"https://www.google.com/page=6"
Q2
q)("https://www.google.com";;"info/history")
enlist["https://www.google.com";;"info/history"]
q)"/"sv'("https://www.google.com";;"info/history")@/: string `aapl`msft`nftx
"https://www.google.com/aapl/info/history"
"https://www.google.com/msft/info/history"
"https://www.google.com/nftx/info/history"
List notation is syntactic sugar for enlist. The list with a missing item is a projection of enlist and can be iterated.
Again, a sequence of unaries is all composable without a lambda:
q)g: "/"sv'("https://www.google.com";;"info/history")@/: string ::
q)g `aapl`msft`nftx
"https://www.google.com/aapl/info/history"
"https://www.google.com/msft/info/history"
"https://www.google.com/nftx/info/history"
If you can assume a unique character that doesn't appear elsewhere in your url (e.g. #) then ssr is a simple and easily-readable approach:
ssr["https://www.google.com/page=#";"#";]each string 1+til 5
ssr["https://www.google.com/#/info/history";"#";]each string`aapl`msft`nftx
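For comparison, the same "join a fixed prefix onto each page number" idea might look like this in Python. It is a minimal sketch; base and page_urls are hypothetical names chosen for the example and do not come from the answers above.

```python
# Sketch: build a list of page URLs from a base string and a page count.
base = "https://www.google.com/page="


def page_urls(n):
    """Return the URLs for pages 1..n as a list of strings."""
    return [f"{base}{i}" for i in range(1, n + 1)]


query_var = page_urls(5)
print(len(query_var))  # 5
print(query_var[0])    # https://www.google.com/page=1
```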

XSLT Substring not working [duplicate]

Let's say I have this string: "123_1234_12345"
I would like to extract everything before the second "_" (underscore)
I tried:
fn:tokenize("123_1234_12345", '_')[position() le 2]
That returns:
123
1234
What I actually want is:
123_1234
How do I achieve that?
I am using XQuery 1.0
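Regular expressions are flexible and compact. This one strips the final underscore-delimited segment, which gives the desired result when the input contains exactly two underscores: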
replace('123_1234_12345', '_[^_]+$', '')
Another solution that may be more readable is to a) tokenize the string, b) keep the tokens you want to preserve and c) join them again:
string-join(
tokenize('123_1234_12345', '_')[position() = 1 to 2],
'_'
)
Taking the basic idea from Michael Kay's deleted answer, it could be implemented like this:
substring($input, 1, index-of(string-to-codepoints($input), 95)[2] - 1)
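Outside XQuery, the last two approaches (tokenize and join, or cut at the position of the second underscore) can be sketched in Python as below. This is only an illustration of the logic; the variable names are made up for the example.

```python
# Sketch: keep everything before the second underscore.
value = "123_1234_12345"

# Approach 1: split into tokens, keep the first two, join them again.
joined = "_".join(value.split("_")[:2])

# Approach 2: locate the second underscore and cut the string there.
first = value.index("_")
second = value.index("_", first + 1)
prefix = value[:second]

print(joined)  # 123_1234
print(prefix)  # 123_1234
```

Note that the second approach raises an error if the input has fewer than two underscores, just as the XQuery index-of(...)[2] would yield an empty sequence in that case.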

PySpark list() in withColumn() only works once, then AssertionError: col should be Column

I have a DataFrame with 6 string columns named like 'Spclty1'...'Spclty6' and another 6 named like 'StartDt1'...'StartDt6'. I want to zip them and collapse them into a single column that looks like this:
[[Spclty1, StartDt1]...[Spclty6, StartDt6]]
I first tried collapsing just the 'Spclty' columns into a list like this:
DF = DF.withColumn('Spclty', list(DF.select('Spclty1', 'Spclty2', 'Spclty3', 'Spclty4', 'Spclty5', 'Spclty6')))
This worked the first time I executed it, giving me a new column called 'Spclty' containing rows such as ['014', '124', '547', '000', '000', '000'], as expected.
Then, I added a line to my script to do the same thing on a different set of 6 string columns, named 'StartDt1'...'StartDt6':
DF = DF.withColumn('StartDt', list(DF.select('StartDt1', 'StartDt2', 'StartDt3', 'StartDt4', 'StartDt5', 'StartDt6')))
This caused AssertionError: col should be Column.
After I ran out of things to try, I tried the original operation again (as a sanity check):
DF.withColumn('Spclty', list(DF.select('Spclty1', 'Spclty2', 'Spclty3', 'Spclty4', 'Spclty5', 'Spclty6'))).collect()
and got the assertion error as above.
So, it would be good to understand why it worked only the first time, but the main question is: what is the correct way to zip columns into a collection of dict-like elements in Spark?
.withColumn() expects a column object as second parameter and you are supplying a list.
Thanks. After reading a number of SO posts I figured out the syntax for passing a set of columns to the col parameter, using struct to create an output column that holds a list of values:
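from pyspark.sql.functions import array, col, struct

# Build one struct per (Spclty, StartDt) pair and collect the structs into an array column.
DF_tmp = DF_tmp.withColumn('specialties', array([
    struct(
        *(col("Spclty{}".format(i)).alias("spclty_code"),
          col("StartDt{}".format(i)).alias("start_date"))
    )
    for i in range(1, 7)
]))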
So, the col() and *col() constructs are what I was looking for, while the array([struct(...)]) approach lets me combine the 'Spclty' and 'StartDt' entries into a list of dict-like elements.
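To make the shape of the resulting column concrete, here is a plain-Python sketch of what happens to a single row. It is not Spark code, the sample values are invented, and only two column pairs are shown instead of the six in the question.

```python
# Plain-Python illustration of the per-row result; this is not Spark code.
# The sample values below are made up for the example.
row = {
    "Spclty1": "014", "StartDt1": "2001-01-01",
    "Spclty2": "124", "StartDt2": "2005-06-30",
}
n_pairs = 2  # the question uses 6 column pairs

# One dict-like element per (Spclty, StartDt) pair, mirroring array([struct(...)]).
specialties = [
    {"spclty_code": row[f"Spclty{i}"], "start_date": row[f"StartDt{i}"]}
    for i in range(1, n_pairs + 1)
]

print(specialties)
```

In the Spark version the list comprehension produces Column expressions rather than values, so the pairing is evaluated for every row when the DataFrame is computed.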

What does this mean in Perl 1..$#something?

I have a loop for example :
for my $something ( @place[1..$#thing] ) {
}
I don't get this statement 1..$#thing
I know that # is for comments but my IDE doesn't color #thing as comment. Or is it really just a comment for someone to know that what is in "$" is "thing" ? And if it's a comment why was the rest of the line not commented out like ] ) { ?
If it has another meaning, I would like to know. Sorry if my question sounds odd; I am new to Perl and perplexed by this expression.
The $# syntax gives the highest index of the array in question, so $#thing is the highest index of the array @thing. This is documented in perldoc perldata.
.. is the range operator, and 1 .. $#thing means a list of numbers from 1 to the highest index of @thing.
Using this list inside array brackets with the @ sigil denotes an array slice, which is to say, a selection of elements from the @place array.
So assuming the following:
my @thing = qw(foo bar baz);
my @place = qw(home work restaurant gym);
then @place[1 .. $#thing] (that is, @place[1 .. 2]) would expand into the list work, restaurant.
It is correct that # is used for comments, but not in this case.
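As a rough cross-language analogue, the same selection can be written in Python. The sketch below assumes the two arrays from the example above; last_index and selected are illustrative names only.

```python
# Python analogue of the Perl slice @place[1 .. $#thing].
thing = ["foo", "bar", "baz"]
place = ["home", "work", "restaurant", "gym"]

last_index = len(thing) - 1          # counterpart of $#thing, here 2
selected = place[1:last_index + 1]   # Python slices exclude the end index

print(selected)  # ['work', 'restaurant']
```

The Perl range 1 .. $#thing includes both endpoints, which is why the Python slice needs last_index + 1 as its upper bound.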
It's how you define a range: from a starting value to some other value.
for my $something ( @place[1..3] ) {
# Takes the elements at indices 1 through 3 (the second through fourth)
}
Binary ".." is the range operator, which is really two different
operators depending on the context. In list context, it returns a list
of values counting (up by ones) from the left value to the right
value. If the left value is greater than the right value then it
returns the empty list. The range operator is useful for writing
foreach (1..10) loops and for doing slice operations on arrays. In the
current implementation, no temporary array is created when the range
operator is used as the expression in foreach loops, but older
versions of Perl might burn a lot of memory when you write something
like this:
http://perldoc.perl.org/perlop.html#Range-Operators

Perl : map statement

I would like your help, because I am not able to understand what the following line means:
map {@$_[1 .. 4]} @msft
found in the example code of GD::Graph::ohlc.
Could you please provide me with a hint?
Thank you.
@msft is an array of arrays where each inner array contains 5 items (date, open/low/high/close prices).
The map takes each element of @msft, which is an array reference stored in $_, dereferences it via @$_, and takes a slice of that array via [1..4] (namely the second through fifth items, since the array is 0-based). It then returns those four items. map concatenates them into a single list.
In essence, it is flattening the array of arrays of five elements into a single array made up of the 2nd through 5th items of each subarray.
The elements of @msft are array references. The code collects elements 1 through 4 from each array into a single list:
my @msft = (
[0,1,2,3,4,5],
[0,11,22,33,44,55],
[0,111,222,333,444,555],
);
my @result = map {@$_[1 .. 4]} @msft;
print "@result\n"; # 1 2 3 4 11 22 33 44 111 222 333 444
From the documentation for map:
Evaluates the BLOCK or EXPR for each
element of LIST (locally setting $_ to
each element) and returns the list
value composed of the results of each
such evaluation.
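The "slice each inner array, then flatten" behaviour described above can also be sketched in Python with a nested comprehension. This reuses the sample data from the Perl example and is meant only as an analogue of what map does here.

```python
# Python analogue of: map { @$_[1 .. 4] } @msft
msft = [
    [0, 1, 2, 3, 4, 5],
    [0, 11, 22, 33, 44, 55],
    [0, 111, 222, 333, 444, 555],
]

# Take items at indices 1..4 of every inner list and flatten into one list.
result = [item for row in msft for item in row[1:5]]

print(result)  # [1, 2, 3, 4, 11, 22, 33, 44, 111, 222, 333, 444]
```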