Python Polars - Finding min value greater than col(a) in col(b) where col(a) is a numeric and col(b) is a column of lists of numerics - python-polars

How do I convert the following slow operation in pandas to a fast operation in polars?
df.to_pandas().apply(lambda x: pd.cut([x['ingame_timestamp']], list(x['time_bins']), list(x['time_bins'])[1:]), axis=1)
Assume ingame_timestamp is a float and time_bins is a list.
I basically want to be able to do something like:
df.with_columns(pl.cut(value=pl.col('val'), bins=pl.col('time_bins'), labels=pl.col('time_bins')[1:]).alias('val_time_bin'))
The pandas version works when I go through to_pandas(), but that obviously gives up most of the speed benefit of using polars and avoiding apply.
The following gives you an example data frame along with a column which is the desired output:
example_df = pl.DataFrame({'values': [0,1,2], 'time_bins': [[-1, -0.5, 0.5, 1], [0, 0.5, 1.5, 2.5], [1.5, 2.5, 3, 4.5]], 'value_time_bin': [0.5, 1.5, 2.5]})
It is sufficient to find the minimum value greater than "value" in the list "time_bins".

# reproducible dataset
import polars as pl

df = pl.DataFrame({
    'values': [0, 1, 2],
    'time_bins': [[-1., -0.5, 0.5, 1.], [0., 0.5, 1.5, 2.5], [1.5, 2.5, 3., 4.5]]
})
If I understood you correctly, you need a new column holding the smallest value in time_bins that is greater than the value in values.
One way to do it:
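# Written against an older Polars API: newer releases rename groupby to group_by,
# and .list() is no longer callable (a bare pl.col("time_bins") already
# aggregates each group into a list). Grouping by "values" assumes its entries are unique.
df.explode("time_bins").groupby("values").agg([
    pl.col("time_bins").list(),
    pl.col("time_bins").filter(
        pl.col("time_bins") > pl.col("values")
    ).min().alias("value_time_bin")
])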
┌────────┬───────────────────────┬────────────────┐
│ values ┆ time_bins             ┆ value_time_bin │
│ ---    ┆ ---                   ┆ ---            │
│ i64    ┆ list[f64]             ┆ f64            │
╞════════╪═══════════════════════╪════════════════╡
│ 0      ┆ [-1.0, -0.5, ... 1.0] ┆ 0.5            │
│ 1      ┆ [0.0, 0.5, ... 2.5]   ┆ 1.5            │
│ 2      ┆ [1.5, 2.5, ... 4.5]   ┆ 2.5            │
└────────┴───────────────────────┴────────────────┘
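As a cross-check on the expected column, the same rule (smallest bin edge strictly greater than the value) can be written in plain Python. This is only a reference sketch, not a replacement for the Polars expression; it assumes each time_bins list is sorted in ascending order, and the helper name next_bin_edge is made up for illustration.

```python
from bisect import bisect_right

def next_bin_edge(value, bins):
    """Return the smallest element of sorted `bins` that is strictly greater than `value`."""
    i = bisect_right(bins, value)  # index of the first element > value
    return bins[i] if i < len(bins) else None  # None when no edge is larger

values = [0, 1, 2]
time_bins = [[-1.0, -0.5, 0.5, 1.0], [0.0, 0.5, 1.5, 2.5], [1.5, 2.5, 3.0, 4.5]]

value_time_bin = [next_bin_edge(v, b) for v, b in zip(values, time_bins)]
print(value_time_bin)  # [0.5, 1.5, 2.5]
```

Binary search is used here only because the bins look sorted in the example; for unsorted lists, min(b for b in bins if b > value) expresses the same rule.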

How would I go about counting the amount of each alphanumerical in an array? (APL)

I can't figure out how to take a character matrix and count the occurrences of each alphanumeric character in each row. The matrices I take in will only contain the characters I'm counting.
For example, if I got:
ABA455
7L9O36G
DZLFPEI
I would get something like A:2 B:1 4:1 5:2 for the first row and each row would be counted independently.
I would most like to understand the operators used if you could please explain them too.
Thank you.
The following should work in any mainstream APL implementation.
Let's start with a simple vector of characters:
      m ← 3 7⍴'ABA455 7L9O36GDZLFPEI'
      v ← m[1;]
      v
ABA455
We can find the unique characters by filtering to keep only elements that have the same index as the first occurrence of themselves:
      v ⍳ v
1 2 1 4 5 5 7
      ⍳ ⍴ v
1 2 3 4 5 6 7
      ( v ⍳ v ) = ⍳ ⍴ v
1 1 0 1 1 0 1
      ⎕ ← unique ← ( (v ⍳ v) = ⍳ ⍴ v ) / v
AB45
Now we compare the unique elements to every element:
      unique ∘.= v
1 0 1 0 0 0 0
0 1 0 0 0 0 0
0 0 0 1 0 0 0
0 0 0 0 1 1 0
0 0 0 0 0 0 1
Summing this table horizontally gives us the occurrence count for each unique element:
      +/ unique ∘.= v
2 1 1 2 1
Now we just need to pair up the unique elements with their respective counts:
      unique ,[1.5] +/ unique ∘.= v
A 2
B 1
4 1
5 2
  1
Let's put that into a utility function:
      ∇ c ← Counts v; u
        u ← ( (v ⍳ v) = ⍳ ⍴ v ) / v
        c ← u ,[1.5] +/ u ∘.= v
      ∇
      Counts v
A 2
B 1
4 1
5 2
  1
Now we need to apply this function on each row of the matrix. We start by splitting the matrix into a vector of vectors:
      ⊂[2] m
┌───────┬───────┬───────┐
│ABA455 │7L9O36G│DZLFPEI│
└───────┴───────┴───────┘
Then we apply the utility function to each vector:
      Counts¨ ⊂[2] m
┌───┬───┬───┐
│A 2│7 1│D 1│
│B 1│L 1│Z 1│
│4 1│9 1│L 1│
│5 2│O 1│F 1│
│  1│3 1│P 1│
│   │6 1│E 1│
│   │G 1│I 1│
└───┴───┴───┘
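For readers who do not yet read APL fluently, the same three steps (first-occurrence filter, comparison table, row sums) can be mirrored in Python. This is a rough sketch rather than a translation of the APL semantics; the function name counts and the list-of-strings representation of the matrix are assumptions made for illustration.

```python
def counts(v):
    """Pair each distinct character of v with its number of occurrences."""
    # Keep an element only where its position equals its first occurrence,
    # the analogue of ((v ⍳ v) = ⍳ ⍴ v) / v.
    unique = [c for i, c in enumerate(v) if v.index(c) == i]
    # Compare every unique element with every element and sum each row,
    # the analogue of +/ unique ∘.= v.
    return [(u, sum(u == c for c in v)) for u in unique]

# Rows of the 3-by-7 matrix; the first row carries a trailing space.
m = ["ABA455 ", "7L9O36G", "DZLFPEI"]
for row in m:
    print(counts(row))
```

As in the APL output, the trailing space of the first row is counted as a character of its own.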
If you're using Dyalog APL, then the Key operator (⌸) is very much what you need:
      {⍺ ⍵}⌸ 'ABA455'
┌─┬───┐
│A│1 3│
├─┼───┤
│B│2  │
├─┼───┤
│4│4  │
├─┼───┤
│5│5 6│
└─┴───┘
It takes a single operand and calls it once per unique value, with the specific value as left argument and the list of occurrence indices as right argument. However, we're not interested in the actual occurrences, only in their counts:
      {⍺ (≢⍵)}⌸ 'ABA455'
A 2
B 1
4 1
5 2
Now we simply have to apply this function on each row. We can do this by splitting the matrix and applying the function with Each:
      {⍺ (≢⍵)}⌸¨ ↓ m
┌───┬───┬───┐
│A 2│7 1│D 1│
│B 1│L 1│Z 1│
│4 1│9 1│L 1│
│5 2│O 1│F 1│
│  1│3 1│P 1│
│   │6 1│E 1│
│   │G 1│I 1│
└───┴───┴───┘
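The grouping that Key performs can be imitated in Python with a dictionary that collects the positions of each distinct value. This is an illustrative sketch only; the function name key_groups is hypothetical, and 1-based positions are used to match the APL output above.

```python
def key_groups(v):
    """Map each distinct element of v to the list of its 1-based positions."""
    groups = {}
    for i, c in enumerate(v, start=1):
        groups.setdefault(c, []).append(i)  # dicts keep first-occurrence order
    return groups

groups = key_groups("ABA455")
print(groups)  # {'A': [1, 3], 'B': [2], '4': [4], '5': [5, 6]}

# Replacing each index list with its length gives the counts, like {⍺ (≢⍵)}⌸.
tally = {c: len(idx) for c, idx in groups.items()}
print(tally)   # {'A': 2, 'B': 1, '4': 1, '5': 2}
```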

Outerjoin is not merging as expected, are my specifications wrong?

I am trying to merge three tables using outerjoin() but I am not getting the result I want/expect. Below is the code I am using, the result I am getting, and the result I want. Using Matlab R2018a.
Code
%%% set up dummy data tables
Key1 = [1 1 1 2 2 3 3 3 3 3];
Key2 = [1 2 3 1 2 1 2 3 4 5];
Val1 = [0 NaN NaN 0 NaN 0.09 NaN NaN NaN NaN];
Val2 = [NaN 0.55 0.55 0.04 0.04 0.58 0.634 0.668 0.6950 0.7560];
mytable = array2table([Key1', Key2', Val1', Val2']);
mytable.Properties.VariableNames = {'Key1', 'Key2', 'Val1', 'Val2'};
temp1 = array2table([1 4 0; 2 3 0; 3 6 0.09]);
temp1.Properties.VariableNames = {'Key1', 'Key2', 'Val1'};
temp2 = array2table([1 4 0.55; 2 3 0.04; 3 6 0.07560]);
temp2.Properties.VariableNames = {'Key1', 'Key2', 'Val2'};
%%% try to join mytable, temp1, and temp2 together
Tout = outerjoin(mytable, temp1, 'MergeKeys', true);
Tout = outerjoin(Tout, temp2, 'MergeKeys', true);
Result from code
I want the highlighted rows to be combined so that no Key1-Key2 pair is duplicated in the output table. I tried various combinations of 'MergeKeys', true, 'LeftVariables', {'Key1', 'Key2', 'Val1', 'Val2'}, 'RightVariables', {'Key1', 'Key2', 'Val2'}, etc., but I couldn't get it to work.
Desired result
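Solved by reversing the order, joining temp1 and temp2 first. As far as I can tell, outerjoin without an explicit 'Keys' argument matches on every variable name the two tables share, so in my original order Val1 and then Val2 were treated as keys and the temp1 and temp2 rows never matched each other: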
Tout = outerjoin(temp1, temp2, 'MergeKeys',true);
Tout = outerjoin(mytable, Tout, 'MergeKeys',true);
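To see why the order matters, here is a small Python sketch of a full outer join that, like the behaviour described above, treats every shared column name as a key. It is a toy model under that assumption, not MATLAB's implementation: rows are plain dicts, missing values are None instead of NaN, and the function name outer_join and the one-row tables are made up for illustration.

```python
def outer_join(left, right):
    """Full outer join of two lists of dict rows on every column name they share."""
    left_cols, right_cols = list(left[0]), list(right[0])
    keys = [c for c in left_cols if c in right_cols]  # shared names act as keys
    cols = left_cols + [c for c in right_cols if c not in keys]
    key_of = lambda row: tuple(row[k] for k in keys)
    out, matched = [], set()
    for lrow in left:
        hits = [i for i, r in enumerate(right) if key_of(r) == key_of(lrow)]
        matched.update(hits)
        for i in hits or [None]:  # an unmatched left row is kept once
            r = right[i] if i is not None else {}
            out.append({c: lrow[c] if c in lrow else r.get(c) for c in cols})
    # Right rows that matched nothing are appended, padded with None.
    out += [{c: r.get(c) for c in cols} for i, r in enumerate(right) if i not in matched]
    return out

mytable = [{"Key1": 1, "Key2": 1, "Val1": 0, "Val2": None}]
temp1 = [{"Key1": 1, "Key2": 4, "Val1": 0}]
temp2 = [{"Key1": 1, "Key2": 4, "Val2": 0.55}]

original = outer_join(outer_join(mytable, temp1), temp2)
reversed_order = outer_join(mytable, outer_join(temp1, temp2))

pairs = lambda rows: [(r["Key1"], r["Key2"]) for r in rows]
print(pairs(original))        # [(1, 1), (1, 4), (1, 4)]
print(pairs(reversed_order))  # [(1, 1), (1, 4)]
```

In the original order the second join also matches on Val2, so the temp2 row cannot pair with the row that came from temp1 and the (1, 4) key appears twice. Joining temp1 and temp2 first matches on Key1 and Key2 only, which merges them into one row.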

Calling text file

If I have a text file with three columns, say
1 2 1
3 1 1
2 3 1
and I also have a matrix s =
[0.3 0.4 0.6
0.1 0.5 0.7
0.2 0.11 0.9]
For each line of the text file, I want to treat the first column as i and the second column as j. If the third column equals 1, the corresponding value s(i,j) should go into a new array A; all remaining values of s should go into another array B.
That is, I want this result:
A=[0.4, 0.2, 0.7] B=[0.3, 0.6, 0.1, 0.5, 0.11, 0.9]
coordinates = [1 2 1
3 1 1
2 3 1];
s = [0.3 0.4 0.6
0.1 0.5 0.7
0.2 0.11 0.9];
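% Convert each (row, column) pair to a linear index into s (the flag column is all 1 here, so it is not checked).
linindices = sub2ind(size(s), coordinates(:, 1), coordinates(:, 2))';
A = s(linindices)
% Everything else goes to B. setdiff returns sorted indices, so B comes out
% in column-major order: [0.3 0.1 0.5 0.11 0.6 0.9].
B = s(setdiff(1:numel(s), linindices))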
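For comparison, here is a plain-Python sketch of the same split that also checks the third column and keeps B in row-major order, which is the order shown in the question. The file contents are held in a string for the sake of a self-contained example; in practice they would be read from the text file.

```python
text = """1 2 1
3 1 1
2 3 1"""

s = [[0.3, 0.4, 0.6],
     [0.1, 0.5, 0.7],
     [0.2, 0.11, 0.9]]

# Parse each line into (i, j, flag); i and j are 1-based as in the question.
coordinates = [tuple(int(x) for x in line.split()) for line in text.splitlines()]

# Keep only the pairs flagged with 1, converted to 0-based indices.
selected = [(i - 1, j - 1) for i, j, flag in coordinates if flag == 1]
A = [s[i][j] for i, j in selected]

# Every other cell of s, walked row by row.
chosen = set(selected)
B = [s[i][j] for i in range(len(s)) for j in range(len(s[0])) if (i, j) not in chosen]

print(A)  # [0.4, 0.2, 0.7]
print(B)  # [0.3, 0.6, 0.1, 0.5, 0.11, 0.9]
```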

How to remove 'u'(unicode) from all the values in a column in a DataFrame?

How does one remove 'u'(unicode) from all the values in a column in a DataFrame?
table.place.unique()
array([u'Newyork', u'Chicago', u'San Francisco'], dtype=object)
>>> df = pd.DataFrame([u'c%s'%i for i in range(11,21)], columns=["c"])
>>> df
     c
0  c11
1  c12
2  c13
3  c14
4  c15
5  c16
6  c17
7  c18
8  c19
9  c20
>>> df['c'].values
array([u'c11', u'c12', u'c13', u'c14', u'c15', u'c16', u'c17', u'c18',
u'c19', u'c20'], dtype=object)
>>> df['c'].astype(str).values
array(['c11', 'c12', 'c13', 'c14', 'c15', 'c16', 'c17', 'c18', 'c19', 'c20'], dtype=object)
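Note that the u prefix is part of how Python 2 displays unicode strings, not part of the stored values, so comparisons and printing are unaffected by it. A minimal check, which runs on Python 3 as well (where the prefix is accepted but never shown):

```python
places = [u'Newyork', u'Chicago', u'San Francisco']

# The prefix only marks the literal as unicode; it adds no character to the data.
print(places[0])               # Newyork
print(places[0] == 'Newyork')  # True
print(len(places[0]))          # 7
```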

How to exclude files under conf folder for distribution?

I have an application.dev.conf and an application.test.conf under the conf folder of my Play 2.3 application, but I do not want them to be packaged as part of my distribution. What is the right excludeFilter for this?
Actually lpiepiora's answer will do the trick, however note that filtering on mappings in Universal will only exclude application.dev.conf from the conf folder and NOT from the jar itself.
I don't know about the Play framework, but in general, if you have something like this:
hello
├── src
│   └── main
│       ├── scala
│       │   └── com.world.hello
│       │       └── Main.scala
│       ├── resources
│       │   ├── application.dev.conf
│       │   └── application.conf
Doing:
mappings in Universal ++= {
  ((resourceDirectory in Compile).value * "*").get.filterNot(f => f.getName.endsWith(".dev.conf")).map { f =>
    f -> s"conf/${f.name}"
  }
}
would produce the following package structure:
hello/
├── lib
│   └── com.world.hello-1234-SNAPSHOT.jar
├── conf
│   └── application.conf
However if you look into the jar, you will see that your dev.conf file is still in there:
$ unzip -v com.world.hello-1234-SNAPSHOT.jar
Archive:  com.world.hello-1234-SNAPSHOT.jar
 Length   Method    Size  Cmpr    Date    Time   CRC-32   Name
--------  ------  ------- ---- ---------- ----- --------  ----
     371  Defl:N      166  55% 10-01-2018 15:20 36c30a78  META-INF/MANIFEST.MF
       0  Stored        0   0% 10-01-2018 15:20 00000000  com/
       0  Stored        0   0% 10-01-2018 15:20 00000000  com/world/
       0  Stored        0   0% 10-01-2018 15:20 00000000  com/world/hello/
       0  Stored        0   0% 10-01-2018 15:20 00000000  com/world/hello/
   13646  Defl:N     4361  68% 10-01-2018 12:06 7e2dce2f  com/world/hello/Main$.class
     930  Defl:N      445  52% 10-01-2018 13:57 5b180d92  application.conf
     930  Defl:N      445  52% 10-01-2018 13:57 5b180d92  application.dev.conf
This is not really harmful, but if you want to remove the file from the jar too, see the answer to "How to exclude resources during packaging with SBT but not during testing":
mappings in (Compile, packageBin) ~= { _.filter(!_._1.getName.endsWith(".dev.conf")) }
You could use mappings to exclude both files.
mappings in Universal := {
  val origMappings = (mappings in Universal).value
  origMappings.filterNot { case (_, file) => file.endsWith("application.dev.conf") || file.endsWith("application.test.conf") }
}
Does an excludeFilter as follows work for you?
excludeFilter in Universal in unmanagedResources := "application.dev.conf" || "application.test.conf"
(The unmanagedResourceDirectories key refers to conf/ by default.)