I want to know how to dispatch a command in Atom.
I tried with this code:
activate: (state) ->
  @subscriptions = new CompositeDisposable
  @subscriptions.add atom.commands.add "atom-workspace",
    "activate-killer-instinct-mode:test-command": =>
      console.log "Hello, I'm your commands"
  @error = false
  target = "atom-workspace"
  commandName = "activate-killer-instinct-mode:test-command"
  atom.commands.dispatch target, commandName # Here is the error
  console.log "The command " + commandName + " got dispatched" if not @error
  console.error "The command " + commandName + " didn't get dispatched" if @error
I got this output:
The command activate-killer-instinct-mode:test-command didn't get dispatched
I read the documentation.
::dispatch(target, commandName)
╭─────────────╥─────────────────────────────────────────╮
│ Argument    ║ Description                             │
╞═════════════╬═════════════════════════════════════════╡
│ target      ║ The DOM node at which to start bubbling │
│             ║ the command event.                      │
╞═════════════╬═════════════════════════════════════════╡
│ commandName ║ String indicating the name of the       │
│             ║ command to dispatch.                    │
╰─────────────╨─────────────────────────────────────────╯
I think the error is in the target, but I don't know how to pass it correctly. Thanks for the help.
Polars: 0.16.2
Python: 3.11.1
Windows 10
I'm attempting to filter a column by a time range using .is_between().
I couldn't find anything on StackOverflow, but I found something possibly similar in the GitHub issues (although it has already been solved): https://github.com/pola-rs/polars/issues/5236
To reproduce
import polars as pl
from datetime import datetime, time
df = pl.date_range(low=datetime(2023, 2, 7), high=datetime(2023, 2, 8), interval="30m", name="date").to_frame()
# Attempt to filter by time
df.filter(
    pl.col('date').is_between(time(9, 30), time(14, 30))
)
Traceback:
PanicException Traceback (most recent call last)
Cell In[11], line 1
----> 1 df.filter(
2 pl.col('date').is_between(time(9, 30, 0, 0), time(14, 30, 0, 0))
3 )
File d:\My_Path\venv\Lib\site-packages\polars\internals\dataframe\frame.py:2747, in DataFrame.filter(self, predicate)
2741 if _check_for_numpy(predicate) and isinstance(predicate, np.ndarray):
2742 predicate = pli.Series(predicate)
2744 return (
2745 self.lazy()
2746 .filter(predicate) # type: ignore[arg-type]
-> 2747 .collect(no_optimization=True)
2748 )
File d:\My_Path\venv\Lib\site-packages\polars\internals\lazyframe\frame.py:1146, in LazyFrame.collect(self, type_coercion, predicate_pushdown, projection_pushdown, simplify_expression, no_optimization, slice_pushdown, common_subplan_elimination, streaming)
1135 common_subplan_elimination = False
1137 ldf = self._ldf.optimization_toggle(
1138 type_coercion,
1139 predicate_pushdown,
(...)
1144 streaming,
1145 )
-> 1146 return pli.wrap_df(ldf.collect())
PanicException: cannot coerce datatypes: ComputeError(Owned("Failed to determine supertype of Datetime(Microseconds, None) and Time"))
I'm not sure if I'm doing something wrong, or if this is a bug.
I tried to filter a series using a time range and expected a filtered series containing just those times. Instead, I got a PanicException (listed above).
You are trying to filter a Datetime column with a Time. You need to cast the column to pl.Time before calling is_between:
df.filter(
    pl.col('date').cast(pl.Time).is_between(time(9, 30), time(14, 30))
)
┌─────────────────────┐
│ date │
│ --- │
│ datetime[μs] │
╞═════════════════════╡
│ 2023-02-07 10:00:00 │
│ 2023-02-07 10:30:00 │
│ 2023-02-07 11:00:00 │
│ 2023-02-07 11:30:00 │
│ 2023-02-07 12:00:00 │
│ 2023-02-07 12:30:00 │
│ 2023-02-07 13:00:00 │
│ 2023-02-07 13:30:00 │
│ 2023-02-07 14:00:00 │
└─────────────────────┘
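The reason the cast is needed can be illustrated without Polars at all. The following is a minimal plain-Python sketch (standard library only, not the Polars API): a full datetime cannot be compared with a time of day, so the time component has to be extracted first. The names stamps and kept are made up for this illustration, and the strict bounds are an assumption chosen to mirror the output shown above, where 09:30 and 14:30 themselves do not appear.

```python
from datetime import datetime, time, timedelta

# Half-hourly timestamps from 2023-02-07 00:00 to 2023-02-08 00:00 inclusive.
start = datetime(2023, 2, 7)
stamps = [start + timedelta(minutes=30 * i) for i in range(49)]

low, high = time(9, 30), time(14, 30)

# Comparing a datetime with a time directly is a type error in plain Python too.
try:
    stamps[0] < low
    comparable = True
except TypeError:
    comparable = False

# Extract the time-of-day part first, then compare like with like.
# Strict bounds are an assumption made to match the output shown above.
kept = [ts for ts in stamps if low < ts.time() < high]

print(comparable)
print(kept[0], kept[-1], len(kept))
```

If the boundary rows should be kept as well, the comparison would use <= on both sides; how is_between treats its bounds may depend on the Polars version, so it is worth checking the documentation for the version in use.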
Consider the following dataframe:
df = pl.DataFrame({
    "letters": ["A", "B", "C", "D", "E", "F", "G", "H"],
    "values": ["aa", "bb", "cc", "dd", "ee", "ff", "gg", "hh"]
})
print(df)
shape: (8, 2)
┌─────────┬────────┐
│ letters ┆ values │
│ --- ┆ --- │
│ str ┆ str │
╞═════════╪════════╡
│ A ┆ aa │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ B ┆ bb │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ C ┆ cc │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ D ┆ dd │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ E ┆ ee │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ F ┆ ff │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ G ┆ gg │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┤
│ H ┆ hh │
└─────────┴────────┘
How do I take a window of size +/- N around any row that satisfies a given condition? For example, suppose the condition is pl.col("letters").str.contains("D|F") and N = 2. Then the output should be:
┌─────────┬────────────────────────────────┐
│ letters ┆ output │
│ --- ┆ --- │
│ str ┆ list[str] │
╞═════════╪════════════════════════════════╡
│ D ┆ ["bb", "cc", "dd", "ee", "ff"] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ F ┆ ["dd", "ee", "ff", "gg", "hh"] │
└─────────┴────────────────────────────────┘
Note that the windows overlap in this case (the F window also contains dd and the D window also contains ff). Also, note that N = 2 here for the sake of simplicity but, in reality, it will be larger (~10 - 20). The dataset is relatively large, so I'd like to do this as efficiently as possible without exploding memory usage.
EDIT: To make the ask more explicit, here's the query in DuckDB's SQL syntax that gives the right answer (and I'd like to know how to translate it to Polars):
df_table = df.to_arrow()
con = duckdb.connect()
query = """
    SELECT
        letters,
        list(values) OVER (
            ROWS BETWEEN 2 PRECEDING
                     AND 2 FOLLOWING
        ) as combined
    FROM df_table
    QUALIFY letters in ('D', 'F')
"""
print(pl.from_arrow(con.execute(query).arrow()))
shape: (2, 2)
┌─────────┬────────────────────────┐
│ letters ┆ combined │
│ --- ┆ --- │
│ str ┆ list[str] │
╞═════════╪════════════════════════╡
│ D ┆ ["bb", "cc", ... "ff"] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ F ┆ ["dd", "ee", ... "hh"] │
└─────────┴────────────────────────┘
Benchmarks of suggested solutions
I ran the suggested solutions in a Jupyter notebook on one of Amazon's ml.c5.xlarge machines. While the notebook was running, I also kept htop open in a terminal to observe CPU and memory use. The dataset had 12M+ rows.
I ran both solutions via both the eager and lazy APIs. For good measure, I also tried using a simple Python for loop to extract the slices after identifying the rows of interest and also DuckDB.
Summary Table
Polars had really robust performance and judicious memory use (with @jqurious' method) because of the clever, no-copy implementation of .shift(). Surprisingly, a well-thought-out Python for loop did just as well. DuckDB performed rather poorly in both speed and memory use.
Neither Polars nor DuckDB used more than one core for the operation. I'm not sure if that's due to a lack of optimization or if this problem just isn't amenable to parallelization. I suppose we're only filtering over one column and then taking slices of that same column, so there's not much that multiple threads can do.
method           │ cpu use     │ memory use     │ time
─────────────────┼─────────────┼────────────────┼────────
ΩΠΟΚΕΚΡΥΜΜΕΝΟΣ   │ single core │ explosion      │
jqurious         │ single core │ 2.53G to 2.53G │ 4.63 s
(smart) for loop │ single core │ 2.53G to 2.58G │ 4.91 s
DuckDB           │ single core │ 1.62G to 6.13G │ 38.6 s
cpu use shows whether multiple cores were taxed during the operation.
memory use shows how much memory was in use before the operation and the maximum memory use during the operation.
@ΩΠΟΚΕΚΡΥΜΜΕΝΟΣ's solution:
preceding = 2
following = 2
look_around = [pl.col("body").shift(-i)
               for i in range(-preceding, following + 1)]
(
    df
    .with_column(
        pl.when(pl.col('body').str.contains(regex))
        .then(pl.concat_list(look_around))
        .alias('combined')
    )
    .filter(pl.col('combined').is_not_null())
)
Unfortunately, on my rather large dataset, this solution caused the memory use to explode and the kernel to crash with both the eager and lazy APIs.
@jqurious' solution:
preceding = 2
following = 2
look_around = [
    pl.col("body").shift(-i).alias(f"lag_{i}") for i in range(-preceding, following + 1)
]
(
    df
    .with_columns(
        look_around
    )
    .filter(pl.col("body").str.contains(regex))
    .select([
        pl.col("body"),
        pl.concat_list([f"lag_{i}" for i in range(-2, 3)]).alias("output")
    ])
)
eager:
cpu use: single-core
memory use: 2.53G -> 2.53G
time: 4.63 s ± 6.6 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
lazy:
cpu use: single-core
memory use: 2.53G -> 2.53G
time: 4.63 s ± 3.85 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
(Smart) Python for loop
preceding = 2
following = 2
output = []
# Row numbers of the rows whose body matches the regex
indices = df.with_row_count().select(
    pl.col("row_nr").filter(pl.col("body").str.contains(regex))
)["row_nr"]
for x in indices:
    offset = max(0, x - preceding)
    length = preceding + following + 1
    output.append(df["body"].slice(offset, length))
cpu use: single-core
memory use: 2.53G -> 2.58G
time: 4.91 s ± 24.5 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
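One detail that may be worth checking in the loop above: the start of the slice is clamped at 0, but the length is always preceding + following + 1, so a match in the first few rows would get a window that reaches further forward than following rows. Below is a minimal plain-Python sketch of clamping both ends. The helper window_bounds is hypothetical (it is not part of Polars); it only computes the (offset, length) pair that could then be passed to a slice call.

```python
def window_bounds(idx, n_rows, preceding, following):
    """Return (offset, length) for a window around idx, clamped to [0, n_rows)."""
    start = max(0, idx - preceding)
    stop = min(n_rows, idx + following + 1)
    return start, stop - start


# Eight rows, as in the small example frame from the question.
print(window_bounds(3, 8, 2, 2))  # a match in the middle: full window
print(window_bounds(0, 8, 2, 2))  # a match on the first row: shorter window
print(window_bounds(7, 8, 2, 2))  # a match on the last row: shorter window
```

Whether truncated windows or fixed-length windows are preferable near the edges depends on the use case; the benchmark numbers above were measured with the original loop, not with this variant.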
DuckDB
Note that I first converted the df to an Arrow Table before running the query so DuckDB could act on it directly. Also, I'm not sure whether converting the result back to Arrow takes up a large share of the computation, which would make the comparison unfair to DuckDB.
preceding = 2
following = 2
query = f"""
    SELECT
        body,
        list(body) OVER (
            ROWS BETWEEN {preceding} PRECEDING
                     AND {following} FOLLOWING
        ) as combined
    FROM df_table
    QUALIFY regexp_matches(body, '{regex}')
"""
result = con.execute(query).arrow()
With DuckDB, my first attempt to run the computation crashed. I had to retry by reading to an Arrow Table directly without using Polars (this saved about 1GB of memory) to give DuckDB more memory to use.
first try:
cpu: single-core
memory: 2.53G -> 6.93G -> crash!
time: NA
second try:
cpu: single-core
memory: 1.62G -> 6.13G
time: 38.6 s ± 311 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)
A modification of "Use the rolling function of polars to get a list of all values in the rolling windows":
>>> (
...     df
...     .with_columns(
...         [pl.col("values").shift(i).alias(f"lag_{i}") for i in range(-2, 3)])
...     .filter(pl.col("letters").str.contains("D|F"))
...     .select([
...         pl.col("letters"),
...         pl.concat_list(reversed([f"lag_{i}" for i in range(-2, 3)])).alias("output")
...     ])
... )
shape: (2, 2)
┌─────────┬────────────────────────────────┐
│ letters ┆ output                         │
│ ---     ┆ ---                            │
│ str     ┆ list[str]                      │
╞═════════╪════════════════════════════════╡
│ D       ┆ ["bb", "cc", "dd", "ee", "ff"] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ F       ┆ ["dd", "ee", "ff", "gg", "hh"] │
└─────────┴────────────────────────────────┘
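To see why the shifted columns line up into windows, here is a minimal plain-Python sketch of the same idea. It uses lists rather than Polars, and the shift helper is a hypothetical stand-in that only mimics the behaviour assumed here: a positive offset moves values down and pads with None.

```python
import re


def shift(values, n):
    """Mimic a column shift: n > 0 moves values down, n < 0 moves them up."""
    k = min(abs(n), len(values))
    if n >= 0:
        return [None] * k + values[: len(values) - k]
    return values[k:] + [None] * k


letters = ["A", "B", "C", "D", "E", "F", "G", "H"]
values = ["aa", "bb", "cc", "dd", "ee", "ff", "gg", "hh"]
preceding, following = 2, 2

# One shifted copy per window position, ordered from oldest to newest.
lags = [shift(values, i) for i in range(preceding, -following - 1, -1)]

# Keep only the rows matching the condition and gather the lag values row-wise.
result = [
    (letter, [lag[row] for lag in lags])
    for row, letter in enumerate(letters)
    if re.search("D|F", letter)
]
print(result)
```

Filtering before gathering is what keeps the list column small: only the matching rows ever get a window built for them.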
You can try this:
preceding = 2
following = 2
look_around = [pl.col("values").shift(-i)
               for i in range(-preceding, following + 1)]
(
    df
    .with_column(
        pl.when(pl.col('letters').str.contains('D|F'))
        .then(pl.concat_list(look_around))
        .alias('combined')
    )
    .filter(pl.col('combined').is_not_null())
)
shape: (2, 3)
┌─────────┬────────┬────────────────────────┐
│ letters ┆ values ┆ combined │
│ --- ┆ --- ┆ --- │
│ str ┆ str ┆ list[str] │
╞═════════╪════════╪════════════════════════╡
│ D ┆ dd ┆ ["bb", "cc", ... "ff"] │
├╌╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌┼╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌╌┤
│ F ┆ ff ┆ ["dd", "ee", ... "hh"] │
└─────────┴────────┴────────────────────────┘
Loving the Polars library for its fantastic speed and easy syntax!
I'm struggling with this question: is there an analogue in Polars for the Pandas code below? I would like to replace strings using a dictionary.
I tried using this expression, but it raises "TypeError: 'dict' object is not callable":
pl.col("List").str.replace_all(lambda key: key,dict())
I'm trying to replace the working Pandas code below with a Polars expression:
import pandas as pd
df = pd.DataFrame({'List':[
    'Systems',
    'Software',
    'Cleared'
]})
dic = {
    'Systems':'Sys'
    ,'Software':'Soft'
    ,'Cleared':'Clr'
}
df["List"] = df["List"].replace(dic, regex=True)
Output:
List
0 Sys
1 Soft
2 Clr
You could build an expression by chaining multiple .replace_all() calls.
>>> replacements = pl.col("List")
>>> for old, new in dic.items():
...     replacements = replacements.str.replace_all(old, new)
>>> df.select(replacements)
shape: (3, 1)
┌──────┐
│ List │
│ --- │
│ str │
╞══════╡
│ Sys │
├╌╌╌╌╌╌┤
│ Soft │
├╌╌╌╌╌╌┤
│ Clr │
└──────┘
You can pass literal=True to .replace_all() if you don't need/want regex matching.
I think your best bet would be to turn your dic into a dataframe and join the two.
You need to convert your dic to a format that makes a nice DataFrame. You can do that with a list of dicts:
dicdf=pl.DataFrame([{'List':x, 'newList':y} for x,y in dic.items()])
where List is your column name and newList is an arbitrary name for the new column, which we'll get rid of later.
You'll then want to join that with your original df and select all columns except the old List, plus newList renamed to List:
df=df.join(
    dicdf,
    on='List') \
    .select([
        pl.exclude(['List','newList']),
        pl.col('newList').alias('List')
    ])
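The two approaches are not quite equivalent, which the following plain-Python sketch tries to illustrate (it does not use Polars). Chained replacement rewrites substrings, much like the Pandas call with regex=True, whereas a join matches whole values only and, assuming an inner join, drops rows that have no match. The rows "Cleared Systems" and "Other" are hypothetical additions to the example data, and replace_substrings is a made-up helper.

```python
dic = {"Systems": "Sys", "Software": "Soft", "Cleared": "Clr"}

# The last two rows are hypothetical; they are not in the original example.
values = ["Systems", "Software", "Cleared", "Cleared Systems", "Other"]


def replace_substrings(value, mapping):
    """Apply every literal replacement in turn, like chained replace calls."""
    for old, new in mapping.items():
        value = value.replace(old, new)
    return value


# Chained replacement: every row is kept, substrings are rewritten.
chained = [replace_substrings(v, dic) for v in values]

# Join-like exact lookup: only whole-value matches survive.
joined = [dic[v] for v in values if v in dic]

print(chained)
print(joined)
```

For the three rows in the question both approaches give the same result, so the choice mainly matters when a column contains longer strings or values that are missing from the dictionary.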
Is there a way to allow an expression in Polars to refer to a previous aliased expression? For example, this code that defines two new columns errors because the second new column refers to the first:
import polars as pl
df = pl.DataFrame(dict(x=[0, 0, 1]))
df.select([
    (pl.col('x') + 1).alias('y'),
    (pl.col('y') * 2).alias('z')],
)
# pyo3_runtime.PanicException: called `Result::unwrap()` on an `Err` value:
# NotFound("Unable to get field named \"y\". Valid fields: [\"x\"]")
The error makes it obvious that the failure is caused by the first alias not being visible to the second expression. Is there a straightforward way to make this work?
All polars expressions within a context are executed in parallel. So they cannot refer to a column that does not yet exist.
A context is:
df.with_columns
df.select
df.groupby(..).agg
This means you need to enforce sequential execution for expressions that refer to the outputs of other expressions.
In your case I would do:
(df.with_column(
    (pl.col('x') + 1).alias('y')
).select([
    pl.col('y'),
    (pl.col('y') * 2).alias('z')
]))
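The rule can be modelled with a toy example in plain Python. This is only a sketch of the idea, not how Polars is implemented: the hypothetical with_columns helper below evaluates every expression against the input frame (a dict of lists here), so an expression cannot see a column created by a sibling expression in the same call.

```python
def with_columns(frame, **exprs):
    """Evaluate every expression against the *input* frame, then attach the results."""
    new_columns = {name: expr(frame) for name, expr in exprs.items()}
    return {**frame, **new_columns}


df = {"x": [0, 0, 1]}

# Both expressions in one context: "z" looks for "y" in the input and fails.
try:
    with_columns(
        df,
        y=lambda f: [v + 1 for v in f["x"]],
        z=lambda f: [v * 2 for v in f["y"]],
    )
    single_context_ok = True
except KeyError:
    single_context_ok = False

# Two contexts in sequence: the second call sees the output of the first.
step1 = with_columns(df, y=lambda f: [v + 1 for v in f["x"]])
step2 = with_columns(step1, z=lambda f: [v * 2 for v in f["y"]])

print(single_context_ok)
print(step2)
```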
One workaround is to pull out each new column into its own with_column call and then do a final select to keep the columns you were supposed to keep. You will probably want to make sure this is done lazily.
import polars as pl
df = pl.DataFrame(dict(x=[0, 0, 1]))
(df
    .lazy()
    .with_column((pl.col("x") + 1).alias("y"))
    .with_column((pl.col("y") * 2).alias("z"))
    .select(["y", "z"])
    .collect()
)
# shape: (3, 2)
# ┌─────┬─────┐
# │ y ┆ z │
# │ --- ┆ --- │
# │ i64 ┆ i64 │
# ╞═════╪═════╡
# │ 1 ┆ 2 │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 1 ┆ 2 │
# ├╌╌╌╌╌┼╌╌╌╌╌┤
# │ 2 ┆ 4 │
# └─────┴─────┘
tab:
num │ value_two │ value_three │ value_four
─────┼───────────┼─────────────┼────────────
1 │ a │ A │ 4.0
2 │ a │ A2 │ 75.0
3 │ b │ A3 │ 7.0
I want to create a 2D json array like this
[[1,"a","A",4.0],[2,"a","A2",75.0],[3,"b","A3",7.0]]
I have tried two things:
First, SELECT json_agg(tab) FROM tab, but it returns an array of objects.
The second thing I tried kind of works; the only problem is that it returns a 2D array of strings.
SELECT json_agg(ARRAY[num::TEXT,value_two,value_three,value_four::TEXT]) FROM tab
[["1","a","A","4.0"],["2","a","A2","75.0"],["3","b","A3","7.0"]]
Short answer:
=# select json_agg(json_build_array(num, value_two, value_three, value_four)) as answer
from tab;
answer
-----------------------------------------------------------------
[[1, "a", "A", 4.0], [2, "a", "A2", 75.0], [3, "b", "A3", 7.0]]
(1 row)
Native PostgreSQL arrays like the one you created with
ARRAY[num::TEXT,value_two,value_three,value_four::TEXT]
are strictly typed, which is why you had to cast num and value_four to text.
To get the type mixing allowed in JSON, use json_build_array() instead.
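The same distinction can be seen outside the database. The following plain-Python sketch (standard library json module, not PostgreSQL) serializes the three rows once with every value cast to text, which corresponds to the typed-array attempt, and once with the values left as they are, which corresponds to the mixed-type JSON array. The variable names are made up for this illustration.

```python
import json

rows = [(1, "a", "A", 4.0), (2, "a", "A2", 75.0), (3, "b", "A3", 7.0)]

# Every element forced to one type (text), as a strictly typed array requires.
as_text = json.dumps([[str(v) for v in row] for row in rows])

# JSON arrays may mix numbers and strings, so the values can stay as they are.
mixed = json.dumps([list(row) for row in rows])

print(as_text)
print(mixed)
```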