Based on index (or column value), update values in multiple columns in Polars - python-polars

Based on the index, I want to update 3 columns in a data frame. In pandas, I would do the following:
'''
index = 5
df.loc[index, ['a', 'b', 'c']] = var1, var2, var3
'''
What is the Polars equivalent notation?

One possible approach is to use when/then/otherwise combined with .map()
import polars as pl

df = pl.DataFrame({
"a": [1, 2, 3],
"b": [4, 5, 6],
"c": [7, 8, 9],
"d": [10, 11, 12]
})
index = 1
df.with_row_count().with_columns(
pl.when(pl.col("row_nr") == index)
.then(pl.all().map(lambda column:
{
"a": 100,
"b": 200,
"c": 300
}.get(column.name, column)))
.otherwise(pl.all())
)
shape: (3, 5)
┌────────┬─────┬─────┬─────┬─────┐
│ row_nr | a | b | c | d │
│ --- | --- | --- | --- | --- │
│ u32 | i64 | i64 | i64 | i64 │
╞════════╪═════╪═════╪═════╪═════╡
│ 0 | 1 | 4 | 7 | 10 │
├────────┼─────┼─────┼─────┼─────┤
│ 1 | 100 | 200 | 300 | 11 │
├────────┼─────┼─────┼─────┼─────┤
│ 2 | 3 | 6 | 9 | 12 │
└────────┴─────┴─────┴─────┴─────┘
Another option is Series.set_at_idx() (renamed to Series.scatter() in later Polars releases):
df.with_columns([
df["a"].set_at_idx(index, 100),
df["b"].set_at_idx(index, 200),
df["c"].set_at_idx(index, 300)
])
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ a | b | c | d │
│ --- | --- | --- | --- │
│ i64 | i64 | i64 | i64 │
╞═════╪═════╪═════╪═════╡
│ 1 | 4 | 7 | 10 │
├─────┼─────┼─────┼─────┤
│ 100 | 200 | 300 | 11 │
├─────┼─────┼─────┼─────┤
│ 3 | 6 | 9 | 12 │
└─────┴─────┴─────┴─────┘

Related

Redshift SQL: How to get today's count and sum of counts from previous 3 days

I have a table with a date and some count like the following:
| Date | Count |
| 2019-01-02 | 100 |
| 2019-01-03 | 101 |
| 2019-01-04 | 99 |
| 2019-01-05 | 95 |
| 2019-01-06 | 90 |
| 2019-01-07 | 88 |
Given this table, I want to sum the counts of the previous 3 days for each date, like the following:
| Date | Prev3DaysCount |
| 2019-01-02 | 0 |
| 2019-01-03 | 100 |
| 2019-01-04 | 201 |
| 2019-01-05 | 300 |
| 2019-01-06 | 295 |
| 2019-01-07 | 284 |
For example, the Prev3DaysCount of 284 for 2019-01-07 comes from the previous 3 days (99+95+90). I figured that I could use the SUM window function, but I couldn't figure out how to limit the window to the previous 3 days.
You can use a window function (along with a COALESCE to transform the null (in the first row) to 0):
SELECT
day,
COALESCE(
SUM(count) OVER (ORDER BY day ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING),
0
) AS Prev3DaysCount
FROM t;
Returns:
┌────────────┬────────────────┐
│ day │ prev3dayscount │
├────────────┼────────────────┤
│ 2019-01-02 │ 0 │
│ 2019-01-03 │ 100 │
│ 2019-01-04 │ 201 │
│ 2019-01-05 │ 300 │
│ 2019-01-06 │ 295 │
│ 2019-01-07 │ 284 │
└────────────┴────────────────┘
(6 rows)
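The frame clause can be tried locally with a small sketch. It uses Python's sqlite3 module as a stand-in for Redshift (an assumption; it needs a SQLite build with window-function support), with the table and column names taken from the answer. Note that ROWS BETWEEN counts rows, not calendar days, so the result matches "previous 3 days" only when there is exactly one row per day.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (day TEXT, count INTEGER)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [
        ("2019-01-02", 100),
        ("2019-01-03", 101),
        ("2019-01-04", 99),
        ("2019-01-05", 95),
        ("2019-01-06", 90),
        ("2019-01-07", 88),
    ],
)

query = """
SELECT
    day,
    COALESCE(
        -- frame = the three rows before the current one, current row excluded
        SUM(count) OVER (ORDER BY day ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING),
        0
    ) AS prev3dayscount
FROM t
ORDER BY day
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```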

Delete files from all subdirectories keeping folder structure except for one subdirectory

I'm trying to remove all files from all subdirectories while keeping the folder structure, but without removing the files for October 2019 in each subdirectory ⇒ root_dir\*\2019\October\
The directory structure looks like this:
C:\Users\User1\Documents\MyFolder
├───Directory1
│ ├───2018
│ │ ├───April
│ │ │ └───Error
│ │ ├───August
│ │ │ └───Error
│ │ ├───February
│ │ │ └───Error
│ │ ├───January
│ │ │ └───Error
│ │ ├───July
│ │ │ └───Error
│ │ ├───June
│ │ │ └───Error
│ │ ├───March
│ │ │ └───Error
│ │ ├───May
│ │ │ └───Error
│ │ ├───October
│ │ │ └───Error
│ │ └───September
│ │ └───Error
│ └───2019
│ ├───April
│ │ └───Error
│ ├───August
│ │ └───Error
│ ├───February
│ │ └───Error
│ ├───January
│ │ └───Error
│ ├───July
│ │ └───Error
│ ├───June
│ │ └───Error
│ ├───March
│ │ └───Error
│ ├───May
│ │ └───Error
│ ├───October
│ │ └───Error
│ └───September
│ └───Error
└───Directory2
├───2018
│ ├───April
│ │ └───Error
│ ├───August
│ │ └───Error
│ ├───February
│ │ └───Error
│ ├───January
│ │ └───Error
│ ├───July
│ │ └───Error
│ ├───June
│ │ └───Error
│ ├───March
│ │ └───Error
│ ├───May
│ │ └───Error
│ ├───October
│ │ └───Error
│ └───September
│ └───Error
└───2019
├───April
│ └───Error
├───August
│ └───Error
├───February
│ └───Error
├───January
│ └───Error
├───July
│ └───Error
├───June
│ └───Error
├───March
│ └───Error
├───May
│ └───Error
├───October
│ └───Error
└───September
└───Error
According to the Microsoft PowerShell docs, I should be able to use a wildcard in the exclude path. I've tried a few variations, but here is where I'm currently at:
$root = 'C:\Users\User1\Documents\MyFolder'
$excludes = 'C:\Users\User1\Documents\MyFolder\*\2019\October\'
Get-ChildItem $root -Directory -Exclude $excludes | ForEach-Object {
Get-ChildItem $_.FullName -File -Recurse -Force | Remove-Item -Force
}
The above works, except it's still removing files from C:\Users\User1\Documents\MyFolder\Directory1\2019\October\* and C:\Users\User1\Documents\MyFolder\Directory2\2019\October\*
I've tried specifying .\2019\October\*.* but that doesn't seem to work either.
I suggest a different approach:
$root = 'C:\Users\User1\Documents\MyFolder'
$excludes = '*\2019\October\*'
Get-ChildItem $root -File -Recurse |
Where-Object FullName -notlike $excludes |
Remove-Item -Force -WhatIf
-WhatIf previews the removal operation; remove it to perform actual removal.
For simplicity, all files in the subtree are enumerated, and then filtered out by whether their full paths contain \2019\October\ path components.
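The same enumerate-then-filter idea can be sketched in Python for comparison. The function names, the keep_parts parameter, and the dry_run flag (playing the role of -WhatIf) are illustrative assumptions, and unlike -notlike the comparison here is case-sensitive.

```python
from pathlib import Path


def files_to_delete(root, keep_parts=("2019", "October")):
    """Yield files under root whose directory path does not contain
    the consecutive components in keep_parts (hypothetical helper)."""
    root = Path(root)
    keep = tuple(keep_parts)
    n = len(keep)
    for path in root.rglob("*"):
        if not path.is_file():
            continue
        # Directory components only, relative to root (file name dropped).
        parts = path.relative_to(root).parts[:-1]
        protected = any(parts[i:i + n] == keep for i in range(len(parts) - n + 1))
        if not protected:
            yield path


def purge(root, dry_run=True):
    """Delete the selected files; with dry_run=True only report them."""
    selected = list(files_to_delete(root))  # materialize before deleting
    for path in selected:
        if not dry_run:
            path.unlink()
    return selected
```

Calling purge(root) lists what would be removed; purge(root, dry_run=False) removes the files and leaves every directory in place.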
As for what you tried:
$excludes = 'C:\Users\User1\Documents\MyFolder\*\2019\October\'
The -Exclude parameter only supports file name patterns, not full paths - though there is a pending feature request on GitHub to add support for paths.
Also, using Get-ChildItem -Exclude without -Recurse doesn't work the way one would expect: the exclusion is then applied to the input path only - see this GitHub issue.

Recursive CTE - Get descendants (many-to-many relationship)

What I have:
Given a tree (or more like a directed graph) that describes how a system is composed by its generic parts. For now let this system be e.g. the human body and the nodes its body parts.
So for instance 3 could be the liver that has a left and a right lobe (6 and 9), in both of which there are veins (8) (that can also be found at any unspecified place of the liver, hence 8->3) but also in the tongue (5). The lung (7) - which is in the chest (4) - also has a right lobe, and so on... (Well, of course there is no lung in the liver and also a 6->7 would be reasonable so this example wasn't the best but you get it.)
So I have this data in a database like this:
table: part
+----+------------+ id is primary key
| id | name |
+----+------------+
| 1 | head |
| 2 | mouth |
| 3 | liver |
| 4 | chest |
| 5 | tongue |
| 6 | left lobe |
| 7 | lung |
| 8 | veins |
| 9 | right lobe |
+----+------------+
table: partpart
+-------+---------+ part&cont is primary key
| part | cont | part is foreign key for part.id
+-------+---------+ cont is foreign key for part.id
| 2 | 1 |
| 3 | 1 |
| 5 | 2 |
| 6 | 3 |
| 7 | 3 |
| 7 | 4 |
| 8 | 3 |
| 8 | 5 |
| 8 | 6 |
| 8 | 9 |
| 9 | 3 |
| 9 | 7 |
+-------+---------+
What I want to achieve:
I'd like to query all parts that can be found in part 3, and I expect a result like this one:
result of query
+-------+---------+
| part | subpart |
+-------+---------+
| 3 | 6 |
| 3 | 7 |
| 3 | 8 |
| 3 | 9 |
| 6 | 8 |
| 7 | 9 |
| 9 | 8 |
+-------+---------+
I have the feeling that getting the result in this desired format is not feasible, still it would be great to have it as a similar set because my purpose is to display the data for the user like that:
3
├─ 6
│ └─ 8
├─ 7
│ └─ 9
│ └─ 8
├─ 8
└─ 9
└─ 8
How I'm trying:
WITH RECURSIVE tree AS (
SELECT part.id as part, partpart.cont (..where to define subpart?)
FROM part JOIN partpart
ON part.id = partpart.part
WHERE part.id = 3
UNION ALL
SELECT part.id, partpart.cont
FROM (part JOIN partpart
ON part.id = partpart.part
), tree
WHERE partpart.cont = tree.part
)
SELECT part, subpart FROM tree
This is the closest I could do but of course it doesn't work.
Problem solved. Here is the query I needed; I hope it helps someone else one day:
WITH RECURSIVE graph AS (
SELECT
p.id AS subpart,
pp.cont AS part
FROM part p JOIN partpart pp
ON p.id = pp.part
WHERE pp.cont = 3
UNION ALL
SELECT
part.id,
partpart.cont
FROM (part JOIN partpart
ON part.id = partpart.part
), graph WHERE partpart.cont = graph.subpart
)
SELECT part, subpart FROM graph
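A runnable sketch of the same traversal, using Python's sqlite3 module as a stand-in database (an assumption; the original question does not name the engine). Two simplifications are mine: the join to the part table is dropped because only ids are returned, and UNION replaces UNION ALL, since part 9 is reachable both directly from 3 and via 7, which would otherwise yield the (9, 8) row twice.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE partpart (part INTEGER, cont INTEGER, PRIMARY KEY (part, cont))"
)
# (part, cont): "part" is contained in "cont", as in the question's table.
edges = [
    (2, 1), (3, 1), (5, 2), (6, 3), (7, 3), (7, 4),
    (8, 3), (8, 5), (8, 6), (8, 9), (9, 3), (9, 7),
]
conn.executemany("INSERT INTO partpart VALUES (?, ?)", edges)

query = """
WITH RECURSIVE graph(part, subpart) AS (
    -- anchor: everything directly contained in part 3
    SELECT cont, part FROM partpart WHERE cont = 3
    UNION
    -- step: everything contained in a subpart found so far
    SELECT pp.cont, pp.part
    FROM partpart AS pp
    JOIN graph AS g ON pp.cont = g.subpart
)
SELECT part, subpart FROM graph ORDER BY part, subpart
"""
rows = conn.execute(query).fetchall()
for row in rows:
    print(row)
```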

Select until row matches in Postgres

Given the following data structure:
id | subscription_id | state | created_at | ok
---------+-----------------+-------+----------------------------+----
1 | 1 | error | 2015-06-30 15:20:03.041045 | f
2 | 1 | error | 2015-06-30 15:20:04.582907 | f
3 | 1 | sent | 2015-06-30 22:50:04.50478 | f
4 | 1 | error | 2015-06-30 22:50:06.067279 | f
5 | 1 | error | 2015-07-01 22:50:02.356113 | f
I want to retrieve the last messages with state='error' until the state contains something else.
It should return this:
id | subscription_id | state | created_at | ok
---------+-----------------+-------+----------------------------+----
4 | 1 | error | 2015-06-30 22:50:06.067279 | f
5 | 1 | error | 2015-07-01 22:50:02.356113 | f
Following this question and later this one, I ended up with this query below:
SELECT * from (select id, subscription_id, state, created_at,
bool_and(state='error')
OVER (PARTITION BY state order by created_at, id) AS ok
FROM messages ORDER by created_at) m2
WHERE subscription_id = 1;
However, given that I added PARTITION BY state the query is simply ignoring all state which does not contain error and showing this instead:
id | subscription_id | state | created_at | ok
---------+-----------------+-------+----------------------------+----
1 | 1 | error | 2015-06-30 15:20:03.041045 | f
2 | 1 | error | 2015-06-30 15:20:04.582907 | f
4 | 1 | error | 2015-06-30 22:50:06.067279 | f
5 | 1 | error | 2015-07-01 22:50:02.356113 | f
How should the query be made in order to 'stop' after finding a different state and matching following the example described on the top only the ids 4 and 5?
If I understand correctly, you need this:
select * from messages
where
id > (select coalesce(max(id), 0) from messages where state <> 'error')
and
subscription_id = 1
This assumes that id is a unique (PK?) column and that a higher id means a later record.
EDIT
That's correct: as @Marth mentioned, you probably need to add ... AND subscription_id = 1 in the subquery as well.
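A small sketch of this answer with the subscription filter applied inside the subquery too. It runs on Python's sqlite3 module as a stand-in for Postgres (an assumption), and the extra row for subscription 2 is invented purely to show why the inner filter matters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE messages (id INTEGER PRIMARY KEY, subscription_id INTEGER, "
    "state TEXT, created_at TEXT)"
)
conn.executemany(
    "INSERT INTO messages VALUES (?, ?, ?, ?)",
    [
        (1, 1, "error", "2015-06-30 15:20:03.041045"),
        (2, 1, "error", "2015-06-30 15:20:04.582907"),
        (3, 1, "sent", "2015-06-30 22:50:04.50478"),
        (4, 1, "error", "2015-06-30 22:50:06.067279"),
        (5, 1, "error", "2015-07-01 22:50:02.356113"),
        # Illustrative row for another subscription (not in the question).
        (6, 2, "sent", "2015-07-02 10:00:00.000000"),
    ],
)

query = """
SELECT id FROM messages
WHERE subscription_id = :sub
  AND id > (
      -- id of the latest non-error message of the same subscription
      SELECT COALESCE(MAX(id), 0) FROM messages
      WHERE state <> 'error' AND subscription_id = :sub
  )
ORDER BY id
"""
trailing_error_ids = [row[0] for row in conn.execute(query, {"sub": 1})]
print(trailing_error_ids)
```

Without the inner filter, the unrelated row 6 would become the cutoff and no rows would be returned for subscription 1.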
There is no need to PARTITION BY state. You want to SELECT the rows for which all later rows (in created_at ASC order) are errors, i.e. for which bool_and(state = 'error'), computed from the newest row backwards, is true:
SELECT * FROM (
SELECT *,
bool_and(state = 'error') OVER (ORDER BY created_at DESC, id) AS only_errors_afterward
FROM sub
) s
WHERE only_errors_afterward
;
┌────┬─────────────────┬───────┬───────────────────────────────┬────┬───────────────────────┐
│ id │ subscription_id │ state │ created_at │ ok │ only_errors_afterward │
├────┼─────────────────┼───────┼───────────────────────────────┼────┼───────────────────────┤
│ 5 │ 1 │ error │ 2015-07-01 22:50:02.356113+02 │ f │ t │
│ 4 │ 1 │ error │ 2015-06-30 22:50:06.067279+02 │ f │ t │
└────┴─────────────────┴───────┴───────────────────────────────┴────┴───────────────────────┘
(2 rows)
Edit: Depending on the expected result you might need a PARTITION BY subscription_id in the window function.
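To make the running bool_and easier to picture, here is the same logic in plain Python: walk the rows newest first, keep a cumulative AND of state == 'error', and retain the rows where it is still true. The tuple layout and variable names are illustrative; this is a sketch of the idea, not of how Postgres evaluates the window.

```python
from itertools import accumulate
from operator import and_

# (id, state, created_at) for subscription 1, as in the question.
messages = [
    (1, "error", "2015-06-30 15:20:03.041045"),
    (2, "error", "2015-06-30 15:20:04.582907"),
    (3, "sent", "2015-06-30 22:50:04.50478"),
    (4, "error", "2015-06-30 22:50:06.067279"),
    (5, "error", "2015-07-01 22:50:02.356113"),
]

# Newest first, mirroring ORDER BY created_at DESC (id breaks ties).
newest_first = sorted(messages, key=lambda m: (m[2], m[0]), reverse=True)

# Running AND: stays True only while every row seen so far is an error.
only_errors = accumulate((state == "error" for _, state, _ in newest_first), and_)

trailing_errors = [m for m, flag in zip(newest_first, only_errors) if flag]
print([m[0] for m in trailing_errors])
```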

Merge two columns based on a table value

I'm trying to merge French strings (language ID 1) into one column. So far, I'm able to get French strings in table1.title and table2.translated_topic, but am not sure how to concatenate them.
Ver: Postgres 9.6.0
Source table schemas:
Table 1: knowledgebase_topics
id | title | language_id |
------------------------------------
64 | The Topic | 91 |
65 | The Topic 2 | 91 |
62 | Le fav sujet | 1 |
63 | Le fav sujet 2 | 1 |
61 | le bonjour | 1 |
Table 2: knowledgebase_topics_translations
id | translated_topic| knowledgebase_topic_id | language_id |
-------------------------------------------------------------
| Le sujet | 64 | 1 |
| Le sujet 2 | 65 | 1 |
| Fav The Topic | 62 | 91 |
| Fav The Topic 2 | 63 | 91 |
Given the following Query:
SELECT title, translated_topic, "kbt".language_id, "kbtt".language_id
FROM knowledgebase_topics as "kbt"
LEFT JOIN knowledgebase_topics_translations as "kbtt" on ("kbtt".knowledgebase_topic_id = "kbt".id)
INNER JOIN knowledgebase_topics_organizations as "kbto" on ("kbto".knowledgebase_topic_id = "kbt".id)
WHERE "kbto"."organization_id" = 1
AND to_tsvector("kbt".title) @@ to_tsquery('le')
OR to_tsvector("kbtt".translated_topic) @@ to_tsquery('le')
AND "kbt".language_id = 1
OR "kbtt".language_id = 1;
I get the following results:
title | translated_topic | language_id | language_id
----------------+------------------+-------------+-------------
The Topic | Le sujet | 91 | 1
The Topic 2 | Le sujet 2 | 91 | 1
Le fav sujet | Fav The Topic | 1 | 91
Le fav sujet 2 | Fav The Topic 2 | 1 | 91
le bonjour | | 1 |
Desired results: table1.title and table2.translated_topic merged into one column for the rows where language_id = 1. Both tables have a language ID column.
title | language_id
----------------+--------------
Le sujet | 1
Le sujet 2 | 1
Le fav sujet | 1
Le fav sujet 2 | 1
le bonjour | 1
How can I do this?
Note: I do not simply want to check lang IDs = 1, such as
and "kbt".language_id = 1 AND (instead of OR) "kbtt".language_id = 1;
Because this results in 2 missing records from table 2 of language ID 1:
title | translated_topic | language_id | language_id
----------------+------------------+-------------+-------------
Le fav sujet | Fav The Topic | 1 | 91
Le fav sujet 2 | Fav The Topic 2 | 1 | 91
le bonjour | | 1 |
So, I've got it working... but is this performant?
SELECT title, "kbt".language_id
FROM knowledgebase_topics as "kbt"
INNER JOIN knowledgebase_topics_organizations as "kbto" on ("kbto".knowledgebase_topic_id = "kbt".id)
WHERE "kbto"."organization_id" = 1
AND to_tsvector("kbt".title) @@ to_tsquery('le')
AND "kbt".language_id = 1
UNION ALL
SELECT translated_topic, "kbtt".language_id
FROM knowledgebase_topics_translations as "kbtt"
INNER JOIN knowledgebase_topics_organizations as "kbto" on ("kbto".knowledgebase_topic_id = "kbtt".id)
WHERE "kbto"."organization_id" = 1
AND to_tsvector("kbtt".translated_topic) @@ to_tsquery('le')
AND "kbtt".language_id = 1;
Gives output:
title | language_id
----------------+-------------
le bonjour | 1
Le fav sujet | 1
Le fav sujet 2 | 1
Le sujet | 1
Le sujet 2 | 1
(5 rows)
Setting up an environment to answer the question
First, observe how the problem is best described with concise DDL. In the future, try to write questions like this.
CREATE TEMPORARY TABLE knowledgebase_topics AS
SELECT * FROM ( VALUES
(64,'The Topic',91),
(65,'The Topic 2',91),
(62,'Le fav sujet',1),
(63,'Le fav sujet 2',1),
(61,'le bonjour',1)
) AS t(knowledgebase_topic_id, title, language_id);
CREATE TEMPORARY TABLE knowledgebase_topics_translations AS
SELECT * FROM ( VALUES
('Le sujet' ,64,1 ),
('Le sujet 2' ,65,1 ),
('Fav The Topic' ,62,91 ),
('Fav The Topic 2',63,91 )
) AS t(translated_topic, knowledgebase_topic_id, language_id);
Then you need only tell us what you want and we can get a working environment up easily and answer your question. No English required! Easier on both of us.
The solution
Here we use a UNION ALL and wrap it in a SELECT, so that we can sort by id and change the language you're looking for in one place.
SELECT title, language_id
FROM (
SELECT knowledgebase_topic_id, title, language_id
FROM knowledgebase_topics
UNION ALL
SELECT knowledgebase_topic_id, translated_topic, language_id
FROM knowledgebase_topics_translations
) AS t(id, title, language_id)
WHERE language_id = 1
ORDER BY id;
Output
title │ language_id
────────────────┼─────────────
le bonjour │ 1
Le fav sujet │ 1
Le fav sujet 2 │ 1
Le sujet │ 1
Le sujet 2 │ 1
(5 rows)
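The solution can be reproduced end to end with a short sketch on Python's sqlite3 module, used here as a stand-in for Postgres (an assumption). Because SQLite does not take a column list after the derived-table alias, the id alias is set in the first SELECT instead; the data is the sample from the DDL above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knowledgebase_topics (
    knowledgebase_topic_id INTEGER, title TEXT, language_id INTEGER
);
INSERT INTO knowledgebase_topics VALUES
    (64, 'The Topic', 91),
    (65, 'The Topic 2', 91),
    (62, 'Le fav sujet', 1),
    (63, 'Le fav sujet 2', 1),
    (61, 'le bonjour', 1);

CREATE TABLE knowledgebase_topics_translations (
    translated_topic TEXT, knowledgebase_topic_id INTEGER, language_id INTEGER
);
INSERT INTO knowledgebase_topics_translations VALUES
    ('Le sujet', 64, 1),
    ('Le sujet 2', 65, 1),
    ('Fav The Topic', 62, 91),
    ('Fav The Topic 2', 63, 91);
""")

query = """
SELECT title, language_id
FROM (
    -- column names of the combined result come from the first SELECT
    SELECT knowledgebase_topic_id AS id, title, language_id
    FROM knowledgebase_topics
    UNION ALL
    SELECT knowledgebase_topic_id, translated_topic, language_id
    FROM knowledgebase_topics_translations
) AS t
WHERE language_id = ?
ORDER BY id
"""
rows = conn.execute(query, (1,)).fetchall()
for row in rows:
    print(row)
```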