Delete files from all subdirectories keeping folder structure except for one subdirectory - powershell

I'm trying to remove all files from all subdirectories while keeping the folder structure, but without removing the files from October 2019 in each subdirectory ⇒ root_dir\*\2019\October\
The directory structure looks like this:
C:\Users\User1\Documents\MyFolder
├───Directory1
│ ├───2018
│ │ ├───April
│ │ │ └───Error
│ │ ├───August
│ │ │ └───Error
│ │ ├───February
│ │ │ └───Error
│ │ ├───January
│ │ │ └───Error
│ │ ├───July
│ │ │ └───Error
│ │ ├───June
│ │ │ └───Error
│ │ ├───March
│ │ │ └───Error
│ │ ├───May
│ │ │ └───Error
│ │ ├───October
│ │ │ └───Error
│ │ └───September
│ │ └───Error
│ └───2019
│ ├───April
│ │ └───Error
│ ├───August
│ │ └───Error
│ ├───February
│ │ └───Error
│ ├───January
│ │ └───Error
│ ├───July
│ │ └───Error
│ ├───June
│ │ └───Error
│ ├───March
│ │ └───Error
│ ├───May
│ │ └───Error
│ ├───October
│ │ └───Error
│ └───September
│ └───Error
└───Directory2
├───2018
│ ├───April
│ │ └───Error
│ ├───August
│ │ └───Error
│ ├───February
│ │ └───Error
│ ├───January
│ │ └───Error
│ ├───July
│ │ └───Error
│ ├───June
│ │ └───Error
│ ├───March
│ │ └───Error
│ ├───May
│ │ └───Error
│ ├───October
│ │ └───Error
│ └───September
│ └───Error
└───2019
├───April
│ └───Error
├───August
│ └───Error
├───February
│ └───Error
├───January
│ └───Error
├───July
│ └───Error
├───June
│ └───Error
├───March
│ └───Error
├───May
│ └───Error
├───October
│ └───Error
└───September
└───Error
According to the Microsoft PowerShell docs, I should be able to use a wildcard in the exclude path. I've tried a few variations, but here is where I'm currently at:
$root = 'C:\Users\User1\Documents\MyFolder'
$excludes = 'C:\Users\User1\Documents\MyFolder\*\2019\October\'
Get-ChildItem $root -Directory -Exclude $excludes | ForEach-Object {
Get-ChildItem $_.FullName -File -Recurse -Force | Remove-Item -Force
}
The above works, except that it's still removing files from C:\Users\User1\Documents\MyFolder\Directory1\2019\October\* and C:\Users\User1\Documents\MyFolder\Directory2\2019\October\*
I've tried specifying .\2019\October\*.* but that doesn't seem to work either.

I suggest a different approach:
$root = 'C:\Users\User1\Documents\MyFolder'
$excludes = '*\2019\October\*'
Get-ChildItem $root -File -Recurse |
Where-Object FullName -notlike $excludes |
Remove-Item -Force -WhatIf
-WhatIf previews the removal operation; remove it to perform actual removal.
For simplicity, all files in the subtree are enumerated, and then filtered out by whether their full paths contain \2019\October\ path components.
As for what you tried:
$excludes = 'C:\Users\User1\Documents\MyFolder\*\2019\October\'
The -Exclude parameter only supports file name patterns, not full paths - though there is a pending feature request on GitHub to add support for paths.
Also, using Get-ChildItem -Exclude without -Recurse doesn't work the way one would expect: the exclusion is then applied to the input path only - see this GitHub issue.
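For readers who want to see the same enumerate-then-filter idea outside PowerShell, here is a minimal Python sketch. The function names (files_to_delete, purge) and the dry_run flag are hypothetical and only mirror the role of -WhatIf; unlike -like, the comparison here is case-sensitive.

```python
from pathlib import Path


def files_to_delete(root, keep=("2019", "October")):
    """Yield files under root whose directory path does not contain
    the consecutive components given in keep (hypothetical helper)."""
    keep = tuple(keep)
    n = len(keep)
    for path in Path(root).rglob("*"):
        if not path.is_file():
            continue
        # Directory components of the file, relative to the root folder.
        parts = path.relative_to(root).parent.parts
        protected = any(parts[i:i + n] == keep
                        for i in range(len(parts) - n + 1))
        if not protected:
            yield path


def purge(root, keep=("2019", "October"), dry_run=True):
    """Delete the selected files; with dry_run=True only report them."""
    selected = list(files_to_delete(root, keep))
    if not dry_run:
        for path in selected:
            path.unlink()  # removes the file only, folders stay in place
    return selected
```

Calling purge(root) first and inspecting the returned list plays the same role as -WhatIf: nothing is deleted until dry_run=False is passed.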

Based on index (or column value), update values in multiple columns in Polars

Based on the index, I want to update 3 columns in a data frame. In pandas, I would do the following:
index = 5
df.loc[index, ['a', 'b', 'c']] = var1, var2, var3
What is the equivalent notation in Polars?
One possible approach is to use when/then/otherwise combined with .map()
import polars as pl
df = pl.DataFrame({
"a": [1, 2, 3],
"b": [4, 5, 6],
"c": [7, 8, 9],
"d": [10, 11, 12]
})
index = 1
df.with_row_count().with_columns(
pl.when(pl.col("row_nr") == index)
.then(pl.all().map(lambda column:
{
"a": 100,
"b": 200,
"c": 300
}.get(column.name, column)))
.otherwise(pl.all())
)
shape: (3, 5)
┌────────┬─────┬─────┬─────┬─────┐
│ row_nr | a | b | c | d │
│ --- | --- | --- | --- | --- │
│ u32 | i64 | i64 | i64 | i64 │
╞════════╪═════╪═════╪═════╪═════╡
│ 0 | 1 | 4 | 7 | 10 │
├────────┼─────┼─────┼─────┼─────┤
│ 1 | 100 | 200 | 300 | 11 │
├────────┼─────┼─────┼─────┼─────┤
│ 2 | 3 | 6 | 9 | 12 │
└────────┴─────┴─────┴─────┴─────┘
Another option is .set_at_idx() (renamed to .scatter() in newer Polars releases):
df.with_columns([
df["a"].set_at_idx(index, 100),
df["b"].set_at_idx(index, 200),
df["c"].set_at_idx(index, 300)
])
shape: (3, 4)
┌─────┬─────┬─────┬─────┐
│ a | b | c | d │
│ --- | --- | --- | --- │
│ i64 | i64 | i64 | i64 │
╞═════╪═════╪═════╪═════╡
│ 1 | 4 | 7 | 10 │
├─────┼─────┼─────┼─────┤
│ 100 | 200 | 300 | 11 │
├─────┼─────┼─────┼─────┤
│ 3 | 6 | 9 | 12 │
└─────┴─────┴─────┴─────┘

Redshift SQL: How to get today's count and sum of counts from previous 3 days

I have a table with a date and some count like the following:
| Date | Count |
| 2019-01-02 | 100 |
| 2019-01-03 | 101 |
| 2019-01-04 | 99 |
| 2019-01-05 | 95 |
| 2019-01-06 | 90 |
| 2019-01-07 | 88 |
Given this table, I want to compute, for each date, the sum of the counts from the previous 3 days, like the following:
| Date | Prev3DaysCount |
| 2019-01-02 | 0 |
| 2019-01-03 | 100 |
| 2019-01-04 | 201 |
| 2019-01-05 | 300 |
| 2019-01-06 | 295 |
| 2019-01-07 | 284 |
For example, the Prev3DaysCount of 284 for 2019-01-07 comes from the previous 3 days (99+95+90). I figured that I could use a SUM window function, but I couldn't work out how to limit the window to the previous 3 days.
You can use a window function (along with a COALESCE to transform the null (in the first row) to 0):
SELECT
day,
COALESCE(
SUM(count) OVER (ORDER BY day ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING),
0
) AS Prev3DaysCount
FROM t;
Returns:
┌────────────┬────────────────┐
│ day │ prev3dayscount │
├────────────┼────────────────┤
│ 2019-01-02 │ 0 │
│ 2019-01-03 │ 100 │
│ 2019-01-04 │ 201 │
│ 2019-01-05 │ 300 │
│ 2019-01-06 │ 295 │
│ 2019-01-07 │ 284 │
└────────────┴────────────────┘
(6 rows)
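To check the frame ROWS BETWEEN 3 PRECEDING AND 1 PRECEDING by hand, the arithmetic can be reproduced in a few lines of Python. This is only a sketch: prev_n_sum is a hypothetical helper, and like the query it counts rows rather than calendar days, so it assumes one row per date with no gaps.

```python
def prev_n_sum(counts, n=3):
    """For each position, sum the n values immediately before it.

    The first position has no preceding rows, so its sum is 0,
    which matches the COALESCE(..., 0) in the query.
    """
    return [sum(counts[max(0, i - n):i]) for i in range(len(counts))]


# Counts from the question, ordered by date (2019-01-02 .. 2019-01-07).
counts = [100, 101, 99, 95, 90, 88]
print(prev_n_sum(counts))  # [0, 100, 201, 300, 295, 284]
```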

Finding and marking maximum row in a Spark SQL Window [duplicate]

Given the following DataFrame in Spark
+-----+------+---------+----+---------+----+----+------+
|empno| ename| job| mgr| hiredate| sal|comm|deptno|
+-----+------+---------+----+---------+----+----+------+
| 7369| SMITH| CLERK|7902|17-Dec-80| 800| 20| 10|
| 7499| ALLEN| SALESMAN|7698|20-Feb-81|1600| 300| 30|
| 7521| WARD| SALESMAN|7698|22-Feb-81|1250| 500| 30|
| 7566| JONES| MANAGER|7839| 2-Apr-81|2975| 0| 20|
| 7654|MARTIN| SALESMAN|7698|28-Sep-81|1250|1400| 30|
| 7698| BLAKE| MANAGER|7839| 1-May-81|2850| 0| 30|
| 7782| CLARK| MANAGER|7839| 9-Jun-81|2450| 0| 10|
| 7788| SCOTT| ANALYST|7566|19-Apr-87|3000| 0| 20|
| 7839| KING|PRESIDENT| 0|17-Nov-81|5000| 0| 10|
| 7844|TURNER| SALESMAN|7698| 8-Sep-81|1500| 0| 30|
| 7876| ADAMS| CLERK|7788|23-May-87|1100| 0| 20|
+-----+------+---------+----+---------+----+----+------+
I would like to create a new column mvp which is true if the row is the employee with the highest salary (sal) in the department (deptno), or false otherwise. I have attempted this using Window as shown below
val depWin = Window.partitionBy("deptno")
df.withColumn("mvp", max("sal").over(depWin))
however, this only adds the salary of the highest paid employee in the same department to each row. How can I create this column denoting the highest paid in the department?
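You can do this with orderBy on your Window and row_number:
import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions.row_number
// Rank rows within each department by salary, highest first; rank 1 is the MVP
val depWin = Window.partitionBy("deptno").orderBy($"sal".desc)
val ranked = df.withColumn("rank", row_number().over(depWin))
ranked.withColumn("mvp", ranked("rank") === 1).drop("rank")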
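The rank-then-compare logic can be sketched in plain Python to make the behaviour concrete. The mark_mvp helper is hypothetical, and the rows are a subset of the question's table. As with row_number, exactly one row per department is marked even when salaries tie; here a tie is resolved by input order, whereas Spark gives no such guarantee.

```python
from itertools import groupby
from operator import itemgetter

# A subset of the question's table: name, salary, department.
employees = [
    {"ename": "SMITH", "sal": 800, "deptno": 10},
    {"ename": "ALLEN", "sal": 1600, "deptno": 30},
    {"ename": "JONES", "sal": 2975, "deptno": 20},
    {"ename": "BLAKE", "sal": 2850, "deptno": 30},
    {"ename": "SCOTT", "sal": 3000, "deptno": 20},
    {"ename": "KING", "sal": 5000, "deptno": 10},
]


def mark_mvp(rows):
    """Add an 'mvp' flag that is True for the top earner per department."""
    # Sort by department, then by salary descending (the window ordering).
    ranked = sorted(rows, key=lambda r: (r["deptno"], -r["sal"]))
    result = []
    for _, group in groupby(ranked, key=itemgetter("deptno")):
        # enumerate plays the role of row_number() within the partition.
        for rank, row in enumerate(group, start=1):
            result.append({**row, "mvp": rank == 1})
    return result


for row in mark_mvp(employees):
    print(row["deptno"], row["ename"], row["sal"], row["mvp"])
```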

Select until row matches in Postgres

Given the following data structure:
id | subscription_id | state | created_at | ok
---------+-----------------+-------+----------------------------+----
1 | 1 | error | 2015-06-30 15:20:03.041045 | f
2 | 1 | error | 2015-06-30 15:20:04.582907 | f
3 | 1 | sent | 2015-06-30 22:50:04.50478 | f
4 | 1 | error | 2015-06-30 22:50:06.067279 | f
5 | 1 | error | 2015-07-01 22:50:02.356113 | f
I want to retrieve the last messages with state='error' until the state contains something else.
It should return this:
id | subscription_id | state | created_at | ok
---------+-----------------+-------+----------------------------+----
4 | 1 | error | 2015-06-30 22:50:06.067279 | f
5 | 1 | error | 2015-07-01 22:50:02.356113 | f
Following this question and later this one, I ended up with this query below:
SELECT * from (select id, subscription_id, state, created_at,
bool_and(state='error')
OVER (PARTITION BY state order by created_at, id) AS ok
FROM messages ORDER by created_at) m2
WHERE subscription_id = 1;
However, given that I added PARTITION BY state the query is simply ignoring all state which does not contain error and showing this instead:
id | subscription_id | state | created_at | ok
---------+-----------------+-------+----------------------------+----
1 | 1 | error | 2015-06-30 15:20:03.041045 | f
2 | 1 | error | 2015-06-30 15:20:04.582907 | f
4 | 1 | error | 2015-06-30 22:50:06.067279 | f
5 | 1 | error | 2015-07-01 22:50:02.356113 | f
How should the query be made in order to 'stop' after finding a different state and matching following the example described on the top only the ids 4 and 5?
If I understand correctly, you need this:
select * from messages
where
id > (select coalesce(max(id), 0) from messages where state <> 'error')
and
subscription_id = 1
This assumes that id is a unique column (the primary key?) and that a higher id means a later record.
EDIT
That's correct: as @Marth mentioned, you probably need to add ... AND subscription_id = 1 in the subquery.
There is no need to PARTITION BY state. You want to SELECT the rows for which the row itself and all rows after it (in created_at ASC order) are errors, i.e. bool_and(state = 'error') is true:
SELECT * FROM (
SELECT *,
bool_and(state = 'error') OVER (ORDER BY created_at DESC, id) AS only_errors_afterward
FROM sub
) s
WHERE only_errors_afterward
;
┌────┬─────────────────┬───────┬───────────────────────────────┬────┬───────────────────────┐
│ id │ subscription_id │ state │ created_at │ ok │ only_errors_afterward │
├────┼─────────────────┼───────┼───────────────────────────────┼────┼───────────────────────┤
│ 5 │ 1 │ error │ 2015-07-01 22:50:02.356113+02 │ f │ t │
│ 4 │ 1 │ error │ 2015-06-30 22:50:06.067279+02 │ f │ t │
└────┴─────────────────┴───────┴───────────────────────────────┴────┴───────────────────────┘
(2 rows)
Edit: Depending on the expected result you might need a PARTITION BY subscription_id in the window function.
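The idea behind both answers (walk backward from the newest row and stop at the first non-error) can be sketched in Python with itertools.takewhile. The trailing_errors helper is hypothetical, the rows are the question's sample data for a single subscription, and the one five-digit fractional timestamp is padded to six digits so that the strings sort chronologically.

```python
from itertools import takewhile

# (id, subscription_id, state, created_at) from the question.
messages = [
    (1, 1, "error", "2015-06-30 15:20:03.041045"),
    (2, 1, "error", "2015-06-30 15:20:04.582907"),
    (3, 1, "sent", "2015-06-30 22:50:04.504780"),
    (4, 1, "error", "2015-06-30 22:50:06.067279"),
    (5, 1, "error", "2015-07-01 22:50:02.356113"),
]


def trailing_errors(rows):
    """Return the most recent run of 'error' rows, oldest first."""
    # Newest first, so the scan starts at the latest message.
    newest_first = sorted(rows, key=lambda r: (r[3], r[0]), reverse=True)
    # Stop as soon as a row with any other state is reached.
    tail = takewhile(lambda r: r[2] == "error", newest_first)
    return sorted(tail, key=lambda r: (r[3], r[0]))


print([row[0] for row in trailing_errors(messages)])  # [4, 5]
```

With several subscriptions in the table, the rows would need to be filtered or grouped by subscription_id first, which corresponds to the PARTITION BY subscription_id mentioned above.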

Merge two columns based on a table value

I'm trying to merge French strings (language ID 1) into one column. So far, I'm able to get the French strings in table1.title and table2.translated_topic, but I'm not sure how to concatenate them.
Ver: Postgres 9.6.0
Source table schemas:
Table 1: knowledgebase_topics
id | title | language_id |
------------------------------------
64 | The Topic | 91 |
65 | The Topic 2 | 91 |
62 | Le fav sujet | 1 |
63 | Le fav sujet 2 | 1 |
61 | le bonjour | 1 |
Table 2: knowledgebase_topics_translations
id | translated_topic| knowledgebase_topic_id | language_id |
-------------------------------------------------------------
| Le sujet | 64 | 1 |
| Le sujet 2 | 65 | 1 |
| Fav The Topic | 62 | 91 |
| Fav The Topic 2 | 63 | 91 |
Given the following Query:
SELECT title, translated_topic, "kbt".language_id, "kbtt".language_id
FROM knowledgebase_topics as "kbt"
LEFT JOIN knowledgebase_topics_translations as "kbtt" on ("kbtt".knowledgebase_topic_id = "kbt".id)
INNER JOIN knowledgebase_topics_organizations as "kbto" on ("kbto".knowledgebase_topic_id = "kbt".id)
WHERE "kbto"."organization_id" = 1
AND to_tsvector("kbt".title) @@ to_tsquery('le')
OR to_tsvector("kbtt".translated_topic) @@ to_tsquery('le')
AND "kbt".language_id = 1
OR "kbtt".language_id = 1;
I get the following results:
title | translated_topic | language_id | language_id
----------------+------------------+-------------+-------------
The Topic | Le sujet | 91 | 1
The Topic 2 | Le sujet 2 | 91 | 1
Le fav sujet | Fav The Topic | 1 | 91
Le fav sujet 2 | Fav The Topic 2 | 1 | 91
le bonjour | | 1 |
Desired results: table1.title and table2.translated_topics have been merged based on language_id == 1. Both tables have a language ID column.
title | language_id
----------------+--------------
Le sujet | 1
Le sujet 2 | 1
Le fav sujet | 1
Le fav sujet 2 | 1
le bonjour | 1
How can I do this?
Note: I do not simply want to check lang IDs = 1, such as
and "kbt".language_id = 1 AND (instead of OR) "kbtt".language_id = 1;
Because this results in 2 missing records from table 2 of language ID 1:
title | translated_topic | language_id | language_id
----------------+------------------+-------------+-------------
Le fav sujet | Fav The Topic | 1 | 91
Le fav sujet 2 | Fav The Topic 2 | 1 | 91
le bonjour | | 1 |
So, I've got it working... but is this performant?
SELECT title, "kbt".language_id
FROM knowledgebase_topics as "kbt"
INNER JOIN knowledgebase_topics_organizations as "kbto" on ("kbto".knowledgebase_topic_id = "kbt".id)
WHERE "kbto"."organization_id" = 1
AND to_tsvector("kbt".title) @@ to_tsquery('le')
AND "kbt".language_id = 1
UNION ALL
SELECT translated_topic, "kbtt".language_id
FROM knowledgebase_topics_translations as "kbtt"
INNER JOIN knowledgebase_topics_organizations as "kbto" on ("kbto".knowledgebase_topic_id = "kbtt".id)
WHERE "kbto"."organization_id" = 1
AND to_tsvector("kbtt".translated_topic) @@ to_tsquery('le')
AND "kbtt".language_id = 1;
Gives output:
title | language_id
----------------+-------------
le bonjour | 1
Le fav sujet | 1
Le fav sujet 2 | 1
Le sujet | 1
Le sujet 2 | 1
(5 rows)
Setting up an environment to answer the question
First, note how the problem is best described with concise DDL. In future questions, please try to provide something like this.
CREATE TEMPORARY TABLE knowledgebase_topics AS
SELECT * FROM ( VALUES
(64,'The Topic',91),
(65,'The Topic 2',91),
(62,'Le fav sujet',1),
(63,'Le fav sujet 2',1),
(61,'le bonjour',1)
) AS t(knowledgebase_topic_id, title, language_id);
CREATE TEMPORARY TABLE knowledgebase_topics_translations AS
SELECT * FROM ( VALUES
('Le sujet' ,64,1 ),
('Le sujet 2' ,65,1 ),
('Fav The Topic' ,62,91 ),
('Fav The Topic 2',63,91 )
) AS t(translated_topic, knowledgebase_topic_id, language_id);
Then you need only tell us what you want and we can get a working environment up easily and answer your question. No English required! Easier on both of us.
The solution
Here we use a UNION ALL and wrap it in a SELECT, so that we can sort by id and change the language you're looking for in a single place.
SELECT title, language_id
FROM (
SELECT knowledgebase_topic_id, title, language_id
FROM knowledgebase_topics
UNION ALL
SELECT knowledgebase_topic_id, translated_topic, language_id
FROM knowledgebase_topics_translations
) AS t(id, title, language_id)
WHERE language_id = 1
ORDER BY id;
Output
title │ language_id
────────────────┼─────────────
le bonjour │ 1
Le fav sujet │ 1
Le fav sujet 2 │ 1
Le sujet │ 1
Le sujet 2 │ 1
(5 rows)
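The wrapped UNION ALL can be tried without a Postgres server by using Python's built-in sqlite3 module. This is a sketch under assumptions: SQLite stands in for Postgres, the column alias list t(id, title, language_id) is replaced by an AS id in the first SELECT because SQLite does not accept alias lists on subqueries, and the language is passed as a bound parameter.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE knowledgebase_topics (
    knowledgebase_topic_id INTEGER, title TEXT, language_id INTEGER);
CREATE TABLE knowledgebase_topics_translations (
    translated_topic TEXT, knowledgebase_topic_id INTEGER, language_id INTEGER);
INSERT INTO knowledgebase_topics VALUES
    (64, 'The Topic', 91), (65, 'The Topic 2', 91),
    (62, 'Le fav sujet', 1), (63, 'Le fav sujet 2', 1), (61, 'le bonjour', 1);
INSERT INTO knowledgebase_topics_translations VALUES
    ('Le sujet', 64, 1), ('Le sujet 2', 65, 1),
    ('Fav The Topic', 62, 91), ('Fav The Topic 2', 63, 91);
""")

# Stack titles and translations, then filter on the language once.
query = """
SELECT title, language_id
FROM (
    SELECT knowledgebase_topic_id AS id, title, language_id
    FROM knowledgebase_topics
    UNION ALL
    SELECT knowledgebase_topic_id, translated_topic, language_id
    FROM knowledgebase_topics_translations
) AS t
WHERE language_id = ?
ORDER BY id
"""
rows = conn.execute(query, (1,)).fetchall()
for title, language_id in rows:
    print(title, language_id)
```

Changing the bound value from 1 to 91 returns the English strings instead, which is the "change in one place" benefit of filtering outside the UNION ALL.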