Extract words from filename delineated by underscores and spaces in PowerShell

I am trying to extract two words from filenames. The names have the format:
__XXXXXXXX_XXX_XXXXXXX_XXXX_XXXXX_XXXX XXX_Aircraft 017_XXXXXXXX-XXXXXXX_XXXXXXX-XXXXXXX-XXXXXX-01Apr2021-XXXXX
With the X's being replaced with different words. I need to extract the aircraft number and the date so that I can rename the files with just that information. Using help from this site I have tried the following to isolate the aircraft number:
$names = gci -Path "H:\Path\to\Logs" *.log -Recurse | select @{n="Name"; e={if ($_.Name -match "Aircraft (\w+)") {
$matches[1] }}}
However, it doesn't seem to give me the match I need. I am very inexperienced in programming and may be going down the wrong path. My hope is that the same logic used to isolate the aircraft number also applies to the date.

# Create a sample file.
$file = New-Item '__XXXXXXXX_XXX_XXXXXXX_XXXX_XXXXX_XXXX XXX_Aircraft 017_XXXXXXXX-XXXXXXX_XXXXXXX-XXXXXXX-XXXXXX-01Apr2021-XXXXX'
# Substitute your `Get-ChildItem` command for $file
$file |
Rename-Item -WhatIf -NewName {
if ($_.Name -match '_(Aircraft \w+?)_.+(\d{2}[a-z]{3}\d{4})-') {
# Synthesize the new file name from the extracted substrings.
'{0} - {1}' -f $Matches[1], $Matches[2]
} else {
# Input file name didn't match, (effectively) do nothing.
$_.Name
}
}
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
For an explanation of the regex used with the -match operator above, see this regex101.com page.[1]
The above uses two capture groups ((...)) to capture the substrings of interest, which can be accessed via indices 1 and 2 of the automatic $Matches variable.
-f, the format operator is then used to build the output file name from the captured substrings. Tweak the LHS format string as needed.
Thanks to -WhatIf, you'll see output such as the following, which is the preview of what would happen when you remove -WhatIf - note the new file name in the Destination: path:
What if: Performing the operation "Rename File" on target
"Item: /tmp/__XXXXXXXX_XXX_XXXXXXX_XXXX_XXXXX_XXXX XXX_Aircraft 017_XXXXXXXX-XXXXXXX_XXXXXXX-XXXXXXX-XXXXXX-01Apr2021-XXXXX
Destination: /tmp/Aircraft 017 - 01Apr2021".
Note how a script block ({ ... }) is passed as an argument to Rename-Item's -NewName parameter. The script block acts on each input file via the automatic $_ variable and outputs the argument value to use for the input object at hand. Such script blocks are called delay-bind script blocks.
[1] Note that even though regex101.com, a site for visualizing, explaining and experimenting with regexes, doesn't support the .NET regex engine used by PowerShell, choosing a similar engine, such as Java's, usually exhibits the same behavior, at least fundamentally.
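To try the regex without touching any files, the match can be run against a plain string. This is only a sketch: the sample name is the one from the question, and the variable names ($name, $pattern, $newName) are made up for illustration.

```powershell
# Sample name copied from the question; no file is created or renamed here.
$name = '__XXXXXXXX_XXX_XXXXXXX_XXXX_XXXXX_XXXX XXX_Aircraft 017_XXXXXXXX-XXXXXXX_XXXXXXX-XXXXXXX-XXXXXX-01Apr2021-XXXXX'
$pattern = '_(Aircraft \w+?)_.+(\d{2}[a-z]{3}\d{4})-'

if ($name -match $pattern) {
    # $Matches[1] holds the first capture group, $Matches[2] the second.
    $newName = '{0} - {1}' -f $Matches[1], $Matches[2]
} else {
    # No match: keep the name unchanged.
    $newName = $name
}

$newName   # Aircraft 017 - 01Apr2021
```

Note that -match is case-insensitive by default, which is why [a-z]{3} also matches the capitalized month abbreviation Apr.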

Related

Script returning error: "Get-Content : An object at the specified path ... does not exist, or has been filtered by the -Include or -Exclude parameter

EDIT
I think I now know what the issue is - The copy numbers are not REALLY part of the filename. Therefore, when the array pulls it and then is used to get the match info, the file as it is in the array does not exist, only the file name with no copy number.
I tried writing a rename script but the same issue exists... only the few files I manually renamed (so they don't contain copy numbers) were renamed (successfully) by the script. All others are shown not to exist.
How can I get around this? I really do not want to manually work with 23000+ files. I am drawing a blank..
HELP PLEASE
I am trying to narrow down a folder full of emails (copies) with the same name "SCADA Alert.eml", "SCADA Alert[1].eml"...[23110], based on contents. And delete the emails from the folder that meet specific content criteria.
When I run it I keep getting the error in the subject line above. It only sees the first file and the rest it says do not exist...
The script reads through the folder, creates an array of names (does this correctly).
Then it creates a variable, $email, and assigns it the content of the file, for each $FileName in the array.
(this is where it breaks)
Then it should match the specific string I am looking for against the content of the $email var and return true or false. If true, I want it to remove the email, $FileName, from the folder.
Thus narrowing down the emails I have to review.
Any help here would be greatly appreciated.
This is what I have so far... (Folder is in the root of C:)
$array = Get-ChildItem -name -Path $FolderToRead #| Get-Content | Tee C:\Users\baudet\desktop\TargetFile.txt
Foreach ($FileName in $array){
$FileName # Check File
$email = Get-Content $FolderToRead\$FileName
$email # Check Content
$ContainsString = "False" # Set Var
$ContainsString # Verify Var
$ContainsString = %{$email -match "SYS$,ROC"} # Look for String
$ContainsString # Verify result of match
#if ($ContainsString -eq "True") {
#Remove-Item $FolderToRead\$element
#}
}
Here's a PowerShell-idiomatic solution that also resolves your original problems:
Get-ChildItem -File -LiteralPath $FolderToRead | Where-Object {
(Get-Content -Raw -LiteralPath $_.FullName) -match 'SYS\$,ROC'
} | Remove-Item -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
Note how the $ character in the RHS regex of the -match operator is \-escaped in order to use it verbatim (rather than as metacharacter $, the end-of-input anchor).
Also, given that $ is also used in PowerShell's string interpolation, it's better to use '...' strings (single-quoted, verbatim strings) to represent regexes, assuming no actual up-front string expansion is needed before the regex engine sees the resulting string - see this answer for more information.
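The effect of escaping can be checked on a plain string. A minimal sketch, assuming a made-up sample text in $text; [regex]::Escape() is shown as an alternative to escaping by hand:

```powershell
# Hypothetical sample content that contains the search string verbatim.
$text = 'Alarm SYS$,ROC raised'

# Unescaped: $ is the end-of-input anchor, so nothing can follow it and the match fails.
$unescaped = $text -match 'SYS$,ROC'

# Escaped: \$ matches a literal dollar sign.
$escaped = $text -match 'SYS\$,ROC'

# [regex]::Escape() produces the escaped form from the literal text.
$pattern = [regex]::Escape('SYS$,ROC')
$viaEscape = $text -match $pattern

"$unescaped / $escaped / $viaEscape / $pattern"
```

[regex]::Escape() is handy when the search text comes from a variable or a file and you don't want to think about which characters are metacharacters.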
As for what you tried:
The error message stemmed from the fact that Get-Content $FolderToRead\$FileName binds the file-name argument, $FolderToRead\$FileName, implicitly (positionally) to Get-Content's -Path parameter, which expects PowerShell wildcard patterns.
Since your file names literally contain [ and ] characters, they are misinterpreted by the (implied) -Path parameter, which can be avoided by using the -LiteralPath parameter instead (which must be specified explicitly, as a named argument).
%{$email -match "SYS$,ROC"} is unnecessarily wrapped in a ForEach-Object call (% is a built-in alias); while that doesn't do any harm in this case, it adds unnecessary overhead;
$email -match "SYS$,ROC" is enough, though it needs to be corrected to
$email -match 'SYS\$,ROC', as explained above.
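The -Path versus -LiteralPath difference can be reproduced with a throwaway file. This is a sketch under assumptions: the folder name is made up, and the file is created through .NET so that the brackets are certain to be taken literally.

```powershell
# Hypothetical throwaway folder under the system temp directory.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ('literalpath-demo-' + [guid]::NewGuid())
[void][System.IO.Directory]::CreateDirectory($dir)

# Create a file whose name literally contains [ and ].
$file = Join-Path $dir 'SCADA Alert[1].eml'
[System.IO.File]::WriteAllText($file, 'SYS$,ROC')

# -Path treats [1] as a wildcard set, i.e. it looks for "SCADA Alert1.eml".
$asWildcard = Test-Path -Path $file

# -LiteralPath uses the name exactly as given.
$asLiteral = Test-Path -LiteralPath $file
$content = Get-Content -Raw -LiteralPath $file

# Clean up the demo folder.
Remove-Item -LiteralPath $dir -Recurse -Force

"$asWildcard / $asLiteral / $content"
```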
[System.IO.Directory]::EnumerateFiles($Folder) |
Where-Object {$true -eq [System.IO.File]::ReadAllText($_, [System.Text.Encoding]::UTF8).Contains('SYS$,ROC') } |
ForEach-Object {
Write-Host "Removing $($_)"
#[System.IO.File]::Delete($_)
}
Your mistakes:
%{$email -match "SYS$,ROC"} - What is % intended to do here? It is an alias for ForEach-Object, which is not needed in this case.
%{$email -match "SYS$,ROC"} - Why use -match? For a plain substring test, -like or String.Contains() avoids the regex engine and is usually faster.
%{$email -match "SYS$,ROC"} - When using $ inside double quotes, you should escape it with a single backtick ("I have `$100"). Otherwise, PowerShell may treat what follows $ as a variable name or subexpression: "Hello, $username; It's $($weather.ToString()) today!"
Write debug output the right way: use Write-Debug, Write-Verbose, Write-Host, Write-Warning, Write-Error, Write-Information.
Can be better:
Avoid Get-ChildItem when you only need a list of files: it returns objects that carry attributes (mtime, atime, ctime, etc.), and gathering that extra information costs additional work per file. Use the native .NET EnumerateFiles method from System.IO.Directory instead. This can be a significant performance boost with huge numbers of files.
Use ReadAllText, ReadAllLines, or ReadAllBytes from the System.IO.File static class to be more specific than the general-purpose Get-Content.
Use pipelines ;-)
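One caveat when swapping -match for String.Contains(): the two differ in case sensitivity. A small sketch, assuming a made-up, deliberately lower-case sample in $email:

```powershell
# Hypothetical email content, lower-case on purpose.
$email = 'alert from sys$,roc station'

# -match and -like are case-insensitive by default.
$byMatch = $email -match 'SYS\$,ROC'
$byLike = $email -like '*SYS$,ROC*'

# String.Contains() does an ordinal, case-sensitive comparison.
$byContains = $email.Contains('SYS$,ROC')

# IndexOf() with OrdinalIgnoreCase is a case-insensitive literal search.
$byIndexOf = $email.IndexOf('SYS$,ROC', [System.StringComparison]::OrdinalIgnoreCase) -ge 0

"$byMatch / $byLike / $byContains / $byIndexOf"
```

So if the emails may spell the string in different cases, Contains() alone would silently skip some of them.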

Read a file and then based on file content names, move the file from one folder to another using Powershell script

I need to read a file (e.g. file.txt) which has file names as its content. File names are separated by unique character (e.g. '#'). So my file.txt looks something like:
ABC.txt#
CDE.csv#
XYZ.txt#
I need to read its content line by line based on its extension. I have 1 source folder and 1 destination folder. Below is my scenario that I need to achieve:
If extension = txt then
check if that file name exists in destination_folder1 or destination_folder2
if that file exists then
copy that file from source_folder1 to destination_folder1
else delete that file from destination_folder1
Else display msg as "Invalid file"
I am new to powershell scripting. can someone pls help? Thanks in advance.
It will make my job easier if we assume the following pseudocode. Then you can take the elements I demonstrate and change them to fit your needs.
If the string from "file.txt" contains the file extension "txt" then continue.
If the file does not exist in the destination folder then copy the file from the source folder to the destination folder.
Use Get-Content to read a text file.
Get-Content .\file.txt
Get-Content processes files line by line. This has a few consequences:
Each line in our input text file will trigger our code.
Each time our code triggers, it will have input that looks like this: ABC.txt#
We can focus on solving the problem for one line.
If we need to evaluate strings, I suggest using regular expressions.
Remember, we are operating on a single line from the text file:
ABC.txt#
We need to detect the file extension.
A good place to start would be the end of the string.
In regular expressions, the end of a string is represented by $
So let's start there.
Here is our regular expression so far:
$
The next thing that would be useful is if we accounted for that # symbol. We can do that by adding it before $
#$
If there was a different character, we would add that instead: ;$ Keep in mind that there are reserved characters in regular expressions. So we might need to escape certain characters with a backslash: \$$
Now we have to account for the file extension.
We have three letters, we don't know what they are.
Regular expressions have a special escape sequence (called a character class) that can match any word character, that is, a letter, a digit, or an underscore: \w
Let's add three of those.
\w\w\w#$
Now, while crafting regular expressions, it is a good idea to limit the text we're looking for.
As humans, we know we're looking for .txt# But, so far, the computer only knows about three word characters followed by #, with no dot. So it would accept .txt#, .xlsx#, and anythingGoes# as matches. We limited the right side of our string. Now let's limit the left side.
We're only interested in three characters. And the left side is bounded by a . So let's add that to our regular expression. I'll also mention that a period is a reserved character in regular expressions. So, we will have to escape it.
\.\w\w\w#$
So if we're looking at text like this
ABC.txt#
then our regular expression will output text like this
.txt#
Now, .txt# is a pretty good result. But we can make our job a little easier by limiting the result to just the file extension.
There are several ways of doing this. But I suggest using regular expression groups.
We create a group by surrounding our target with parentheses.
\.(\w\w\w)#$
This now produces output like:
txt
From here, we can just make intuitive comparisons like if txt = txt.
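The finished expression can be tried directly with the -match operator. A minimal sketch; the variable names are made up for illustration, and the captured values are copied out of $Matches right away because a later failed match leaves the old $Matches in place.

```powershell
$pattern = '\.(\w\w\w)#$'
$line = 'ABC.txt#'

# -match returns $true or $false and fills the automatic $Matches variable on success.
$isMatch = $line -match $pattern
$wholeMatch = $Matches[0]   # the full matched text: .txt#
$extension = $Matches[1]    # the first group: txt

# A four-letter extension does not fit "dot, exactly three word characters, #".
$longerExtension = 'XYZ.xlsx#' -match $pattern

"$isMatch / $wholeMatch / $extension / $longerExtension"
```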
Another piece of the puzzle is testing whether a file already exists. For this we can use the Test-Path and Join-Path cmdlets.
$destination = ".\destination 01"
$exampleFile = "ABC.txt"
$destinationFilePath = Join-Path -Path $destination -ChildPath $exampleFile
Test-Path -Path $destinationFilePath
With these concepts, it is possible to write a working example.
# Folder locations.
$source = ".\source"
$destination = ".\destination 01"
# Load input file.
Get-Content .\file.txt |
Where-Object {
# Enter our regular expression.
# I've added an extra group to capture the file name.
# The $matches automatic variable is created when the -match comparison operator is used.
if ($_ -match '([\w ]+\.(\w\w\w))#$')
{
# Which file extensions are we interested in processing?
# Here $matches[2] represents the file extension: ex "txt".
# We use a switch statement to handle each type of file extension.
# Accept new file types by creating new switch cases.
switch ($matches[2])
{
"txt" {$true; Break}
#"csv" {$true; Break}
#"pdf" {$true; Break}
default {$false}
}
}
else { $false }
} |
ForEach-Object {
# Here $matches[1] is the file name captured from the input file.
$sourceFilePath = Join-Path -Path $source -ChildPath $matches[1]
$destinationFilePath = Join-Path -Path $destination -ChildPath $matches[1]
$fileExists = Test-Path -Path $destinationFilePath
# Copy the source file to the destination if the destination doesn't exist.
if (!$fileExists)
{ Copy-Item -Path $sourceFilePath -Destination $destinationFilePath }
}
Note on Copy-Item
Copy-Item has known issues.
Issue #10458 | PowerShell | GitHub
Issue #2581 | PowerShell | GitHub
You can substitute robocopy which is more reliable.
Robocopy - Wikipedia
The robocopy syntax is:
robocopy <source> <destination> [<file>[ ...]] [<options>]
where <source> and <destination> can be folders only.
So, if you want to copy a file, you have to write it like this:
robocopy .\source ".\destination 01" ABC.txt
We can invoke robocopy using Start-Process and the variables we already have.
# Copy the source file to the destination if the destination doesn't exist.
if (!$fileExists)
{
Start-Process -FilePath "robocopy.exe" -ArgumentList "`"$source`" `"$destination`" `"$($matches[1])`" /unilog+:.\robolog.txt" -WorkingDirectory (Get-Location) -NoNewWindow
}
Using Get-ChildItem
You use file.txt as input. If you want to gather a list of files on disk instead, you can use Get-ChildItem.
Multiple Conditions
You wrote "destination_folder1 or destination_folder2". If you need multiple conditions, you can construct them with three things:
the if statement, the logical -or operator inside its test condition, and parentheses to group conditions together so they are easier to read.
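As a rough sketch of such a combined test, assuming two made-up destination folders created under the temp directory so the example is self-contained (none of these names come from the question):

```powershell
# Hypothetical throwaway folders under the system temp directory.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ('or-demo-' + [guid]::NewGuid())
$destination1 = Join-Path $root 'destination 01'
$destination2 = Join-Path $root 'destination 02'
[void][System.IO.Directory]::CreateDirectory($destination1)
[void][System.IO.Directory]::CreateDirectory($destination2)

# Put the sample file into the second folder only.
$fileName = 'ABC.txt'
[System.IO.File]::WriteAllText((Join-Path $destination2 $fileName), 'sample')

# One test per folder; -or is true when at least one side is true.
$inFirst = Test-Path -LiteralPath (Join-Path $destination1 $fileName)
$inSecond = Test-Path -LiteralPath (Join-Path $destination2 $fileName)

if (($inFirst) -or ($inSecond)) {
    $status = 'exists in at least one destination'
} else {
    $status = 'missing from both destinations'
}
Write-Output $status

# Clean up the demo folders.
Remove-Item -LiteralPath $root -Recurse -Force
```

Use -and instead of -or if the file has to exist in both folders.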
Functions
If you need to move a piece of code around, you can use a function. Just remember to create parameters for the inputs to the function. Then call a PowerShell function without parentheses or commas:
# Calling a PowerShell function.
myFunction parameterOne parameterTwo parameterThree
Writing Output
You can use Write-Output to emit text; when the output isn't captured or piped elsewhere, it is displayed in the console.
Write-Output "Invalid File"
Further Reading
Here are some references which you might find useful.
about_Comparison_Operators - PowerShell | Microsoft Docs
about_Pipelines - PowerShell | Microsoft Docs
about_Switch - PowerShell | Microsoft Docs
Regular-Expressions.info - Regex Tutorial, Examples and Reference - Regexp Patterns
Where-Object (Microsoft.PowerShell.Core) - PowerShell | Microsoft Docs

Replace text in files within a folder PowerShell

I have a folder that contains files like 'goodthing 2007adsdfff.pdf', 'betterthing 2007adfdsw.pdf', and 'bestthing_2007fdsfad.pdf', I want to be able to rename each, eliminating all text including 2007 OR _2007 to the end of the string keeping .pdf and getting this result: 'goodthing.pdf' 'betterthing.pdf' 'bestthing.pdf' I've tried this with the "_2007", but haven't figured out a conditional to also handle the "2007". Any advice on how to accomplish this is greatly appreciated.
Get-ChildItem 'C:Temp\' -Name -Filter *.pdf | foreach { $_.Split("_2017")[0].substring(0)}
Try the following:
Get-ChildItem 'C:\Temp' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace '[_ ][^.]+' } -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
The above uses Rename-Item with a delay-bind script block and the -replace operator as follows:
Regex [_ ][^.]+ matches everything from the first space or _ char. (character set [_ ]) up to, but not including, the following literal . char. ([^.]+ matches one or more chars. other than (^) .) - that is, everything from the first space or _ through to the filename extension (excluding the .).
Note: To guard against file names such as _2017.pdf matching (which would result in just .pdf as the new name), use the following regex instead: '(?<=.)[_ ][^.]+'
By not providing a replacement operand to -replace, what is matched is replaced with the empty string and therefore effectively removed.
The net effect is that input files named
'goodthing 2007adsdfff.pdf', 'betterthing 2007adfdsw.pdf', 'bestthing_2007fdsfad.pdf'
are renamed to
'goodthing.pdf', 'betterthing.pdf', 'bestthing.pdf'
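The regex can be checked on plain strings before any file is renamed. A sketch using the names from the question, plus the _2007.pdf edge case; the variable names are made up for illustration:

```powershell
$names = 'goodthing 2007adsdfff.pdf', 'betterthing 2007adfdsw.pdf', 'bestthing_2007fdsfad.pdf'

# With no replacement operand, -replace removes whatever the regex matches.
$renamed = $names | ForEach-Object { $_ -replace '[_ ][^.]+' }

# Edge case: a name that starts with the separator.
$edge = '_2007.pdf'
$edgePlain = $edge -replace '[_ ][^.]+'            # only the extension is left
$edgeGuarded = $edge -replace '(?<=.)[_ ][^.]+'    # lookbehind requires a preceding char.

$renamed
"$edgePlain / $edgeGuarded"
```

Names with several dots or with spaces inside the extension would need a closer look, since the pattern only stops at the first . after the separator.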
Without knowing the names of all the potential files, I can offer this solution, which works for the three sample names:
PS> $flist = ("goodthing 2007adsdfff.pdf","betterthing 2007adfdsw.pdf","bestthing_2007fdsfad.pdf")
PS> foreach ($f in $flist) {$nicename = ($f -replace "([\w\s]+)2007.*(\.\w+)", '$1$2') -replace "[\s_]\.","." ;$nicename}
goodthing.pdf
betterthing.pdf
bestthing.pdf
Two challenges:
the underscore is actually part of the \w character class. So the alternative to the above is to complicate the regex or try to assume that there will always be only one '_' before the 2007. Both seemed risky to me.
if there are spaces in filenames, there is no telling if you might encounter more than one. This solution removes only the one right before 2007.
The magic:
The -replace operator enables you to quickly capture text in () and re-use it in the replacement string via placeholders like $1 and $2. If you have more complex captures, you just have to figure out the order they are assigned: groups are numbered left to right by their opening parenthesis.
Hope this helps.

Removing Parts of a File Name based on a Delimiter

I have various file names that are categorized in two different ways. They either just have a code like: "866655" or contain a suffix and prefix "eu_866655_001". My hope is to write to a text file the names of files in the same format. I cannot figure out a successful method for removing the suffix and prefix.
Currently this what I have in my loop in Powershell:
$docs = Get-ChildItem -Path $source | Where-Object {$_.Name -match '.doc*'}
if ($docs.basename -contians 'eu_*')
{
Write-Output ([io.fileinfo]"$doc").basename.split("_")
}
I'm hoping to turn "eu_866655_001" into "866655" by using "_" as the delimiter.
I'm aware that the answer is staring me down but I still can't seem to figure it out.
You could do something like the following. Feel free to tweak the -Filter on the Get-ChildItem command.
$source = 'c:\path\*'
$docs = Get-ChildItem -Path $source -File -Filter "*_*_*" -Include '*.doc','*.docx'
$docs | Rename-Item -NewName { "{0}{1}" -f $_.Basename.Split('_')[1],$_.Extension }
The important thing to remember is that in order to use the -Include parameter (without -Recurse), you need an * at the end of the -Path value.
Explanation:
-Filter allows us to filter on names that contain two underscores separating three substrings.
-Include allows us to only list files ending in extensions .docx and .doc.
Rename-Item -NewName supports delay-bind script blocks. This allows us to use a script block to perform any necessary operations for each piped object (each file).
Since the target files will always have two underscores, the .Split('_') method will return a three-element array of the substrings between the _ delimiters. You have specified that you always want the second delimited substring and that is represented by index 1 ([1]).
The format operator (-f) puts the substring and extension together, completing the file name.
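Here is a small sketch of the split-and-format step on plain strings, covering both name styles from the question. The element-count check for names without a prefix and suffix is my own assumption, as is the .docx extension used in the example:

```powershell
# Both base-name styles mentioned in the question.
$baseNames = 'eu_866655_001', '866655'

$codes = foreach ($b in $baseNames) {
    $parts = $b.Split('_')
    # Three parts means prefix_code_suffix; otherwise keep the name as it is.
    if ($parts.Count -eq 3) { $parts[1] } else { $b }
}

# Rebuild a file name from the extracted code and an extension.
$newName = '{0}{1}' -f ('eu_866655_001'.Split('_')[1]), '.docx'

$codes
$newName
```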

How to replace first characters in a file name with a string?

I've been working on a script to maintain the archive from my IP camera DVR. My recording software outputs filenames formatted so that the first character is the camera number, followed by a date and time stamp.
ex. 1_2017-11-03_00-45-07.avi
I want to replace the first character with a string that represents the camera.
ex. DivertCam_2017-11-03_00-45-07.avi
So far, I have:
Get-ChildItem "D:\DivertCam\1_*.avi" |
Rename-Item -NewName {$_.Name -replace '1_?','DivertCam_'}
Luckily, with -WhatIf and running a transcript, I was able to see that my results would be wrong:
What if: Performing the operation "Rename File" on target "Item: D:\DivertCam\1_2017-11-03_00-45-07.avi Destination: D:\DivertCam\DivertCam_20DivertCam_7-DivertCam_DivertCam_-03_00-45-07.avi"
I know it's just picking out every "1_". How can I make it stop after the first instance of "1_", or read the filename like a string, split it into 3 arrays separated by "_" and then change the first array?
The -replace operator performs a RegEx match and replacement, so you can use RegEx syntax to do what you want. For you the solution is to include the 'beginning of string' character ^ at the beginning of your match text. Since this is RegEx, the ? means the previous character may or may not exist, so what you are currently matching on is any character matching '1' which may or may not be followed by an underscore. A better version would simply be:
$_.name -replace '^1','DivertCam'
To put that in context with the rest of your line, it would be:
Get-ChildItem "D:\DivertCam\1_*.avi" | Rename-Item -NewName {$_.name -replace '^1','DivertCam'}
Keep in mind this only works for the -replace operator, which uses RegEx (short for Regular Expression) matching, and not for the .Replace() method that you may see used, which does literal string matching.
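The difference between the three variants can be seen on the sample file name as a plain string. A sketch only; the variable names are made up for illustration:

```powershell
$name = '1_2017-11-03_00-45-07.avi'

# Unanchored: every "1" (optionally followed by "_") is replaced.
$unanchored = $name -replace '1_?', 'DivertCam_'

# Anchored with ^: only a "1" at the very start of the string is replaced.
$anchored = $name -replace '^1', 'DivertCam'

# .Replace() looks for the literal text "^1", which does not occur in the name.
$literal = $name.Replace('^1', 'DivertCam')

$unanchored
$anchored
$literal
```

The first result reproduces the broken destination name from the -WhatIf output in the question, the second is the intended name, and the third is unchanged.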
This will replace everything before the first '_' with 'DivertCam' (note use of % (foreach) to operate on each file individually).
Get-ChildItem "D:\DivertCam\1_*.avi" | % {Rename-Item $_.FullName -NewName "DivertCam$($_.Name.Substring($_.Name.IndexOf('_')))" }