How to replace any character different from others previously specified? - powershell

The following code removes the ! from all the .txt files.
get-childitem *.txt | ForEach {Move-Item -LiteralPath $_.name $_.name.Replace("!","")}
I need to do this not only for the ! character but also for #, ,, ~, among others. My intention is to get a code that has the following rule: any character other than [a-z] and also [0-9] must be removed from the file names.

Assuming you want to rename the files in place:
Get-Childitem *.txt | Rename-Item -WhatIf -NewName {
($_.BaseName -replace '[^\d\p{L}]') + $_.Extension
}
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
For renaming files in place, it is better to use Rename-Item than Move-Item.
Note that Rename-item is directly piped to, with the new file names getting dynamically calculated via a delay-bind script block.
The negated (^) character set ([...]) [^\d\p{L}] matches every character that is neither a digit (\d) nor a letter (\p{L}).
Note:
\p{L} also matches letters outside the ASCII range of Unicode characters, such as é; if you really want to limit matching to ASCII-range letters only, use a-z.
Similarly, \d matches not just 0 through 9, but also other Unicode characters that are considered digits, e.g., ৮ (BENGALI DIGIT EIGHT, U+09EE); to limit matching to 0 through 9, use 0-9.
PowerShell is case-insensitive by default, so if you want to match lowercase letters only, you have to use -creplace and \p{Ll}.
By not specifying a replacement string, all matching characters are effectively removed.
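As a quick check of the notes above, the following sketch applies the three pattern variants to a made-up base name (the sample string is an assumption, not taken from the question) and shows which characters survive. The é is built from its code point so that the snippet does not depend on the script file's encoding.

```powershell
# Hypothetical sample base name; [char]0xE9 is é.
$baseName = 'Ab!c#1,~' + [char]0xE9

# Unicode-aware: keeps every letter and digit, including é.
$unicodeClean = $baseName -replace '[^\d\p{L}]'

# ASCII-only, case-insensitive (the -replace default): é is removed, A is kept.
$asciiClean = $baseName -replace '[^a-z0-9]'

# ASCII-only and lowercase-only: -creplace matches case-sensitively, so A is removed too.
$lowerClean = $baseName -creplace '[^a-z0-9]'

$unicodeClean   # Abc1 followed by é
$asciiClean     # Abc1
$lowerClean     # bc1
```

The same patterns can be dropped into the Rename-Item script block above in place of '[^\d\p{L}]'.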

Related

How to replace every character whose code is greater than 128 with "_" in all file names in a folder, using PowerShell

The example file name is
PO 2171 Cresco REVISED.pdf
.....
Many of these file names are not standard, and the position of the space is not fixed.
The "space" in the middle is actually a character with a code greater than 128, and I want to replace every character with a code greater than 128 with "_" in one go.
I haven't learned Powershell yet.
Thank you very much.
For this you are going to need regex.
Below I'm using the ASCII range 32 - 129 (in hex \x20-\x81) to also replace any control characters:
(Get-ChildItem -Path 'X:\TheFolderWhereTheFilesAre' -File) |
Where-Object { $_.Name -match '[^\x20-\x81]' } |
Rename-Item -NewName { $_.Name -replace '[^\x20-\x81]+', '_' }
Regex Details:
[^\x20-\x81] Match a single character NOT in the range between ASCII character 0x20 (32 decimal) and ASCII character 0x81 (129 decimal)
+ Between one and unlimited times, as many times as possible, giving back as needed (greedy)
Theo's answer is effective, but there's a simpler, more direct solution, using the .NET regex Unicode code block \p{IsBasicLatin}, which directly matches any ASCII-range Unicode character (all .NET strings are Unicode strings, internally composed of UTF-16 code units).
Its negation, \P{IsBasicLatin} (note the uppercase P), matches any character outside the ASCII range, so that you can use the following to replace all non-ASCII-range characters with _, with the help of the regex-based -replace operator:
(Get-ChildItem -File) | # Get all files in the current dir.
Rename-Item -NewName { $_.Name -replace '\P{IsBasicLatin}', '_' } -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
Note:
Enclosing the Get-ChildItem call in (...) ensures that all matching files are collected first, before renaming is performed. This prevents problems that could arise from already-renamed files re-entering the enumeration of files.
Since only files (-File) are to be renamed, you needn't worry about file names that do not contain non-ASCII-range characters: Rename-Item quietly ignores attempts to rename files to the name they already have.
Unfortunately, this is not true for directories, where such an attempt causes an error; this unfortunate discrepancy, present as of PowerShell 7.2.4, is the subject of GitHub issue #14903.
Strictly speaking, .NET characters ([char] (System.Char) instances) are 16-bit Unicode code units (UTF-16), which can individually only represent a complete Unicode character in the so-called BMP (Basic Multilingual Plane), i.e. in the code-point range 0x0-0xFFFF. Unicode characters beyond that range, notably emoji such as 👍, require representation by two .NET [char] instances, so-called surrogate pairs. Therefore, the above solution replaces such characters with two _ characters, as the following example demonstrates:
PS> 'A 👍!' -replace '\P{IsBasicLatin}', '_'
A __! # !! *two* '_' chars.
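If one _ per visible character is preferred, one possible workaround (a sketch, not part of the answer above) is to match a complete surrogate pair first and fall back to \P{IsBasicLatin} for everything else. The emoji is built with [char]::ConvertFromUtf32 so that the snippet does not depend on the script file's encoding.

```powershell
# U+1F44D (thumbs up) is stored as two UTF-16 code units in a .NET string.
$thumbsUp = [char]::ConvertFromUtf32(0x1F44D)
$text = 'A ' + $thumbsUp + '!'

# Plain negated block: each of the two code units is replaced separately.
$perCodeUnit = $text -replace '\P{IsBasicLatin}', '_'

# Try a high+low surrogate pair first, then any other non-ASCII code unit.
$perCharacter = $text -replace '[\uD800-\uDBFF][\uDC00-\uDFFF]|\P{IsBasicLatin}', '_'

$perCodeUnit    # A __!
$perCharacter   # A _!
```

Appending + to the pattern, as in the first answer, would instead collapse a whole run of adjacent non-ASCII characters into a single _.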

Replace text in files within a folder PowerShell

I have a folder that contains files like 'goodthing 2007adsdfff.pdf', 'betterthing 2007adfdsw.pdf', and 'bestthing_2007fdsfad.pdf', I want to be able to rename each, eliminating all text including 2007 OR _2007 to the end of the string keeping .pdf and getting this result: 'goodthing.pdf' 'betterthing.pdf' 'bestthing.pdf' I've tried this with the "_2007", but haven't figured out a conditional to also handle the "2007". Any advice on how to accomplish this is greatly appreciated.
Get-ChildItem 'C:Temp\' -Name -Filter *.pdf | foreach { $_.Split("_2017")[0].substring(0)}
Try the following:
Get-ChildItem 'C:\Temp' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace '[_ ][^.]+' } -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
The above uses Rename-Item with a delay-bind script block and the -replace operator as follows:
The regex [_ ][^.]+ matches everything from the first space or _ char. (character set [_ ]) up to, but not including, the following literal . char. ([^.]+ matches one or more chars. other than (^) a .) - that is, everything from the first space or _ through to the filename extension (excluding the .).
Note: To guard against file names such as _2017.pdf matching (which would result in just .pdf as the new name), use the following regex instead: '(?<=.)[_ ][^.]+'
By not providing a replacement operand to -replace, what is matched is replaced with the empty string and therefore effectively removed.
The net effect is that input files named
'goodthing 2007adsdfff.pdf', 'betterthing 2007adfdsw.pdf', 'bestthing_2007fdsfad.pdf'
are renamed to
'goodthing.pdf', 'betterthing.pdf', 'bestthing.pdf'
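The two regexes can be compared on plain strings before touching any files. This is only an illustrative sketch: the first two names come from the question, and _2017.pdf is the edge case mentioned in the note above.

```powershell
$names = 'goodthing 2007adsdfff.pdf', 'bestthing_2007fdsfad.pdf', '_2017.pdf'

# Basic regex: also strips a name that *starts* with _ down to its extension.
$plain = $names -replace '[_ ][^.]+'

# Guarded regex: the look-behind (?<=.) requires at least one char. before the space or _.
$guarded = $names -replace '(?<=.)[_ ][^.]+'

$plain     # goodthing.pdf, bestthing.pdf, .pdf
$guarded   # goodthing.pdf, bestthing.pdf, _2017.pdf
```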
Without knowing the names of all the potential files, I can offer this solution, which works for all of the sample names:
PS> $flist = ("goodthing 2007adsdfff.pdf","betterthing 2007adfdsw.pdf","bestthing_2007fdsfad.pdf")
PS> foreach ($f in $flist) { $nicename = ($f -replace "([\w\s]+)2007.*(\.\w+)", '$1$2') -replace "[\s_]\.", "."; $nicename }
goodthing.pdf
betterthing.pdf
bestthing.pdf
Two challenges:
the underscore is actually part of the \w character class. So the alternative to the above is to complicate the regex or try to assume that there will always be only one '_' before the 2007. Both seemed risky to me.
if there are spaces in filenames, there is no telling if you might encounter more than one. This solution removes only the one right before 2007.
The magic:
The -replace operator enables you to quickly capture text in () and re-use it in variables like $1$2. If you have more complex captures, you just have to figure out the order they are assigned.
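One way to "complicate the regex", as mentioned above, is to make the first capture lazy and to consume any spaces or underscores directly before 2007 outside the group. This is an alternative sketch, not the command shown above, and the third sample name is invented to show a name with an extra space and underscore. Note the single quotes around '$1$2': inside double quotes PowerShell would expand $1 and $2 as its own (empty) variables before the regex engine sees them.

```powershell
# The first two names are from the question; the third is a hypothetical extra case.
$flist = 'goodthing 2007adsdfff.pdf', 'bestthing_2007fdsfad.pdf', 'even better_thing 2007x.pdf'

# (.+?)      lazily captures the part to keep
# [\s_]*     swallows separators right before 2007
# (\.\w+)$   captures the extension at the end of the name
$nice = $flist -replace '^(.+?)[\s_]*2007.*(\.\w+)$', '$1$2'

$nice   # goodthing.pdf, bestthing.pdf, even better_thing.pdf
```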
Hope this helps.

Rename files with Powershell if file has certain structure

I am trying to rename files in multiple folder with same name structure. I got the following files:
(1).txt
(2).txt
(3).txt
I want to add the following text in front of it: "Subject is missing"
I only want to rename these files all other should remain the same
Tip of the hat to LotPings for suggesting the use of a look-ahead assertion in the regex.
Get-ChildItem -File | Rename-Item -NewName {
$_.Name -replace '^(?=\(\d+\)\.)', 'Subject is missing '
} -WhatIf
-WhatIf previews the renaming operation; remove it to perform actual renaming.
Get-ChildItem -File enumerates files only, but without a name filter - while you could try to apply a wildcard-based filter up front - e.g., -Filter '([0-9]).*' - you couldn't ensure that multi-digit names (e.g., (13).txt) are properly matched.
You can, however, pre-filter the results, with -Filter '(*).*'
The Rename-Item call uses a delay-bind script block to derive the new name.
It takes advantage of the fact that (a) -replace returns the input string unmodified if the regex doesn't match, and (b) Rename-Item does nothing if the new filename is the same as the old.
In the regex passed to -replace, the positive look-ahead assertion (?=...), which is matched at the start of the input string (^), looks for a match for subexpression \(\d+\)\. without treating what it matches as part of what should be replaced. In effect, only the start position (^) of an input string is matched and "replaced".
Subexpression \(\d+\)\. matches a literal ( (escaped as \(), followed by 1 or more (+) digits (\d), followed by a literal ) and a literal . (\.), which marks the start of the filename extension. (Replace \. with $, the end-of-input assertion, if you want to match filenames that have no extension.)
Therefore, replacement operand 'Subject is missing ' is effectively prepended to the input string so that, e.g., (1).txt returns Subject is missing (1).txt.
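The effect of the look-ahead can be checked on plain strings first. In this sketch only (1).txt comes from the question; the other names are made up to show a multi-digit case and two names that must stay untouched.

```powershell
$names = '(1).txt', '(13).txt', 'notes (2).txt', 'report.txt'

# Only names that *start* with (digits). get the prefix; all others pass through unchanged.
$renamed = $names -replace '^(?=\(\d+\)\.)', 'Subject is missing '

$renamed
# Subject is missing (1).txt
# Subject is missing (13).txt
# notes (2).txt
# report.txt
```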

How to replace first characters in a file name with a string?

I've been working on a script to maintain the archive from my IP camera DVR. My recording software outputs filenames formatted so that the first character is the camera number, followed by a date and time stamp.
ex. 1_2017-11-03_00-45-07.avi
I want to replace the first character with a string that represents the camera.
ex. DivertCam_2017-11-03_00-45-07.avi
So far, I have:
Get-ChildItem "D:\DivertCam\1_*.avi" |
Rename-Item -NewName {$_.Name -replace '1_?','DivertCam_'}
Luckily, by using -WhatIf and running a transcript, I was able to see that my results would be wrong:
What if: Performing the operation "Rename File" on target "Item: D:\DivertCam\1_2017-11-03_00-45-07.avi Destination: D:\DivertCam\DivertCam_20DivertCam_7-DivertCam_DivertCam_-03_00-45-07.avi"
I know it's just picking out every "1_". How can I make it stop after the first instance of "1_", or read the filename like a string, split it into 3 arrays separated by "_" and then change the first array?
The -replace operator performs a RegEx match and replacement, so you can use RegEx syntax to do what you want. For you the solution is to include the 'beginning of string' character ^ at the beginning of your match text. Since this is RegEx, the ? means the previous character may or may not exist, so what you are currently matching on is any character matching '1' which may or may not be followed by an underscore. A better version would simply be:
$_.name -replace '^1','DivertCam'
To put that in context with the rest of your line, it would be:
Get-ChildItem "D:\DivertCam\1_*.avi" | Rename-Item -NewName {$_.name -replace '^1','DivertCam'}
Keep in mind that this only works with the -replace operator, which uses RegEx (short for Regular Expression) matching, and not with the .Replace() method that you may see used, which performs literal substring replacement.
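The difference is easy to see on the sample file name from the question. The sketch below also shows the split-based idea raised in the question, limited to two parts so that only the first _ is used as a separator; the variable names are just for illustration.

```powershell
$name = '1_2017-11-03_00-45-07.avi'

# Regex-based operator: ^ anchors the match to the start, so only the leading 1 changes.
$anchored = $name -replace '^1', 'DivertCam'

# String method: '^1' is searched for literally, is not found, and the name stays unchanged.
$literal = $name.Replace('^1', 'DivertCam')

# Splitting on the first _ only (limit 2) and swapping the first part also works.
$parts = $name -split '_', 2
$viaSplit = 'DivertCam_' + $parts[1]

$anchored   # DivertCam_2017-11-03_00-45-07.avi
$literal    # 1_2017-11-03_00-45-07.avi
$viaSplit   # DivertCam_2017-11-03_00-45-07.avi
```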
This will replace everything before the first '_' with 'DivertCam' (note use of % (foreach) to operate on each file individually).
Get-ChildItem "D:\DivertCam\1_*.avi" | % {Rename-Item $_.FullName -NewName "DivertCam$($_.Name.Substring($_.Name.IndexOf('_')))" }

Q: Powershell - read and report special characters from file

I've got a huge directory listing of files, and I need to see what special characters exist in the file names - specifically nonstandard characters like you'd get using ALT codes.
I can export a directory listing to a file easily enough with:
get-childitem -path D:\files\ -File -Recurse >output.txt
What I need to do, however, is pull out the special characters, and only the special characters, from the text file. The only way I can think of to easily quantify everything "special" (since there are a ton of possibilities in that character set) would be to compare the text against a list of characters I'd want to keep, stored in a joined variable (a-z, 0-9, etc.).
I can't quite figure out how to pull out the "good" characters, leaving only the special ones. Any ideas on where to start?
I take "special" characters to be anything that falls outside US ASCII.
That basically means any character with a numerical value of 128 or more, easy to inspect in a Where-Object filter:
Get-ChildItem -File -Recurse |Where-Object {
$_.Name.ToCharArray() -gt 127
}
This will return all files containing "special" characters in their name.
If you want to extract the special characters themselves, per file, use ForEach-Object:
Get-ChildItem -File -Recurse |ForEach-Object {
if(($Specials = $_.Name.ToCharArray() -gt 127)){
New-Object psobject -Property @{File=$_.FullName;Specials=$(-join $Specials)}
}
}
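Both snippets rely on how -gt behaves with an array on the left-hand side: it acts as a filter and returns the matching elements, and an empty result counts as $false. A minimal sketch on plain strings, with made-up sample names and é built from its code point to avoid encoding issues:

```powershell
# Hypothetical sample names; [char]0xE9 is é.
$names = ('caf' + [char]0xE9 + '.txt'), 'plain.txt'

# Array on the left: -gt returns the elements greater than 127 rather than a Boolean.
$hits = $names[0].ToCharArray() -gt 127   # one element: é
$none = $names[1].ToCharArray() -gt 127   # empty array

# An empty array is falsy, so the same expression works as a Where-Object test.
$flagged = $names | Where-Object { $_.ToCharArray() -gt 127 }

-join $hits        # é
@($none).Count     # 0
$flagged           # the first name only
```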
Look at piping your results to Select-String. With Select-String you can specify a list of regex values to search for.