remove extraneous characters from a filename - powershell

I have been given a task that is a little over my head: taking a repository of files, removing excess garbage characters from the filenames, and saving the renamed files in a different directory.
Examples of the filenames:
100-expresstoll.pdf
1000-2012-09-29.jpg
10000-2014-01-15_14.03.22.jpg
10001-2014-01-15_19.05.24.jpg
10002-2014-01-15_21.30.23.jpg
10003-2014-01-16_07.33.54.jpg
10004-2014-01-16_13.33.21.jpg
10005-Feb 4, 2014.jpeg
10006-O'Reilly_Media,_Inc..pdf
The first group of digits at the beginning is the record ID, which must be retained along with the file's extension. Everything between the record ID and the file extension needs to be dropped.
For example, the final names for the first three files would be:
100.pdf
1000.jpg
10000.jpg
I have read Removing characters and Rearranging filenames, in addition to other postings, but the complexity of having a variable number of characters at the front, a variable number of intermediary characters to be removed, and variable file extension types has really put this beyond my limited PowerShell reach.

Another approach, without regular expressions. Both of the following examples use the risk-mitigation common parameter -WhatIf for debugging purposes.
Rename files:
Get-ChildItem -File | ForEach-Object {
    $oldFile = $_.FullName
    $newName = $_.BaseName.Split('-')[0] + $_.Extension
    if ($_.Name -ne $newName) {
        Rename-Item -Path $oldFile -NewName $newName -WhatIf
    }
}
Rename and move files:
$newDest = 'D:\test' ### change to fit your circumstances
Get-ChildItem -File | ForEach-Object {
    $oldFile = $_.FullName
    $newName = $_.BaseName.Split('-')[0] + $_.Extension
    $newFile = Join-Path -Path $newDest -ChildPath $newName
    if ( -not ( Test-Path -Path $newFile ) ) {
        Move-Item -Path $oldFile -Destination $newFile -WhatIf
    }
}
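To check what the Split('-') expression produces before touching any files, it can be run over sample names from the question as plain strings. This is only an illustrative sketch: Get-RecordIdName is a hypothetical helper, and it uses [System.IO.Path] methods on a name string in place of the FileInfo .BaseName and .Extension properties used above.

```powershell
# Hypothetical helper: keeps the text before the first '-' plus the extension.
# [System.IO.Path] stands in for the FileInfo .BaseName/.Extension properties.
function Get-RecordIdName {
    param([Parameter(Mandatory)][string] $Name)
    $ext  = [System.IO.Path]::GetExtension($Name)                 # e.g. '.pdf'
    $base = [System.IO.Path]::GetFileNameWithoutExtension($Name)  # e.g. '100-expresstoll'
    $base.Split('-')[0] + $ext
}

$samples = '100-expresstoll.pdf', '10000-2014-01-15_14.03.22.jpg',
           '10005-Feb 4, 2014.jpeg', "10006-O'Reilly_Media,_Inc..pdf"

# Print "old -> new" for each sample name.
$samples | ForEach-Object { '{0,-32} -> {1}' -f $_, (Get-RecordIdName $_) }
```

Note how the dots in 10000-2014-01-15_14.03.22.jpg and O'Reilly_Media,_Inc..pdf don't matter, because only the last dot starts the extension.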

You can use the -replace operator to do this kind of string manipulation:
Get-ChildItem | foreach {
    $old_name = $_.FullName
    $new_name = $_.Name -replace '([0-9]+).*(\.[^.]*)$', '$1$2'
    Rename-Item $old_name $new_name
}
The regular expression is the trick here:
([0-9]+) means match a series of digits (1 or more digits)
.* means match anything (zero or more of any character)
(\.[^.]*) means match a period followed by zero or more characters other than a period
$ means that the match must reach the end of the string
The first and third are capture groups (they are surrounded by parentheses), so their values can be referenced in the replacement string with dollar notation, as $1 and $2 respectively. The replacement string is single-quoted so that PowerShell itself doesn't try to expand $1 and $2 as variables.
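A quick way to try the regular expression without renaming anything is to apply -replace to a few of the sample names directly. This is a minimal sketch; the $pattern variable exists only for readability.

```powershell
# The replacement '$1$2' must be single-quoted: inside double quotes, PowerShell
# would expand $1 and $2 as (normally undefined) variables before -replace runs.
$pattern = '([0-9]+).*(\.[^.]*)$'

'100-expresstoll.pdf', '1000-2012-09-29.jpg', '10005-Feb 4, 2014.jpeg' |
    ForEach-Object { $_ -replace $pattern, '$1$2' }
# -> 100.pdf
# -> 1000.jpg
# -> 10005.jpeg
```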

Probably the most idiomatic way of solving this is as follows (this assumes that all files of interest - and no others - are in the current directory):
Get-ChildItem -File | Rename-Item -NewName { ($_.BaseName -split '-')[0] + $_.Extension }
Add common parameter -WhatIf to the Rename-Item command to preview the renaming operation.
Note that Rename-Item always renames items in their current location; to (also) move them, use Move-Item.
If a target with the same name already exists, Rename-Item reports a non-terminating error for each such case (without aborting overall processing).
Note that this could also happen if an input filename contains no -, as that would result in an attempt to rename a file to itself.
Explanation:
Get-ChildItem -File outputs [System.IO.FileInfo] objects representing the files in the current directory, which are passed through the pipeline (|) to Rename-Item.
Passing a script block ({ ... }) to Rename-Item's -NewName parameter executes the contained code for each input object, where $_ represents the input object at hand.
Note that this virtually undocumented but frequently used technique is called a script-block parameter [value], where a parameter that is designed to take pipeline input can be bound with a script block that processes the input indirectly.
($_.BaseName -split '-')[0] extracts the first token, i.e. the text before the first -, from each input file's base name (the filename without its extension).
+, because the LHS is a string, performs string concatenation.
$_.Extension extracts the filename extension from each input filename.
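Because the question asks for the renamed files to end up in a different folder, here is a hedged sketch that applies the same naming expression but copies each file instead of renaming it in place. Copy-ToRecordIdName and its -Source/-Destination parameters are hypothetical names, and skipping (with a warning) when the target already exists is one possible collision policy, not the only one.

```powershell
# Hypothetical helper: copies each file in $Source to $Destination under the
# name "<text before first '-'><extension>", leaving the originals untouched.
function Copy-ToRecordIdName {
    param(
        [Parameter(Mandatory)][string] $Source,
        [Parameter(Mandatory)][string] $Destination
    )
    # Create the target folder if needed; -Force makes this a no-op if it exists.
    $null = New-Item -Path $Destination -ItemType Directory -Force

    Get-ChildItem -LiteralPath $Source -File | ForEach-Object {
        $newName = ($_.BaseName -split '-')[0] + $_.Extension
        $target  = Join-Path -Path $Destination -ChildPath $newName
        if (Test-Path -LiteralPath $target) {
            # Don't overwrite: two source files mapped to the same record ID.
            Write-Warning "Skipping '$($_.Name)': '$newName' already exists."
        }
        else {
            Copy-Item -LiteralPath $_.FullName -Destination $target
        }
    }
}
```

Usage would be something like Copy-ToRecordIdName -Source 'C:\repo' -Destination 'D:\renamed' (placeholder paths). Swapping Copy-Item for Move-Item would move the files instead of copying them.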

I know this is not a PowerShell solution, but if you just want something that works, here is a cmd batch-file approach.
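@ECHO OFF
SETLOCAL ENABLEDELAYEDEXPANSION
SET "OLDDIR=C:\Users\lit\files"
SET "NEWDIR=C:\Users\lit\newdir"
REM For each file: BN = text before the first "-", EXT = the file's extension.
REM The ECHO in front of COPY only previews the commands; remove it to actually copy.
FOR /F "usebackq tokens=*" %%a IN (`DIR /A:-D /B "%OLDDIR%\*"`) DO (
    FOR /F "usebackq delims=- tokens=1" %%b IN (`ECHO %%a`) DO (SET "BN=%%b")
    SET "EXT=%%~xa"
    ECHO COPY /Y "%OLDDIR%\%%~a" "%NEWDIR%\!BN!!EXT!"
)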

Related

Powershell: Incrementing number while ignoring string tag

This is my first Stack Overflow question, so go easy on me.
I'm currently working on a project to create a new folder on a network drive by incrementing the previous folder's version number.
For example:
5.2.0.0110 -> 5.2.0.0111
Here is my current PowerShell solution, which does the trick:
$SourceFolder = "\\corpfs1\setup\ProtectionSuite\Version 5.2.0.x\5.2.0.0001"
$DestinationFolder = "\\corpfs1\setup\ProtectionSuite\Version 5.2.0.x"
$msiSourceFolder = "\\SourceMsiPath"
$exeSourceFolder = "\\SourceExePath"
if (Test-Path $SourceFolder)
{
    $latest = Get-ChildItem -Path $DestinationFolder| Sort-Object Name -Descending | Select-Object -First 1
    #split the latest filename, increment the number, then re-assemble new filename:
    $newFolderName = $latest.BaseName.Split('.')[0] + "." + $latest.BaseName.Split('.')[1] + "."+ $latest.BaseName.Split('.')[2] + "." + ([int]$latest.BaseName.Split('.')[3] + 1).ToString().PadLeft(4,"0")
    New-Item -Path $DestinationFolder"\"$newFolderName -ItemType Directory
    Copy-Item $msiSourceFolder -Destination $DestinationFolder"\"$newFolderName
    Copy-Item $exeSourceFolder -Destination $DestinationFolder"\"$newFolderName
}
However, one thing that this does not account for is version numbers with a string at the end. This solution attempts to convert the string to an int, which fails. Some of the folders have strings because they are for internal releases, so there is no way to just change my naming semantics.
For example: 5.2.0.1234 (eng) -> 5.2.0.1235
I would like to ignore any text after the last four digits and increment as shown in the example above. If anyone has a suggestion I am all ears! Thank you.
You can do:
$version = ($latest.BaseName -replace '^((?:\d+\.){3}\d{4}).*', '$1').Split('.')
$version[-1] = '{0:D4}' -f ([int]$version[-1] + 1)
$newFolderName = $version -join '.'
# '5.2.0.0110 (eng)' --> '5.2.0.0111'
As per your comment, you should use Join-Path for constructing the full target path:
$targetPath = Join-Path -Path $DestinationFolder -ChildPath $newFolderName
$null = New-Item -Path $targetPath -ItemType Directory -Force
Copy-Item $msiSourceFolder -Destination $targetPath
Copy-Item $exeSourceFolder -Destination $targetPath
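The three version-handling lines above can be wrapped in a small function and checked against both kinds of folder names from the question. This sketch assumes every name starts with three dot-terminated numbers followed by a 4-digit component; Get-NextVersionName is a hypothetical name.

```powershell
# Hypothetical helper: '5.2.0.1234 (eng)' -> '5.2.0.1235'
function Get-NextVersionName {
    param([Parameter(Mandatory)][string] $Name)
    # Keep only the leading "n.n.n.nnnn" part, dropping any suffix such as ' (eng)'.
    $version = ($Name -replace '^((?:\d+\.){3}\d{4}).*', '$1').Split('.')
    # Increment the last component and re-pad it to (at least) 4 digits.
    $version[-1] = '{0:D4}' -f ([int]$version[-1] + 1)
    $version -join '.'
}

Get-NextVersionName '5.2.0.0110'        # -> 5.2.0.0111
Get-NextVersionName '5.2.0.1234 (eng)'  # -> 5.2.0.1235
```

In the question's script, $newFolderName = Get-NextVersionName $latest.BaseName would then feed the Join-Path call.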
Assuming that your folder names contain only one 4-digit sequence preceded by a period (.), it is simpler to match and replace only that sequence, using the regular-expression-based -replace operator with a script block-based substitution:
Update:
A later clarification revealed that the post-version suffix in the input string should be (b) removed from the output rather than (a) just ignored for the purpose of incrementing while getting preserved in the output - see the bottom section for a solution to (b).
SOLUTION (a): If the post-version suffix should be preserved:
In PowerShell (Core) v6.1+:
# Replace the sample value '5.2.0.1234 (eng)' with the following in your code:
# $newFolderName = $latest.BaseName [-replace ...]
'5.2.0.1234 (eng)' -replace '(?<=\.)\d{4}', { '{0:0000}' -f (1 + $_.Value) }
The above yields 5.2.0.1235 (eng) - note the incremented last version-number component and the preservation of the suffix.
In Windows PowerShell (versions up to 5.1), where script block-based substitutions aren't supported, direct use of the underlying .NET API is required:
[regex]::Replace('5.2.0.0110 (eng)', '(?<=\.)\d{4}', { '{0:0000}' -f (1 + $args[0].Value) })
Explanation:
(?<=\.)\d{4} is a regex (regular expression) that matches a literal . (\.) inside a look-behind assertion ((?<=...)), followed by 4 ({4}) digits (\d). The look-behind assertion ensures that the literal . isn't included in the text captured by the match.
The script block ({ ... }) receives information about (each) match, as a System.Text.RegularExpressions.Match instance: via the automatic $_ variable in the PowerShell (Core) solution, and via the automatic $args variable in the Windows PowerShell solution with the direct .NET call.
The script block's output (return value) is used to replace the matched text:
'{0:0000}' -f ... uses -f, the format operator, to format the RHS with 4-digit 0-padding.
(1 + $_.Value) / (1 + $args[0].Value) adds 1 to the 4-digit sequence captured by the match, which is implicitly converted to a number due to the LHS of the + operation being a number.
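Since [regex]::Replace() is a plain .NET method, the Windows PowerShell form of Solution (a) also runs in PowerShell (Core), so a single sketch can serve both editions. It also shows that the 0000 format specifies a minimum width: once the component reaches 9999, the incremented value has five digits. The $increment variable name is illustrative.

```powershell
# Works in Windows PowerShell 5.1 and PowerShell 7+, because it calls the .NET
# API directly; -replace with a script block requires PowerShell 6.1+.
# The script block is converted to a MatchEvaluator and receives each Match as $m.
$increment = { param($m) '{0:0000}' -f (1 + $m.Value) }

[regex]::Replace('5.2.0.1234 (eng)', '(?<=\.)\d{4}', $increment)   # -> 5.2.0.1235 (eng)

# '0000' is a minimum width, not a maximum: 9999 + 1 has five digits.
[regex]::Replace('5.2.0.9999 (eng)', '(?<=\.)\d{4}', $increment)   # -> 5.2.0.10000 (eng)
```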
SOLUTION (b): If the post-version suffix should be removed:
In PowerShell (Core) v6.1+:
'5.2.0.1234 (eng)' -replace '\.(\d{4}).*', { '.{0:0000}' -f (1 + $_.Groups[1].Value) }
The above yields 5.2.0.1235 - note the incremented last version-number component and the absence of the suffix.
In Windows PowerShell:
[regex]::Replace('5.2.0.1234 (eng)', '\.(\d{4}).*', { '.{0:0000}' -f (1 + $args[0].Groups[1].Value) })

Remove Sections of File Names w/PowerShell

I'm not super knowledgeable when it comes to coding, but I'm trying to use PowerShell to remove the first X characters and the last X characters from the names of multiple files, keeping only the middle section.
Example:
INV~1105619~43458304~~1913216023~0444857, where 1913216023 is the invoice number. Anything before and after it needs to be removed from the file name.
I used:
get-childitem *.pdf | rename-item -newname { string.substring(22) }
to remove the first 22 characters, but I cannot manage to write code that removes the remaining part. All files have the same number of characters, but the numbers before and after the invoice number vary (every file name is different).
Any help/advice is greatly appreciated!
There are several methods of doing this.
If you are sure you won't run into naming collisions (i.e., all files have a different invoice number), here's how, along with three alternatives:
(Get-ChildItem -Path 'D:\Test' -Filter '*~~*~*.pdf' -File) |
    Rename-Item -NewName {
        # my favorite method
        '{0}{1}' -f ($_.BaseName -split '~')[-2], $_.Extension
        # or
        # '{0}{1}' -f ($_.BaseName -replace '^.*~~(\d{10})~.+$', '$1'), $_.Extension
        # or this one
        # '{0}{1}' -f ([regex]'~~(\d+)~').Match($_.BaseName).Groups[1].Value, $_.Extension
        # or if you are absolutely sure of the position and length of the invoice number
        # '{0}{1}' -f $_.BaseName.Substring(22,10), $_.Extension
    }
The Get-ChildItem call is enclosed in parentheses to make sure that all FileInfo objects have been collected before the renaming starts. If you don't do that, chances are you will try to rename some items multiple times.
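To confirm that the favorite method and the three commented-out alternatives agree, they can be evaluated against the sample name from the question. This only exercises the string logic on one example; it does not rename anything.

```powershell
# Sample base name from the question.
$base = 'INV~1105619~43458304~~1913216023~0444857'

$results = @(
    ($base -split '~')[-2]                              # second-to-last '~' token
    ($base -replace '^.*~~(\d{10})~.+$', '$1')          # 10 digits between '~~' and '~'
    ([regex]'~~(\d+)~').Match($base).Groups[1].Value    # first digit run after '~~'
    $base.Substring(22, 10)                             # fixed position and length
)
$results   # all four are 1913216023
```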
Assuming the target substring always has the same length, there's an overload of Substring() that takes a length parameter:
'INV~1105619~43458304~~1913216023~0444857'.substring
OverloadDefinitions
-------------------
string Substring(int startIndex)
string Substring(int startIndex, int length)
$startIndex, $length = 22, 10
'INV~1105619~43458304~~1913216023~0444857'.substring($startIndex, $length)
1913216023
dir ('?'*40) | rename-item -newname { $_.name.substring(22,10) } -whatif
What if: Performing the operation "Rename File" on target
"Item: C:\users\admin\foo\INV~1105619~43458304~~1913216023~0444857
Destination: C:\users\admin\foo\1913216023".

Need to batch add characters to filenames using Powershell

I have a series of files all named something like:
PRT14_WD_14220000_1.jpg
I need to add two zeroes after the last underscore and before the number so it looks like PRT14_WD_14220000_001.jpg
I've tried:
(dir) | rename-Item -new { $_.name -replace '*_*_*_','*_*_*_00' }
Appreciate any help.
The closest thing to what you attempted would be the following. In a regex, the equivalent of the * wildcard is .*, and parentheses create groups that you can refer to later in the replacement string as $1, $2, and so on.
dir *.jpg | rename-Item -new { $_.name -replace '(.*)_(.*)_(.*)_','$1_$2_$3_00' } -whatif
What if: Performing the operation "Rename File" on target "Item: C:\users\admin\foo\PRT14_WD_14220000_1.jpg Destination: C:\users\admin\foo\PRT14_WD_14220000_001.jpg".
OK, here's my take if you want the number zero-padded to three digits (adding at most two zeros). $num has to be an integer for the .tostring() format overload I want to use.
dir *.jpg | rename-item -newname {
    $a,$b,$c,$num = $_.basename -split '_'
    $num = [int]$num
    $a + '_' + $b + "_" + $c + '_' + $num.tostring('000') + '.jpg'
} -whatif
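The command above hard-codes .jpg as the new extension. A hedged variant that keeps each file's own extension, and leaves a non-numeric last token untouched instead of failing the [int] conversion, could look like this; Get-PaddedName is a hypothetical helper name.

```powershell
# Hypothetical helper: zero-pads the last '_'-separated token of the base name
# to 3 digits and keeps the original extension instead of hard-coding '.jpg'.
function Get-PaddedName {
    param([Parameter(Mandatory)][string] $Name)
    $ext   = [System.IO.Path]::GetExtension($Name)
    $parts = [System.IO.Path]::GetFileNameWithoutExtension($Name) -split '_'
    # Only convert/pad when the last token is all digits.
    if ($parts[-1] -match '^\d+$') {
        $parts[-1] = ([int]$parts[-1]).ToString('000')
    }
    ($parts -join '_') + $ext
}

Get-PaddedName 'PRT14_WD_14220000_1.jpg'    # -> PRT14_WD_14220000_001.jpg
Get-PaddedName 'PRT14_WD_14220000_12.png'   # -> PRT14_WD_14220000_012.png
```

It can be plugged into the pipeline as dir *.jpg | rename-item -newname { Get-PaddedName $_.Name } -whatif.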
the following presumes your last part of the .BaseName will always need two zeros added to it. what it does ...
fakes getting the fileinfo object that you get from Get-Item/Get-ChildItem
replace that with the appropriate cmdlet. [grin]
splits the .BaseName into parts using the _ as the split target
adds two zeros to the final part from the above split
merges the parts into a $NewBaseName
gets the .FullName and replaces the original BaseName with the $newBaseName
displays that new file name
you will still need to do your rename, but that is pretty direct. [grin]
here's the code ...
# fake getting a file info object
# in real life, use Get-Item or Get-ChildItem
$FileInfo = [System.IO.FileInfo]'PRT14_WD_14220000_1.jpg'
$BNParts = $FileInfo.BaseName.Split('_')
$BNParts[-1] = '00{0}' -f $BNParts[-1]
$NewBasename = $BNParts -join '_'
$NewFileName = $FileInfo.FullName.Replace($FileInfo.BaseName, $NewBaseName)
$NewFileName
output = D:\Data\Scripts\PRT14_WD_14220000_001.jpg
The -replace operator operates on regexes (regular expressions), not wildcard expressions such as * (by itself), which is what you're trying to use.
A conceptually more direct approach is to focus the replacement on the end of the string:
Get-ChildItem | # dir is a built-in alias for Get-ChildItem
Rename-Item -NewName { $_.Name -replace '(?<=_)[^_]+(?=\.)', '00$&' } -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
(?<=_)[^_]+(?=\.) matches a nonempty run (+) of non-_ characters ([^_]) that is preceded by _ ((?<=_)) and followed by a literal . ((?=\.)), excluding both the preceding _ and the following . from what is captured by the match ((?<=...) and (?=...) are non-capturing look-around assertions).
In short: This matches and captures the characters after the last _ and before the start of the filename extension.
00$& replaces what was matched with 00, followed by what the match captured ($&).
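Applying the replacement to two sample names shows both what it does and its limitation: 00 is inserted unconditionally, so a number that already has two digits ends up with four.

```powershell
$pattern = '(?<=_)[^_]+(?=\.)'

# $& in the replacement string stands for the whole match.
'PRT14_WD_14220000_1.jpg'  -replace $pattern, '00$&'   # -> PRT14_WD_14220000_001.jpg
'PRT14_WD_14220000_12.jpg' -replace $pattern, '00$&'   # -> PRT14_WD_14220000_0012.jpg
```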
In a follow-up comment you mention wanting to not just blindly insert 00, but to 0-left-pad the number after the last _ to 3 digits, whatever the number may be.
In PowerShell [Core] 6.1+, this can be achieved as follows:
Get-ChildItem |
    Rename-Item -NewName {
        $_.Name -replace '(?<=_)[^_]+(?=\.)', { $_.Value.PadLeft(3, '0') }
    } -WhatIf
The script block ({ ... }) as the replacement operand receives each match as a Match instance stored in automatic variable $_, whose .Value property contains the captured text.
Calling .PadLeft(3, '0') on that captured text 0-left-pads it to 3 digits and outputs the result, which replaces the regex match at hand.
A quick demonstration:
PS> 'PRT14_WD_14220000_1.jpg' -replace '(?<=_)[^_]+(?=\.)', { $_.Value.PadLeft(3, '0') }
PRT14_WD_14220000_001.jpg # Note how '_1.jpg' was replaced with '_001.jpg'
In earlier PowerShell versions, you must make direct use of the .NET [regex] type's .Replace() method, which fundamentally works the same:
Get-ChildItem |
    Rename-Item -NewName {
        [regex]::Replace($_.Name, '(?<=_)[^_]+(?=\.)', { param($m) $m.Value.PadLeft(3, '0') })
    } -WhatIf

Rename file - delete all characters AFTER 2nd underscore

I need to replace the time/date stamp that's included in the filename after the 2nd underscore (it needs to be in the same format, yyyyMMddHHmmss).
Example file: 123456_123456_20190716163001.xml
Sometimes the file in question gets created with an additional character, which invalidates the file. In this case I need to replace the timestamp with the current timestamp.
Example: 123456_123456_current Timestamp here.xml
The filename should never exceed 32 characters (including the extension).
I found a script, but it deletes everything after the 1st underscore rather than the 2nd, and I'm struggling to find a way to replace the text with the current timestamp.
Get-ChildItem c:\test -Filter 123456_123456*.xml | Foreach-Object -Process {
    $NewName = [Regex]::Match($_.Name,"^[^_]*").Value + '.xml'
    $_ | Rename-Item -NewName $NewName
}
The timestamp after the 2nd underscore should be updated to the current timestamp if the original filename exceeds 32 characters:
123456_123456_current Timestamp here.xml
this takes advantage of the way a [fileinfo] object is structured. the .BaseName is easy to get to & use .Split() on. then one can use -join to put it back into one basename & finally add the extension onto the basename.
# fake reading in a file info object
# in real life, use Get-ChildItem or Get-Item
$FileObject = [System.IO.FileInfo]'123456_123456_current Timestamp here.xml'
$NewName = -join (($FileObject.BaseName.Split('_')[0,1] -join '_'), $FileObject.Extension)
$NewName
output = 123456_123456.xml
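the snippet above only drops the timestamp. A sketch that also appends the current timestamp, and only rewrites names longer than the question's 32-character limit, might look like the following. Get-TimestampedName is hypothetical, and its -Timestamp parameter exists only so the result can be checked against a fixed date.

```powershell
# Hypothetical helper: keeps the first two '_'-separated parts, appends a
# yyyyMMddHHmmss timestamp, and only rewrites names longer than $MaxLength.
function Get-TimestampedName {
    param(
        [Parameter(Mandatory)][string] $Name,
        [datetime] $Timestamp = (Get-Date),
        [int] $MaxLength = 32
    )
    if ($Name.Length -le $MaxLength) { return $Name }   # name is valid, keep it
    $ext   = [System.IO.Path]::GetExtension($Name)
    $parts = [System.IO.Path]::GetFileNameWithoutExtension($Name).Split('_')
    '{0}_{1}_{2}{3}' -f $parts[0], $parts[1], $Timestamp.ToString('yyyyMMddHHmmss'), $ext
}

$when = [datetime]'2019-07-16 16:30:01'
Get-TimestampedName '123456_123456_20190716163001.xml' -Timestamp $when    # unchanged (32 chars)
Get-TimestampedName '123456_123456_201907161630012.xml' -Timestamp $when   # -> 123456_123456_20190716163001.xml
```

in real life, something like Get-ChildItem c:\test -Filter 123456_123456*.xml | Rename-Item -NewName { Get-TimestampedName $_.Name } would use the current time.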
Sticking with the regex theme, you can do the following:
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
Rename-Item -NewName {$_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"}
If duplicate file names are a concern, you can build in an increment to $CurrentTime.
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
    Rename-Item -NewName {
        $time = [long]$CurrentTime
        $NewName = $_.Name -replace $RegexReplace,"`${1}$time`${2}"
        # Check the file's own folder (not the current location) and keep
        # incrementing the timestamp until the name no longer collides.
        while (Test-Path -LiteralPath (Join-Path $_.DirectoryName $NewName)) {
            $time++
            $NewName = $_.Name -replace $RegexReplace,"`${1}$time`${2}"
        }
        $NewName
    }
Explanation:
$RegexReplace contains the regular expression that must match for the intended rename to happen. The regex mechanisms are explained below:
.*?_.*?_: Matches a minimal number of characters (lazy matching) followed by an underscore and then another minimal number of characters followed by an underscore.
.*: Greedily matches any characters
\.: Literally matches the dot character (.).
(): The parentheses here represent capture groups with the first set being 1 and the second set being 2. These are later referenced as ${1} and ${2} in the -replace operation.
Since Rename-Item -NewName supports delay-bind script blocks, we can pipe Get-ChildItem output directly to it. The current pipeline object is $_.
The -replace operation uses the variable $CurrentTime, which must be expanded in order for a successful outcome. For that reason, we use double quotes around the replacement. Since we do not want capture groups ${1} and ${2} expanded, we backtick escape them.
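One reason for the ${1} form (rather than $1) is that the timestamp begins with digits, so $1 immediately followed by 2019... would not be read as group 1 followed by the timestamp. As a sketch, here is the backtick-escaped form from the answer next to an equivalent that builds the replacement from single-quoted pieces, using a fixed $CurrentTime for illustration.

```powershell
# Fixed value for illustration; the answer uses Get-Date -Format 'yyyyMMddHHmmss'.
$CurrentTime  = '20190716163001'
$RegexReplace = '(.*?_.*?_).*(\..*)'

# Answer's form: double quotes, with `$ so only $CurrentTime is expanded by PowerShell.
'123456_123456_x.xml' -replace $RegexReplace, "`${1}$CurrentTime`${2}"

# Equivalent: single-quoted pieces need no escaping; only the concatenation is dynamic.
'123456_123456_x.xml' -replace $RegexReplace, ('${1}' + $CurrentTime + '${2}')
# Both lines output 123456_123456_20190716163001.xml
```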

Powershell add suffix to filenames, based on prefix

I have a directory that consists of a number of text files that have been named:
1Customer.txt
2Customer.txt
...
99Customer.txt
I am trying to create powershell script that will rename the files to a more logical:
Customer1.txt
Customer2.txt
...
Customer99.txt
The prefix can be anything from 1 digit to 3 digits.
As I am new to powershell, I really don't know how I can achieve this. Any help much appreciated.
The most straightforward way is gci/ls/dir with a where (Where-Object) filter whose RegEx matches only BaseNames that start with a number, piping the result to Rename-Item and building the new name from the submatches.
ls |? BaseName -match '^(\d+)([^0-9].*)$' |ren -new {"{0}{1}{2}" -f $matches[2],$matches[1],$_.extension}
The same code without aliases
Get-ChildItem |Where-Object {$_.BaseName -match '^(\d+)([^0-9].*)$'} |
Rename-Item -NewName {"{0}{1}{2}" -f $matches[2],$matches[1],$_.extension}
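The code above relies on the automatic $matches variable from the filtering step still holding the current file's submatches inside Rename-Item's script block. A self-contained alternative performs the swap with -replace alone, so the filter is only needed to skip names that don't start with a number. Get-SwappedName is a hypothetical helper; the pattern is the one from the answer.

```powershell
# Hypothetical helper: '99Customer.txt' -> 'Customer99.txt'.
# Base names not starting with digits followed by a non-digit are left unchanged.
function Get-SwappedName {
    param([Parameter(Mandatory)][string] $Name)
    $ext  = [System.IO.Path]::GetExtension($Name)
    $base = [System.IO.Path]::GetFileNameWithoutExtension($Name)
    ($base -replace '^(\d+)([^0-9].*)$', '$2$1') + $ext
}

'1Customer.txt', '99Customer.txt', '123Customer.txt', 'Customer5.txt' |
    ForEach-Object { Get-SwappedName $_ }
# -> Customer1.txt, Customer99.txt, Customer123.txt, Customer5.txt
```

In a pipeline: Get-ChildItem -File | Where-Object BaseName -match '^\d+[^0-9]' | Rename-Item -NewName { Get-SwappedName $_.Name } -WhatIf.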
Here is one way to do it:
Get-ChildItem .\Docs -File |
    ForEach-Object {
        if($_.Name -match "^(?<Number>\d+)(?<Type>\w+)\.\w+$")
        {
            Rename-Item -Path $_.FullName -NewName "$($matches.Type)$($matches.Number)$($_.Extension)"
        }
    }
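The named groups in that pattern can also be referenced directly in a -replace replacement string as ${name}, which avoids the $matches lookup altogether. A minimal sketch on a single sample name; the lookahead (?=\.) keeps the extension out of the match so it survives unchanged.

```powershell
# ${Type} and ${Number} refer to the named groups; single quotes stop PowerShell
# from treating them as its own variables. (?=\.) leaves the extension unmatched.
'23Suppliers.txt' -replace '^(?<Number>\d+)(?<Type>\w+)(?=\.)', '${Type}${Number}'
# -> Suppliers23.txt
```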
The line:
$_.Name -match "^(?<Number>\d+)(?<Type>\w+)\.\w+$"
takes the file name (e.g. '23Suppliers.txt') and performs a pattern match on it, pulling out the number part (23) and the 'type' part ('Suppliers'), naming them 'Number' and 'Type' respectively. PowerShell stores these in its automatic variable $matches, which the -match operator populates when working with regular expressions.
We then construct the new file name using details from the original file, such as the file's extension ($_.Extension), and the matched type ($matches.Type) and number ($matches.Number):
"$($matches.Type)$($matches.Number)$($_.Extension)"
I'm sure there's a nicer way to do this with regex, but the following is a quick first go at it:
$prefix = "Customer"
Get-ChildItem C:\folder\*$prefix.txt | Rename-Item -NewName {$prefix + ($_.Name -replace $prefix,'')}