Remove Sections of File Names w/PowerShell

I'm not super knowledgeable when it comes to coding, but I'm trying to use PowerShell to remove the first X characters and the last X characters from the names of multiple files, keeping only the middle section.
Example: INV~1105619~43458304~~1913216023~0444857, where 1913216023 is the invoice #. Anything before and after that needs to be removed from the file name.
I used:
get-childitem *.pdf | rename-item -newname { [string]($_.name).substring(22) }
to remove the first 22 characters, but I cannot manage to write code that removes the remaining part. All files have the same number of characters but different numbers before and after the invoice number (every file name is different).
Any help/advice is greatly appreciated!

There are several methods of doing this.
If you are sure you won't run into naming collisions (that is, all files have a different invoice number), here's how, with three extra alternatives:
(Get-ChildItem -Path 'D:\Test' -Filter '*~~*~*.pdf' -File) |
Rename-Item -NewName {
# my favorite method
'{0}{1}' -f ($_.BaseName -split '~')[-2], $_.Extension
# or
# '{0}{1}' -f ($_.BaseName -replace '^.*~~(\d{10})~.+$', '$1'), $_.Extension
# or this one
# '{0}{1}' -f ([regex]'~~(\d+)~').Match($_.BaseName).Groups[1].Value, $_.Extension
# or if you are absolutely sure of the position and length of the invoice number
# '{0}{1}' -f $_.BaseName.Substring(22,10), $_.Extension
}
The Get-ChildItem call is enclosed in parentheses to make sure all FileInfo objects have been collected before renaming starts. If you don't do that, chances are you will try to rename some items more than once, because already-renamed files can be picked up again by the still-running enumeration.
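Before renaming for real, one way to check the "no naming collisions" assumption is to compute every proposed name first and group them. This is a minimal sketch, assuming PowerShell 3.0 or later; the three file names are hypothetical samples (two of them deliberately share an invoice number), and in practice you would feed in the Get-ChildItem output instead:

```powershell
# Hypothetical sample names; the first two share invoice number 1913216023.
$names = @(
    'INV~1105619~43458304~~1913216023~0444857.pdf'
    'INV~2205619~53458304~~1913216023~1444857.pdf'
    'INV~3305619~63458304~~1913216024~2444857.pdf'
)

# Compute the proposed new name for each file (the split method from above).
$plan = foreach ($name in $names) {
    $file = [System.IO.FileInfo]$name
    [pscustomobject]@{
        OldName = $file.Name
        NewName = '{0}{1}' -f ($file.BaseName -split '~')[-2], $file.Extension
    }
}

# Any NewName that occurs more than once would cause a collision.
$collisions = $plan | Group-Object -Property NewName | Where-Object Count -gt 1
$collisions | ForEach-Object { "Collision: $($_.Name) <- $($_.Group.OldName -join ', ')" }
```

If $collisions comes back empty, the rename can proceed as shown above.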

Assuming the target substring always has the same length, there's an overload of Substring() that takes a length parameter:
'INV~1105619~43458304~~1913216023~0444857'.substring
OverloadDefinitions
-------------------
string Substring(int startIndex)
string Substring(int startIndex, int length)
$startIndex, $length = 22, 10
'INV~1105619~43458304~~1913216023~0444857'.substring($startIndex, $length)
1913216023
dir ('?'*40) | rename-item -newname { $_.name.substring(22,10) } -whatif
What if: Performing the operation "Rename File" on target
"Item: C:\users\admin\foo\INV~1105619~43458304~~1913216023~0444857
Destination: C:\users\admin\foo\1913216023".
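If the real files carry an extension (the question filters on *.pdf), the '?'*40 pattern won't match them, and Substring(22,10) on the full name would drop the extension. Here is a sketch that works on the base name, keeps the extension and skips names too short for Substring(22, 10). The .pdf extension and the second, too-short name are assumptions for illustration:

```powershell
# Assumed: the real names carry a .pdf extension; 'INV~short.pdf' is a hypothetical short name.
$names = 'INV~1105619~43458304~~1913216023~0444857.pdf', 'INV~short.pdf'

$newNames = foreach ($name in $names) {
    $file = [System.IO.FileInfo]$name
    # Substring(22, 10) needs a base name of at least 32 characters; shorter names are skipped.
    if ($file.BaseName.Length -ge 32) {
        $file.BaseName.Substring(22, 10) + $file.Extension
    }
}
$newNames   # 1913216023.pdf
```

In a real run, the expression inside the if block would go into Rename-Item's -NewName script block, with $_ in place of $file.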

Related

Rename a sequence of files to a new sequence, keeping the suffix

I need help to create a command in PowerShell to rename a sequence of files with this format:
001.jpg
001_1.jpg
002.jpg
002_1.jpg
003.jpg
003_1.jpg
into a new sequence that can start with a number such as 9612449, but keeping intact the suffixes, so new sequence would be:
9612449.jpg
9612449_1.jpg
9612450.jpg
9612450_1.jpg
9612451.jpg
9612451_1.jpg
Assuming that the existing numbers (the first _-separated token of each base name, or the entire base name if there is no _) should be shifted by a fixed offset so that 001 becomes 9612449:
# Simulate a set of input files.
$files = [System.IO.FileInfo[]] (
'001.jpg',
'001_1.jpg',
'002.jpg',
'002_1.jpg',
'003.jpg',
'003_1.jpg'
)
# The number to offset the existing numbers by.
$offset = 9612449 - 1
# Process each file and apply the offset.
$files | ForEach-Object {
# Split the base name into the number and the suffix.
$num, $suffix = $_.BaseName -split '(?=_)', 2
# Construct and output the new name, with the offset applied.
'{0}{1}{2}' -f ($offset + $num), $suffix, $_.Extension
}
The above yields the output shown in your question.
Applied to a real file-renaming operation, you'd do something like:
Get-ChildItem -LiteralPath . -Filter *.jpg |
Rename-Item -NewName {
$num, $suffix = $_.BaseName -split '(?=_)', 2
'{0}{1}{2}' -f ($offset + $num), $suffix, $_.Extension
} -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
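If the existing sequence doesn't necessarily start at 001, the offset can be derived from the smallest existing number instead of being hard-coded. Here is a sketch under that assumption, reusing the simulated file list from above; $start and $first are hypothetical variable names:

```powershell
# Simulated input, as above.
$files = [System.IO.FileInfo[]] ('001.jpg', '001_1.jpg', '002.jpg', '002_1.jpg', '003.jpg', '003_1.jpg')

# Desired first number of the new sequence.
$start = 9612449

# Smallest existing number (the part before any _), as an integer.
$first = ($files | ForEach-Object { [int](($_.BaseName -split '_')[0]) } |
    Measure-Object -Minimum).Minimum
$offset = $start - [int]$first

$newNames = $files | ForEach-Object {
    # Split into number and optional suffix, e.g. '001_1' -> '001', '_1'.
    $num, $suffix = $_.BaseName -split '(?=_)', 2
    '{0}{1}{2}' -f ($offset + [int]$num), $suffix, $_.Extension
}
$newNames
```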

Need to batch add characters to filenames using Powershell

I have a series of files all named something like:
PRT14_WD_14220000_1.jpg
I need to add two zeroes after the last underscore and before the number so it looks like PRT14_WD_14220000_001.jpg
I've tried:
(dir) | rename-Item -new { $_.name -replace '*_*_*_','*_*_*_00' }
Appreciate any help.
The closest thing to what you attempted would be the following. In regex, the equivalent of the * wildcard is .*, and parentheses create groups that the replacement string can refer to as $1, $2, and so on.
dir *.jpg | rename-Item -new { $_.name -replace '(.*)_(.*)_(.*)_','$1_$2_$3_00' } -whatif
What if: Performing the operation "Rename File" on target "Item: C:\users\admin\foo\PRT14_WD_14220000_1.jpg Destination: C:\users\admin\foo\PRT14_WD_14220000_001.jpg".
OK, here's my take if you want the number zero-padded to three digits (adding at most two zeros). $num has to be an integer for the .ToString('000') format I want to use.
dir *.jpg | rename-item -newname { $a,$b,$c,$num = $_.basename -split '_'
$num = [int]$num
$a + '_' + $b + "_" + $c + '_' + $num.tostring('000') + '.jpg'
} -whatif
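To see what the '000' format string does, and to avoid hard-coding '.jpg' or the number of underscores, here's a small sketch that operates on the example name only (no files are renamed):

```powershell
# The '000' custom format pads to at least 3 digits and never truncates.
$padded = foreach ($n in 1, 12, 123, 1234) { $n.ToString('000') }
$padded   # 001, 012, 123, 1234

# Same idea without hard-coding '.jpg' or the number of underscores.
$file = [System.IO.FileInfo]'PRT14_WD_14220000_1.jpg'
$parts = $file.BaseName -split '_'
$parts[-1] = ([int]$parts[-1]).ToString('000')
$newName = ($parts -join '_') + $file.Extension
$newName  # PRT14_WD_14220000_001.jpg
```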
the following presumes your last part of the .BaseName will always need two zeros added to it. what it does ...
fakes getting the fileinfo object that you get from Get-Item/Get-ChildItem
replace that with the appropriate cmdlet. [grin]
splits the .BaseName into parts using the _ as the split target
adds two zeros to the final part from the above split
merges the parts into a $NewBaseName
gets the .FullName and replaces the original BaseName with the $newBaseName
displays that new file name
you will still need to do your rename, but that is pretty direct. [grin]
here's the code ...
# fake getting a file info object
# in real life, use Get-Item or Get-ChildItem
$FileInfo = [System.IO.FileInfo]'PRT14_WD_14220000_1.jpg'
$BNParts = $FileInfo.BaseName.Split('_')
$BNParts[-1] = '00{0}' -f $BNParts[-1]
$NewBasename = $BNParts -join '_'
$NewFileName = $FileInfo.FullName.Replace($FileInfo.BaseName, $NewBaseName)
$NewFileName
output = D:\Data\Scripts\PRT14_WD_14220000_001.jpg
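One caveat with String.Replace() on .FullName: it replaces every occurrence of the base name, including occurrences in folder names. Since Rename-Item -NewName only needs the new leaf name, building that name directly sidesteps the issue. Here is a sketch with a hypothetical path whose folder happens to share the file's base name:

```powershell
# Hypothetical path: the folder has the same name as the file's base name.
$fullName    = '/data/PRT14_WD_14220000_1/PRT14_WD_14220000_1.jpg'
$baseName    = 'PRT14_WD_14220000_1'
$newBaseName = 'PRT14_WD_14220000_001'

# String.Replace() replaces every occurrence, so the folder name changes too.
$naive = $fullName.Replace($baseName, $newBaseName)
$naive   # /data/PRT14_WD_14220000_001/PRT14_WD_14220000_001.jpg

# Rename-Item -NewName only needs the new leaf name.
$leaf = $newBaseName + [System.IO.Path]::GetExtension($fullName)
$leaf    # PRT14_WD_14220000_001.jpg
```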
The -replace operator operates on regexes (regular expressions), not on wildcard expressions such as * (by itself), which is what you're trying to use.
A conceptually more direct approach is to focus the replacement on the end of the string:
Get-ChildItem | # dir is a built-in alias for Get-ChildItem
Rename-Item -NewName { $_.Name -replace '(?<=_)[^_]+(?=\.)', '00$&' } -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
(?<=_)[^_]+(?=\.) matches a nonempty run (+) of non-_ characters ([^_]) that is preceded by _ ((?<=_)) and followed by a literal . ((?=\.)). Both the preceding _ and the following . are excluded from the match itself, because (?<=...) and (?=...) are zero-width look-around assertions.
In short: this matches the characters after the last _ and before the start of the filename extension.
00$& replaces the match with 00 followed by the matched text itself ($& refers to the entire match).
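A quick check of the '00$&' replacement on two sample names. The second, hypothetical name shows that the zeros are inserted blindly, regardless of how many digits are already there:

```powershell
$pattern = '(?<=_)[^_]+(?=\.)'

# Single quotes keep PowerShell from expanding $&; the regex engine reads it as "whole match".
$one = 'PRT14_WD_14220000_1.jpg' -replace $pattern, '00$&'
$two = 'PRT14_WD_14220000_12.jpg' -replace $pattern, '00$&'
$one   # PRT14_WD_14220000_001.jpg
$two   # PRT14_WD_14220000_0012.jpg
```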
In a follow-up comment you mention wanting to not just blindly insert 00, but to 0-left-pad the number after the last _ to 3 digits, whatever the number may be.
In PowerShell [Core] 6.1+, this can be achieved as follows:
Get-ChildItem |
Rename-Item -NewName {
$_.Name -replace '(?<=_)[^_]+(?=\.)', { $_.Value.PadLeft(3, '0') }
} -WhatIf
The script block ({ ... }) as the replacement operand receives each match as a Match instance stored in automatic variable $_, whose .Value property contains the captured text.
Calling .PadLeft(3, '0') on that captured text 0-left-pads it to 3 digits and outputs the result, which replaces the regex match at hand.
A quick demonstration:
PS> 'PRT14_WD_14220000_1.jpg' -replace '(?<=_)[^_]+(?=\.)', { $_.Value.PadLeft(3, '0') }
PRT14_WD_14220000_001.jpg # Note how '_1.jpg' was replaced with '_001.jpg'
In earlier PowerShell versions, you must make direct use of the .NET [regex] type's .Replace() method, which fundamentally works the same:
Get-ChildItem |
Rename-Item -NewName {
[regex]::Replace($_.Name, '(?<=_)[^_]+(?=\.)', { param($m) $m.Value.PadLeft(3, '0') })
} -WhatIf
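Here is the same [regex]::Replace() call applied to a few hypothetical names with one-, two- and three-digit numbers, showing that PadLeft(3, '0') only pads as needed:

```powershell
$names = 'PRT14_WD_14220000_1.jpg', 'PRT14_WD_14220000_23.jpg', 'PRT14_WD_14220000_456.jpg'

$padded = foreach ($n in $names) {
    # PowerShell converts the script block to a MatchEvaluator delegate; $m is the Match.
    [regex]::Replace($n, '(?<=_)[^_]+(?=\.)', { param($m) $m.Value.PadLeft(3, '0') })
}
$padded   # PRT14_WD_14220000_001.jpg, PRT14_WD_14220000_023.jpg, PRT14_WD_14220000_456.jpg
```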

How to remove last X number of character from file name

Looking for help writing a script that will remove a specific number of characters from the end of a file name. In my specific dilemma, I have dozens of files with the following format:
1234567 XKJDFDA.pdf
5413874 KJDFSXZ.pdf
... etc. etc.
I need to remove the last 7 alpha characters to leave the 7 digits standing as the file name. Through another posted question I was able to find a script that would remove the first X number of digits from the beginning of the file name but I'm having an incredibly difficult time modifying it to remove from the end:
get-childitem *.pdf | rename-item -newname { [string]($_.name).substring(x) }
Any and all relevant help would be greatly appreciated.
$RootFolder = '\\server.domain.local\share\folder'
Get-ChildItem -LiteralPath $RootFolder -Filter '*.pdf' |
Where-Object { $_.psIsContainer -eq $false } | # No folders
ForEach-Object {
if ($_.Name -match '^(?<BeginningDigits>\d{7})\s.+\.pdf$' ) {
$local:newName = "$($Matches['BeginningDigits'])$($_.Extension)"
return Rename-Item -LiteralPath $_.FullName -NewName $local:newName -PassThru
}
} |
ForEach-Object {Write-Host "New name: $($_.Name)"}
If the whole file name matches "<SevenDigits><Space><Something>.pdf", it is renamed to "<SevenDigits><KeepExtension>". This uses a regular expression with a named capture group (<BeginningDigits> is the group name). Note that because of the regex, this is the most CPU-intensive approach, but for a one-time run or a small number of files that doesn't matter. If you have many files, I'd recommend adding Where-Object { $_.BaseName.Length -gt 7 } | before the ForEach-Object, so that files whose base name is 7 characters or shorter are filtered out before the regex check (a string length check is cheaper than a regex match). You can also remove \.pdf from the regex, because the -Filter of Get-ChildItem already restricts the input to .pdf files.
If you strictly need to match "<7digits><space><7alpha>.pdf", replace the regex with '^(?<BeginningDigits>\d{7})\s[A-Z]{7}\.pdf$' (note that -match is case-insensitive, so [A-Z] also matches lowercase letters).
$RootFolder = '\\server.domain.local\share\folder'
@( Get-ChildItem -LiteralPath $RootFolder -Filter '*.pdf' ) |
Where-Object { $_.psIsContainer -eq $false } | # No folders
Where-Object { $_.BaseName.Length -gt 7 } | # Only files whose base name (name without extension) has more than 7 characters
ForEach-Object {
$local:newName = [string]::Join('', $_.BaseName.ToCharArray()[0..6]) + $_.Extension
return Rename-Item -LiteralPath $_.FullName -NewName $local:newName -PassThru
} |
ForEach-Object {Write-Host "New name: $($_.Name)"}
Alternative: using character-array slicing and joining. Renames every file whose name without extension is longer than 7 characters to its first 7 characters (regardless of whether they are digits), keeping the extension.
This is an inefficient approach, because Substring() is faster; it is shown only because it can help with learning array slicing with [x..y].
Note that we check that the length is greater than 7 with Where-Object { $_.BaseName.Length -gt 7 } before slicing with [0..6], so files whose base name is already 7 characters or shorter are left alone.
$RootFolder = '\\server.domain.local\share\folder'
@( Get-ChildItem -LiteralPath $RootFolder -Filter '*.pdf' ) |
Where-Object { $_.psIsContainer -eq $false } | # No folders
Where-Object { $_.BaseName.Length -gt 7 } | # Only files whose base name (name without extension) has more than 7 characters
ForEach-Object {
$local:newName = $_.BaseName.Substring(0,7) + $_.Extension
return Rename-Item -LiteralPath $_.FullName -NewName $local:newName -PassThru
} |
ForEach-Object {Write-Host "New name: $($_.Name)"}
Alternative: using Substring(). Renames every file whose name without extension is longer than 7 characters to its first 7 characters (regardless of whether they are digits), keeping the extension.
.Substring(0,7): 0 is the position of the first character, 7 is how many characters to take. Note that we check that the length is greater than 7 with Where-Object { $_.BaseName.Length -gt 7 } before calling Substring(); otherwise we could hit an error when a name is shorter than 7 characters.
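For comparison, here are both approaches on a single base name from the question, plus what happens when Substring() is asked for more characters than the string has. The 'abc' input is a hypothetical too-short name:

```powershell
$baseName = '1234567 XKJDFDA'

# Character-array slicing, then joining the 7 characters back into a string.
$viaSlice = [string]::Join('', $baseName.ToCharArray()[0..6])
# Substring(startIndex, length).
$viaSubstring = $baseName.Substring(0, 7)
"$viaSlice / $viaSubstring"   # 1234567 / 1234567

# Substring() throws if the string is shorter than startIndex + length.
$failed = $false
try { $null = 'abc'.Substring(0, 7) } catch { $failed = $true }
"Substring on a short name failed: $failed"   # Substring on a short name failed: True
```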
A much simpler alternative to PowerShell is using Command Prompt. If your file names are along the lines of "00001_01.jpg", "00002_01.jpg", "00003_01.jpg", you can use the following command:
ren ?????_01.jpg ?????.jpg
where the number of ? characters matches the length of the first part of the file name that you want to keep.
You can read more about this and other Command Prompt methods of batch renaming files in Windows at this useful website.
EDIT: edited to preserve the extension
There's another substring() method that takes 2 args, startIndex and length. https://learn.microsoft.com/en-us/dotnet/api/system.string.substring?view=netframework-4.8
'hithere'.substring
OverloadDefinitions
-------------------
string Substring(int startIndex)
string Substring(int startIndex, int length)
Thus, to delete a total of 8 characters from the right of the basename, including the space:
get-childitem *.pdf | rename-item -newname { $_.basename.substring(0,
$_.basename.length-8) + $_.extension } -whatif
What if: Performing the operation "Rename File" on target
"Item: /Users/js/1234567890.pdf
Destination: /Users/js/12.pdf".
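Applied to the names from the question, the arithmetic works out as follows: each base name is 15 characters long, so Length - 8 leaves the first 7. Here is a sketch on plain strings (no files are touched):

```powershell
# Names from the question, treated as plain strings.
$names = '1234567 XKJDFDA.pdf', '5413874 KJDFSXZ.pdf'

$newNames = foreach ($name in $names) {
    $file = [System.IO.FileInfo]$name
    # BaseName is 15 characters; dropping the last 8 (space + 7 letters) keeps 7.
    $file.BaseName.Substring(0, $file.BaseName.Length - 8) + $file.Extension
}
$newNames   # 1234567.pdf, 5413874.pdf
```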
You can write a quick bash function for this.
function fileNameShortener() {
mv "$1" "${1:0:4}"
}
This takes the first argument, which is stored in $1, and creates a substring that starts at index 0 and is 4 characters long. That substring is then used as the target of the mv command, which renames the original file. To generalize further:
function fileNameShortener2() {
mv "$1" "${1:$2:$3}"
}
This version takes the starting point and the length of the substring as the second and third arguments. A negative length (supported in bash 4.2 and later) counts back from the end of the string.
fileNameShortener2 fileName.txt 0 -5
This example would remove the last 5 characters from the file name (including the extension).

Rename file - delete all characters AFTER 2nd underscore

I need to replace the time/date stamp that's included in the filename after the 2nd underscore (it needs to be in the same format, yyyyMMddHHmmss).
example file: 123456_123456_20190716163001.xml
sometimes the file in question gets created with an additional character which invalidates the file, in this case I need to replace this with the current timestamp.
example: 123456_123456_current Timestamp here.xml
The file should never exceed 32 characters(including extension)
I found a script but it deletes everything after the 1st underscore not the 2nd and I'm struggling to find a way to replace the text with the current timestamp.
Get-ChildItem c:\test -Filter 123456_123456*.xml | Foreach-Object -Process {
$NewName = [Regex]::Match($_.Name,"^[^_]*").Value + '.xml'
$_ | Rename-Item -NewName $NewName
}
timestamp after 2nd underscore to be updated to the current timestamp if original file exceeds 32 characters
123456_123456_current Timestamp here.xml
this takes advantage of the way a [fileinfo] object is structured. the .BaseName is easy to get to & use .Split() on. then one can use -join to put it back into one basename & finally add the extension onto the basename.
# fake reading in a file info object
# in real life, use Get-ChildItem or Get-Item
$FileObject = [System.IO.FileInfo]'123456_123456_current Timestamp here.xml'
$NewName = -join (($FileObject.BaseName.Split('_')[0,1] -join '_'), $FileObject.Extension)
$NewName
output = 123456_123456.xml
Sticking with the regex theme, you can do the following:
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
Rename-Item -NewName {$_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"}
If duplicate file names are a concern, you can build in an increment to $CurrentTime.
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
Rename-Item -NewName {
$NewName = $_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"
if (Test-Path -LiteralPath (Join-Path $_.DirectoryName $NewName)) {
$CurrentTime = [long]$CurrentTime + 1
$NewName = $_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"
}
$NewName
}
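Neither snippet above applies the question's condition that only names longer than 32 characters should be changed. Here is a sketch that adds that check. $CurrentTime is fixed so the result is predictable (use Get-Date -Format 'yyyyMMddHHmmss' in practice), and the second name, with its extra 'x', is a hypothetical invalid file:

```powershell
# Fixed timestamp for a predictable demo.
$CurrentTime = '20240101120000'
$RegexReplace = '(.*?_.*?_).*(\..*)'

$names = '123456_123456_20190716163001.xml', '123456_123456_20190716163001x.xml'

$result = foreach ($name in $names) {
    if ($name.Length -gt 32) {
        # Too long: replace everything after the 2nd underscore with the current timestamp.
        $name -replace $RegexReplace, "`${1}$CurrentTime`${2}"
    }
    else {
        $name   # already valid (32 characters or fewer), keep as-is
    }
}
$result
```

In a real run, the if/else would go inside the Rename-Item -NewName script block, with $_.Name in place of $name.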
Explanation:
$RegexReplace contains the regex expression that will need to be matched for the ideal rename operation to happen. The regex mechanisms are explained below:
.*?_.*?_: Matches a minimal number of characters (lazy matching) followed by an underscore and then another minimal number of characters followed by an underscore.
.*: Greedily matches any characters
\.: Literally matches the dot character (.).
(): The parentheses here represent capture groups with the first set being 1 and the second set being 2. These are later referenced as ${1} and ${2} in the -replace operation.
Since Rename-Item -NewName supports delay-bind script blocks, we can just pipe Get-ChildItem output directly to it. The current pipeline object is $_.
The -replace operation uses the variable $CurrentTime, which must be expanded in order for a successful outcome. For that reason, we use double quotes around the replacement. Since we do not want capture groups ${1} and ${2} expanded, we backtick escape them.

remove extraneous characters from a filename

I have been tasked a little above my head with taking a repository of files and removing excess garbage characters from the filename and saving the renamed file in a different directory folder.
An example of the filenames are:
100-expresstoll.pdf
1000-2012-09-29.jpg
10000-2014-01-15_14.03.22.jpg
10001-2014-01-15_19.05.24.jpg
10002-2014-01-15_21.30.23.jpg
10003-2014-01-16_07.33.54.jpg
10004-2014-01-16_13.33.21.jpg
10005-Feb 4, 2014.jpeg
10006-O'Reilly_Media,_Inc..pdf
First group of numbers at the beginning are record IDs and are to be retained along with the file's extension. Everything else between the record IDs and the file extension needs to be dropped.
For example, the final name for first three files would be:
100.pdf
1000.jpg
10000.jpg
I have read Removing characters and Rearranging filenames in addition to other postings, but the complexity of having a variable character length at the front, a variable number of intermediary characters to be removed and variable file extension types have really tossed this beyond my limited PowerShell reach.
Another approach, without regular expressions. Both of the following examples use the -WhatIf risk-mitigation parameter for debugging purposes.
Rename files:
Get-ChildItem -File | ForEach-Object {
$oldFile = $_.FullName
$newName = $_.BaseName.Split('-')[0] + $_.Extension
if ($_.Name -ne $newName) {
Rename-Item -Path $oldFile -NewName $newName -WhatIf
}
}
Rename and move files:
$newDest = 'D:\test' ### change to fit your circumstances
Get-ChildItem -File | ForEach-Object {
$oldFile = $_.FullName
$newName = $_.BaseName.Split('-')[0] + $_.Extension
$newFile = Join-Path -Path $newDest -ChildPath $newName
if ( -not ( Test-Path -Path $newFile ) ) {
Move-Item -Path $oldFile -Destination $newFile -WhatIf
}
}
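To check what the split approach produces before touching any files, you can run it on the sample names from the question as plain strings:

```powershell
# Sample names from the question (a subset).
$names = @(
    '100-expresstoll.pdf'
    '1000-2012-09-29.jpg'
    '10000-2014-01-15_14.03.22.jpg'
    '10005-Feb 4, 2014.jpeg'
    "10006-O'Reilly_Media,_Inc..pdf"
)

$newNames = foreach ($name in $names) {
    $file = [System.IO.FileInfo]$name
    # Keep everything before the first '-', then re-attach the extension.
    $file.BaseName.Split('-')[0] + $file.Extension
}
$newNames   # 100.pdf, 1000.jpg, 10000.jpg, 10005.jpeg, 10006.pdf
```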
You can use the -replace operator to do this kind of string manipulation:
Get-ChildItem | foreach {
$old_name = $_.FullName
$new_name = $_.Name -replace '([0-9]+).*(\.[^.]*)$', '$1$2'
Rename-Item $old_name $new_name
}
The regular expression is the trick here:
([0-9]+) means match a series of digits (1 or more digits)
.* means match anything
(\.[^.]*) means match a period followed by any characters other than a period
$ means that the match must reach the end of the string
The first and third are special in that they are surrounded by parentheses which means that you can use those values using the dollar notation (e.g. $1) in the replacement string.
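Because the pattern isn't anchored with ^, the digits don't have to be at the start of the name. Here is a sketch comparing it with an anchored version; 'scan-001-final.pdf' is a hypothetical name without a leading record ID:

```powershell
$names = '100-expresstoll.pdf', 'scan-001-final.pdf'

# -replace applied to an array processes each element.
$unanchored = $names -replace '([0-9]+).*(\.[^.]*)$', '$1$2'
$anchored   = $names -replace '^([0-9]+).*(\.[^.]*)$', '$1$2'
$unanchored   # 100.pdf, scan-001.pdf
$anchored     # 100.pdf, scan-001-final.pdf (no leading digits, so left unchanged)
```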
Probably the most idiomatic way of solving this is as follows (assumes that all files of interest - and no others - are in the current dir.):
Get-ChildItem -File | Rename-Item -NewName { ($_.BaseName -split '-')[0] + $_.Extension }
Add common parameter -WhatIf to the Rename-Item command to preview the renaming operation.
Note that Rename-Item always renames items in their current location; to (also) move them, use Move-Item.
If a target with the same name already exists, Rename-Item reports a non-terminating error for each such case (without aborting overall processing).
Note that this could also happen if an input filename contains no -, as that would result in an attempt to rename a file to itself.
Explanation:
Get-ChildItem -File outputs [System.IO.FileInfo] objects representing the files in the current directory, which are passed through the pipeline (|) to Rename-Item.
Passing a script block ({ ... }) to Rename-Item's -NewName parameter executes the contained code for each input object, where $_ represents the input object at hand.
Note that this virtually undocumented but frequently used technique is called a script-block parameter [value], where a parameter that is designed to take pipeline input can be bound with a script block that processes the input indirectly.
($_.BaseName -split '-')[0] splits each input file's base name (the filename without its extension) on - and extracts the first token.
+, because the LHS is a string, performs string concatenation.
$_.Extension extracts the filename extension from each input filename.
I know this is not a PowerShell thing. If you just want something to work, this is a cmd batch file thing. As written it only ECHOes the COPY commands; remove ECHO to actually copy the files.
SETLOCAL ENABLEDELAYEDEXPANSION
SET "OLDDIR=C:\Users\lit\files"
SET "NEWDIR=C:\Users\lit\newdir"
FOR /F "usebackq tokens=*" %%a IN (`DIR /A:-D /B "%OLDDIR%\*"`) DO (
FOR /F "usebackq delims=- tokens=1" %%b IN (`ECHO %%a`) DO (SET "BN=%%b")
SET "EXT=%%~xa"
ECHO COPY /Y "%OLDDIR%\%%~a" "%NEWDIR%\!BN!!EXT!"
)