Need to batch add characters to filenames using Powershell - powershell

I have a series of files all named something like:
PRT14_WD_14220000_1.jpg
I need to add two zeroes after the last underscore and before the number so it looks like PRT14_WD_14220000_001.jpg
I've tried:
(dir) | rename-Item -new { $_.name -replace '*_*_*_','*_*_*_00' }
Appreciate any help.

The closest thing to what you attempted would be this. In regex, the wildcard is .*. And the parentheses do grouping to refer to later with the dollar sign numbers.
dir *.jpg | rename-Item -new { $_.name -replace '(.*)_(.*)_(.*)_','$1_$2_$3_00' } -whatif
What if: Performing the operation "Rename File" on target "Item: C:\users\admin\foo\PRT14_WD_14220000_1.jpg Destination: C:\users\admin\foo\PRT14_WD_14220000_001.jpg".
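To see what the capture groups do without touching any files, the same -replace can be tried on a plain string first. This is only a sketch that uses the sample name from the question:

```powershell
# Try the replacement on a plain string first; no files are touched.
$name = 'PRT14_WD_14220000_1.jpg'

# (.*)_(.*)_(.*)_ captures the three underscore-separated parts as $1, $2 and $3.
# Single quotes keep PowerShell from treating $1, $2, $3 as its own variables.
$newName = $name -replace '(.*)_(.*)_(.*)_', '$1_$2_$3_00'
$newName   # PRT14_WD_14220000_001.jpg
```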
OK, here's my take for when you want the number zero-padded to three digits (that is, at most two leading zeroes). $num has to be an integer for the .ToString() format I want.
dir *.jpg | rename-item -newname { $a,$b,$c,$num = $_.basename -split '_'
$num = [int]$num
$a + '_' + $b + "_" + $c + '_' + $num.tostring('000') + '.jpg'
} -whatif
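As a quick check of the padding step on its own, here is a small sketch that runs the same [int] cast and .ToString('000') call on a few sample strings (the sample values are made up for illustration):

```powershell
# ToString('000') is a .NET custom numeric format: at least three digits, zero-padded.
# It is a method of numbers, hence the [int] cast of the text taken from the file name.
$padded = foreach ($text in '1', '12', '123', '1234') {
    ([int]$text).ToString('000')
}
$padded -join ', '   # 001, 012, 123, 1234
```

Numbers that already have three or more digits are left as they are.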

the following presumes your last part of the .BaseName will always need two zeros added to it. what it does ...
fakes getting the fileinfo object that you get from Get-Item/Get-ChildItem
replace that with the appropriate cmdlet. [grin]
splits the .BaseName into parts using the _ as the split target
adds two zeros to the final part from the above split
merges the parts into a $NewBaseName
gets the .FullName and replaces the original BaseName with the $newBaseName
displays that new file name
you will still need to do your rename, but that is pretty direct. [grin]
here's the code ...
# fake getting a file info object
# in real life, use Get-Item or Get-ChildItem
$FileInfo = [System.IO.FileInfo]'PRT14_WD_14220000_1.jpg'
$BNParts = $FileInfo.BaseName.Split('_')
$BNParts[-1] = '00{0}' -f $BNParts[-1]
$NewBasename = $BNParts -join '_'
$NewFileName = $FileInfo.FullName.Replace($FileInfo.BaseName, $NewBaseName)
$NewFileName
output = D:\Data\Scripts\PRT14_WD_14220000_001.jpg

The -replace operator operates on regexes (regular expressions), not wildcard expressions such as * (by itself), which is what you're trying to use.
A conceptually more direct approach is to focus the replacement on the end of the string:
Get-ChildItem | # `dir` is a built-in alias for Get-ChildItem
Rename-Item -NewName { $_.Name -replace '(?<=_)[^_]+(?=\.)', '00$&' } -WhatIf
Note: The -WhatIf common parameter in the command above previews the operation. Remove -WhatIf once you're sure the operation will do what you want.
(?<=_)[^_]+(?=\.) matches a nonempty run (+) of non-_ chars ([^_]) preceded by _ ((?<=_)) and followed by a literal . ((?=\.)), excluding both the preceding _ and the following . from what is captured by the match ((?<=...) and (?=...) are non-capturing look-around assertions).
In short: This matches and captures the characters after the last _ and before the start of the filename extension.
00$& replaces what was matched with 00, followed by what the match captured ($&).
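The following sketch shows the match and the $& placeholder on plain strings. The second name, with a two-digit number, is a made-up example that shows the limitation of blindly inserting 00:

```powershell
$pattern = '(?<=_)[^_]+(?=\.)'
$name = 'PRT14_WD_14220000_1.jpg'

# The match itself is only the text between the last _ and the dot.
$matched = [regex]::Match($name, $pattern).Value
$matched    # 1

# In the replacement operand, $& stands for that matched text.
$fixed = $name -replace $pattern, '00$&'
$fixed      # PRT14_WD_14220000_001.jpg

# Hypothetical two-digit case: 00 is still inserted, giving four digits.
$tooLong = 'PRT14_WD_14220000_12.jpg' -replace $pattern, '00$&'
$tooLong    # PRT14_WD_14220000_0012.jpg
```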
In a follow-up comment you mention wanting to not just blindly insert 00, but to 0-left-pad the number after the last _ to 3 digits, whatever the number may be.
In PowerShell [Core] 6.1+, this can be achieved as follows:
Get-ChildItem |
Rename-Item -NewName {
$_.Name -replace '(?<=_)[^_]+(?=\.)', { $_.Value.PadLeft(3, '0') }
} -WhatIf
The script block ({ ... }) as the replacement operand receives each match as a Match instance stored in automatic variable $_, whose .Value property contains the captured text.
Calling .PadLeft(3, '0') on that captured text 0-left-pads it to 3 digits and outputs the result, which replaces the regex match at hand.
A quick demonstration:
PS> 'PRT14_WD_14220000_1.jpg' -replace '(?<=_)[^_]+(?=\.)', { $_.Value.PadLeft(3, '0') }
PRT14_WD_14220000_001.jpg # Note how '_1.jpg' was replaced with '_001.jpg'
In earlier PowerShell versions, you must make direct use of the .NET [regex] type's .Replace() method, which fundamentally works the same:
Get-ChildItem |
Rename-Item -NewName {
[regex]::Replace($_.Name, '(?<=_)[^_]+(?=\.)', { param($m) $m.Value.PadLeft(3, '0') })
} -WhatIf
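A quick demonstration of the [regex]::Replace() variant on plain strings, as a sketch; the two- and three-digit names are made up to show that longer numbers get fewer zeroes or none:

```powershell
$pattern = '(?<=_)[^_]+(?=\.)'
$names = 'PRT14_WD_14220000_1.jpg', 'PRT14_WD_14220000_12.jpg', 'PRT14_WD_14220000_123.jpg'

$results = foreach ($name in $names) {
    # The script block is called once per match; $m is the Match object.
    [regex]::Replace($name, $pattern, { param($m) $m.Value.PadLeft(3, '0') })
}
$results
# PRT14_WD_14220000_001.jpg
# PRT14_WD_14220000_012.jpg
# PRT14_WD_14220000_123.jpg
```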

Related

Move File only if the file does not contain only a zero

I have the below powershell command which, if the file only contains a zero and no spaces after, works ok.
However, my file has multiple spaces after the 0, followed by a newline.
How can I move the file only if its content is anything other than a single 0, regardless of any additional spaces and newlines after the 0?
$file = Get-Content "Test\A28AP.txt" -raw
if ($file -notmatch '^0$') {
Move-Item "D:\Test\A28AP.txt" -Destination "D:\Test\History"
}
As Jeroen Mostert suggests, using .Trim() on the file's content before comparing to string '0' is the simplest solution in your case:
$filePath = 'Test\A28AP.txt'
if ('0' -ne (Get-Content $filePath -Raw).Trim()) {
Move-Item $filePath -Destination D:\Test\History -WhatIf
}
.Trim() (System.String.Trim) removes all leading and trailing whitespace, and is complemented by .TrimStart() and .TrimEnd() variants. All these methods optionally allow you to control the set of characters to be trimmed.
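A small sketch of the trimming behavior, using a string that simulates the file content described in the question (a 0, some spaces, then a newline):

```powershell
# Simulated file content: a zero, trailing spaces, then a Windows-style newline.
$content = "0   `r`n"

$rawEqual     = $content -eq '0'             # False: the trailing whitespace is part of the string
$trimmedEqual = $content.Trim() -eq '0'      # True
$trimEndEqual = $content.TrimEnd() -eq '0'   # True: only trailing whitespace is removed
$custom       = 'xx0xx'.Trim('x')            # 0 (explicit set of characters to trim)

$rawEqual, $trimmedEqual, $trimEndEqual, $custom
```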
To look at the first line only, irrespective of whether there are additional, non-empty ones, replace -Raw with -First 1.
For the sake of completeness, here's the equivalent (first-line-only) regex solution - even though it is overkill in your case:
$filePath = 'Test\A28AP.txt'
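if ((Get-Content $filePath -First 1) -notmatch '^0 *$') {
Move-Item $filePath -Destination D:\Test\History -WhatIf
}
' *' matches zero or more spaces after the 0 at the start of the string (^), and $ requires the line to end there, so that a first line such as 01 or 0abc does not count as a match.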
With a given multi-line input string (which is what -Raw returns for a multi-line file), limiting matching to the first line requires more effort:
By default, ^ and $ in .NET regexes match only the very beginning and end of the entire input; inline option (?m) can be used to make them match on each line, in which case \A and \Z / \z must be used to match the very beginning and end of the string - see the .NET regex quick reference.
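Therefore, use regex '(?m)\A0 *\r?$' to look for a 0 with optional trailing spaces only on the first line of the multi-line string; the \r? allows for the CR of a Windows-style CRLF newline, because with (?m) the $ matches just before the LF only:
$filePath = 'Test\A28AP.txt'
if ((Get-Content $filePath -Raw) -notmatch '(?m)\A0 *\r?$') {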
Move-Item $filePath -Destination D:\Test\History -WhatIf
}
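To make the anchor behavior concrete, here is a sketch that tests a few patterns against two simulated multi-line strings, one with LF and one with CRLF newlines. The strings are made up; the \r? variant is included because .NET's $ matches before the LF only:

```powershell
$lf   = "0  `nmore text"      # LF line ending
$crlf = "0  `r`nmore text"    # CRLF line ending

$default   = $lf   -match '^0 *$'            # False: by default $ only matches at the very end
$multiline = $lf   -match '(?m)\A0 *$'       # True: with (?m), $ also matches before each LF
$crlfPlain = $crlf -match '(?m)\A0 *$'       # False: the CR before the LF is not a space
$crlfFixed = $crlf -match '(?m)\A0 *\r?$'    # True: \r? allows for the optional CR

$default, $multiline, $crlfPlain, $crlfFixed
```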

Rename file - delete all characters AFTER 2nd underscore

I need to replace the time\date stamp that's included in the filename after 2nd underscore (needs to be in the same format yyyyMMddHHmmss)
example file: 123456_123456_20190716163001.xml
sometimes the file in question gets created with an additional character which invalidates the file, in this case I need to replace this with the current timestamp.
example: 123456_123456_current Timestamp here.xml
The file should never exceed 32 characters(including extension)
I found a script but it deletes everything after the 1st underscore not the 2nd and I'm struggling to find a way to replace the text with the current timestamp.
Get-ChildItem c:\test -Filter 123456_123456*.xml | Foreach-Object -Process {
$NewName = [Regex]::Match($_.Name,"^[^_]*").Value + '.xml'
$_ | Rename-Item -NewName $NewName
}
timestamp after 2nd underscore to be updated to the current timestamp if original file exceeds 32 characters
123456_123456_current Timestamp here.xml
this takes advantage of the way a [fileinfo] object is structured. the .BaseName is easy to get to & use .Split() on. then one can use -join to put it back into one basename & finally add the extension onto the basename.
# fake reading in a file info object
# in real life, use Get-ChildItem or Get-Item
$FileObject = [System.IO.FileInfo]'123456_123456_current Timestamp here.xml'
$NewName = -join (($FileObject.BaseName.Split('_')[0,1] -join '_'), $FileObject.Extension)
$NewName
output = 123456_123456.xml
Sticking with the regex theme, you can do the following:
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
Rename-Item -NewName {$_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"}
If duplicate file names are a concern, you can build in an increment to $CurrentTime.
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
Rename-Item -NewName {
$NewName = $_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"
if (Test-Path (Join-Path $_.DirectoryName $NewName)) {
$CurrentTime = [int64]$CurrentTime + 1
$NewName = $_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"
}
$NewName
}
Explanation:
$RegexReplace contains the regular expression that a file name must match for the rename to happen. The parts of the regex are explained below:
.*?_.*?_: Matches a minimal number of characters (lazy matching) followed by an underscore and then another minimal number of characters followed by an underscore.
.*: Greedily matches any characters
\.: Literally matches the dot character (.).
(): The parentheses here represent capture groups with the first set being 1 and the second set being 2. These are later referenced as ${1} and ${2} in the -replace operation.
Since Rename-Item -NewName accepts a delay-bind script block, we can just pipe Get-ChildItem output directly to it. The current pipeline object is $_.
The -replace operation uses the variable $CurrentTime, which must be expanded in order for a successful outcome. For that reason, we use double quotes around the replacement. Since we do not want capture groups ${1} and ${2} expanded, we backtick escape them.
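The replacement can be checked on a plain string before renaming anything. In this sketch the timestamp is a fixed stand-in for the Get-Date call, and the 33-character file name is a made-up example of a name with one extra character:

```powershell
$stamp = '20240101120000'                      # fixed stand-in for Get-Date -Format 'yyyyMMddHHmmss'
$name  = '123456_123456_20190716163001X.xml'   # hypothetical name, 33 characters long
$pattern = '(.*?_.*?_).*(\..*)'

# `${1} and `${2} reach the regex engine as ${1} and ${2}; $stamp is expanded by PowerShell.
# The braces matter: a bare $1 followed directly by digits would be read as a different group number.
$newName = $name -replace $pattern, "`${1}$stamp`${2}"
$newName          # 123456_123456_20240101120000.xml
$newName.Length   # 32
```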

Powershell Remove Text Between before the file extension and an underscore

I have a few hundred PDF files that have text in their file names which needs to be removed. Each file name has several underscores in it, depending on how long the file name is. My goal is to remove the text that exists between the last _ and the .pdf file extension.
For example I have:
AB_NAME_NAME_NAME_NAME_DS_123_EN_6.pdf
AC_NAME_NAME_NAME_DS_321_EN_10.pdf
AD_NAME_NAME_DS_321_EN_101.pdf
I would like the final part, from the last _ up to the extension (_6, _10 and _101 here), removed so that the names become:
AB_NAME_NAME_NAME_NAME_DS_123_EN.pdf
AC_NAME_NAME_NAME_DS_321_EN.pdf
AD_NAME_NAME_DS_321_EN.pdf
I am a novice at PowerShell, but I have done some research and found the question "Powershell - Rename filename by removing the last few characters" helpful. It doesn't get me exactly what I need, though, because I cannot hardcode the number of characters to be removed: it varies (2-4).
Get-ChildItem 'C:\Path\here' -filter *.pdf | rename-item -NewName {$_.name.substring(0,$_.BaseName.length-3) + $_.Extension}
It seems like there may be a way to do this using .split or regex but I was not able to find a solution. Thanks.
You can use the LastIndexOf() method of the [string] class to get the index of the last instance of a character. In your case this should do it:
Get-ChildItem 'C:\Path\here' -filter *.pdf | rename-item -NewName { $_.BaseName.substring(0,$_.BaseName.lastindexof('_')) + $_.Extension }
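The LastIndexOf() step can be tried on a plain string. The sketch below wraps it in a helper function, Remove-LastToken, which is a made-up name for illustration; the guard is there because LastIndexOf() returns -1 when there is no underscore, and Substring(0, -1) would then throw:

```powershell
# Hypothetical helper: cut a base name at its last underscore.
function Remove-LastToken([string]$BaseName) {
    $cut = $BaseName.LastIndexOf('_')
    if ($cut -lt 0) { return $BaseName }   # no underscore: leave the name unchanged
    $BaseName.Substring(0, $cut)
}

$withToken    = Remove-LastToken 'AD_NAME_NAME_DS_321_EN_101'
$withoutToken = Remove-LastToken 'README'
$withToken      # AD_NAME_NAME_DS_321_EN
$withoutToken   # README
```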
Using the -replace operator with a regex enables a concise solution:
Get-ChildItem 'C:\Path\here' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace '_[^_]+(?=\.)' } -WhatIf
-WhatIf previews the renaming operation. Remove it to perform actual renaming.
_[^_]+ matches a _ character followed by one or more non-_ characters ([^_])
If you wanted to match more specifically by (decimal) digits only (\d), use _\d+ instead.
(?=\.) is a look-ahead assertion ((?=...)) that matches a literal . (\.), i.e., the start of the filename extension without including it in the match.
By not providing a replacement operand to -replace, it is implicitly the empty string that replaces what was matched, which effectively removes the last _-prefixed token before the filename extension.
You can make the regex more robust by also handling file names with "double" extensions; e.g., the above solution would replace filename a_bc.d_ef.pdf with a.d.pdf, i.e., perform two replacements. To prevent that, use the following regex instead:
$_.Name -replace '_[^_]+(?=\.[^.]+$)'
The look-ahead assertion now ensures that only the last extension matches: a literal . (\.) followed by one or more (+) characters other than literal . ([^.], a negated character set ([^...])) at the end of the string ($).
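Both regexes can be compared side by side on plain strings; this is just a sketch using the double-extension name from above and one of the names from the question:

```powershell
$name = 'a_bc.d_ef.pdf'

# The simple regex matches before every dot, so two tokens are removed.
$simple = $name -replace '_[^_]+(?=\.)'
$simple   # a.d.pdf

# The stricter look-ahead only accepts the last extension, so one token is removed.
$robust = $name -replace '_[^_]+(?=\.[^.]+$)'
$robust   # a_bc.d.pdf

# For ordinary names the stricter regex still removes the last token.
$plain = 'AC_NAME_NAME_NAME_DS_321_EN_10.pdf' -replace '_[^_]+(?=\.[^.]+$)'
$plain    # AC_NAME_NAME_NAME_DS_321_EN.pdf
```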
Just to show another alternative,
the part to remove from the Name is the last element of the BaseName split at _
which is the negative index [-1] of the split result
Get-ChildItem 'C:\Path\here' -Filter *.pdf |%{$_.BaseName.split('_')[-1]}
6
10
101
as the split removes the _ it has to be applied again to remove it.
Get-ChildItem 'C:\Path\here' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace '_'+$_.BaseName.split('_')[-1] } -whatif
EDIT a modified variant which splits the BaseName at the underscore
without removing the splitting character is using the -split operator and
a RegEx with a zero length lookahead
> Get-ChildItem 'C:\Path\here' -Filter *.pdf |%{($_.BaseName -split'(?=_\d+)')[-1]}
_6
_10
_101
Get-ChildItem 'C:\Path\here' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace ($_.BaseName -split'(?=_)')[-1] } -whatif

Retain initial characters in file names, remove all remaining characters using powershell

I have a batch of files with names like: 78887_16667_MR12_SMITH_JOHN_713_1.pdf
I need to retain the first three sets of numbers and remove everything between the third "_" and "_1.pdf".
So this: 78887_16667_MR12_SMITH_JOHN_713_1.pdf
Becomes this: 78887_16667_MR12_1.pdf
Ideally, I'd like to be able to just use the 3rd "_" as the break as the third set of numbers sometimes includes 3 characters, sometimes 4 characters (like the example) and other times, 5 characters.
If I used something like this:
Get-ChildItem Default_*.pdf | Rename-Item -NewName {$_.name -replace...
...and then I'm stuck: can I state that everything between the 3rd "_" and the 6th "_" should be replaced with "" (nothing)? My understanding is that I'd include ".Extension" to save the extension, too.
You can use the -split operator to split your name into _-separated tokens, extract the tokens of interest, and then join them again with the -join operator:
PS> ('78887_16667_MR12_SMITH_JOHN_713_1.pdf' -split '_')[0..2 + -1] -join '_'
78887_16667_MR12_1.pdf
0..2 extracts the first 3 tokens, and -1 the last one (you could write this array of indices as 0, 1, 2, -1 as well).
Applied in the context of renaming files:
Get-ChildItem -Filter *.pdf | Rename-Item -NewName {
($_.Name -split '_')[0..2 + -1] -join '_'
} -WhatIf
Common parameter -WhatIf previews the rename operation; remove it to perform actual renaming.
mklement0 has given you a good and working answer. Here is another way to do it using a regex.
Get-ChildItem -Filter *.pdf |
ForEach-Object {
if ($_.Name -match '(.*?_.*?_.*?)_.*(_1.*)') {
Rename-Item -Path $_.FullName -NewName $($Matches[1..2] -join '') -WhatIf
}
}
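To see what ends up in $Matches, the -match step can be run on the sample name as a plain string. A small sketch:

```powershell
$name = '78887_16667_MR12_SMITH_JOHN_713_1.pdf'

if ($name -match '(.*?_.*?_.*?)_.*(_1.*)') {
    # $Matches[0] is the whole match; [1] and [2] are the two capture groups.
    $first  = $Matches[1]   # 78887_16667_MR12
    $second = $Matches[2]   # _1.pdf
    $newName = $Matches[1..2] -join ''
    $newName                # 78887_16667_MR12_1.pdf
}
```

Names that do not match the pattern are skipped by the if, so they are never passed to Rename-Item.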

remove extraneous characters from a filename

I have been tasked a little above my head with taking a repository of files and removing excess garbage characters from the filename and saving the renamed file in a different directory folder.
An example of the filenames are:
100-expresstoll.pdf
1000-2012-09-29.jpg
10000-2014-01-15_14.03.22.jpg
10001-2014-01-15_19.05.24.jpg
10002-2014-01-15_21.30.23.jpg
10003-2014-01-16_07.33.54.jpg
10004-2014-01-16_13.33.21.jpg
10005-Feb 4, 2014.jpeg
10006-O'Reilly_Media,_Inc..pdf
First group of numbers at the beginning are record IDs and are to be retained along with the file's extension. Everything else between the record IDs and the file extension needs to be dropped.
For example, the final name for first three files would be:
100.pdf
1000.jpg
10000.jpg
I have read Removing characters and Rearranging filenames in addition to other postings, but the complexity of having a variable character length at the front, a variable number of intermediary characters to be removed and variable file extension types have really tossed this beyond my limited PowerShell reach.
Another approach, without regular expressions. Both of the following examples use the risk-mitigation parameter -WhatIf so that you can preview the result first.
Rename files:
Get-ChildItem -File | ForEach-Object {
$oldFile = $_.FullName
$newName = $_.BaseName.Split('-')[0] + $_.Extension
if ($_.Name -ne $newName) {
Rename-Item -Path $oldFile -NewName $newName -WhatIf
}
}
Rename and move files:
$newDest = 'D:\test' ### change to fit your circumstances
Get-ChildItem -File | ForEach-Object {
$oldFile = $_.FullName
$newName = $_.BaseName.Split('-')[0] + $_.Extension
$newFile = Join-Path -Path $newDest -ChildPath $newName
if ( -not ( Test-Path -Path $newFile ) ) {
Move-Item -Path $oldFile -Destination $newFile -WhatIf
}
}
You can use the -replace operator to do this kind of string manipulation:
Get-ChildItem | foreach {
$old_name = $_.FullName
$new_name = $_.Name -replace '([0-9]+).*(\.[^.]*)$', '$1$2'
Rename-Item $old_name $new_name
}
The regular expression is the trick here:
([0-9]+) means match a series of digits (1 or more digits)
.* means match anything
(\.[^.]*) means match a period followed by any characters other than a period
$ means that the match must reach the end of the string
The first and third are special in that they are surrounded by parentheses which means that you can use those values using the dollar notation (e.g. $1) in the replacement string.
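As a sketch, the regex can be checked against a few of the sample names from the question as plain strings, before any file is renamed:

```powershell
$pattern = '([0-9]+).*(\.[^.]*)$'
$names = '100-expresstoll.pdf',
         '10000-2014-01-15_14.03.22.jpg',
         "10006-O'Reilly_Media,_Inc..pdf"

# $1 is the leading run of digits, $2 is the last dot plus the extension.
$renamed = foreach ($name in $names) { $name -replace $pattern, '$1$2' }
$renamed
# 100.pdf
# 10000.jpg
# 10006.pdf
```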
Probably the most idiomatic way of solving this is as follows (assumes that all files of interest - and no others - are in the current dir.):
Get-ChildItem -File | Rename-Item -NewName { ($_.BaseName -split '-')[0] + $_.Extension }
Add common parameter -WhatIf to the Rename-Item command to preview the renaming operation.
Note that Rename-Item always renames items in their current location; to (also) move them, use Move-Item.
If a target with the same name already exists, Rename-Item reports a non-terminating error for each such case (without aborting overall processing).
Note that this could also happen if an input filename contains no -, as that would result in an attempt to rename a file to itself.
Explanation:
Get-ChildItem -File outputs [System.IO.FileInfo] objects representing the files in the current directory, which are passed through the pipeline (|) to Rename-Item.
Passing a script block ({ ... }) to Rename-Item's -NewName parameter executes the contained code for each input object, where $_ represents the input object at hand.
Note that this virtually undocumented but frequently used technique is called a script-block parameter [value], where a parameter that is designed to take pipeline input can be bound with a script block that processes the input indirectly.
($_.BaseName -split '-')[0] extracts the first token of each input filename's base name (filename without extension), using - as the separator.
+, because the LHS is a string, performs string concatenation.
$_.Extension extracts the filename extension from each input filename.
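The expression can be tried without real files by casting names to [System.IO.FileInfo], which gives objects with the same .BaseName and .Extension properties. In this sketch, readme.txt is a made-up name that shows the case of a filename without a -:

```powershell
$names = '100-expresstoll.pdf', '10000-2014-01-15_14.03.22.jpg', 'readme.txt'

$newNames = foreach ($n in $names) {
    # The cast does not require the file to exist.
    $file = [System.IO.FileInfo]$n
    ($file.BaseName -split '-')[0] + $file.Extension
}
$newNames
# 100.pdf
# 10000.jpg
# readme.txt   (no - in the name, so the new name equals the old one)
```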
I know this is not a PowerShell thing. If you just want something to work, this is a cmd batch file thing.
SETLOCAL ENABLEDELAYEDEXPANSION
SET "OLDDIR=C:\Users\lit\files"
SET "NEWDIR=C:\Users\lit\newdir"
FOR /F "usebackq tokens=*" %%a IN (`DIR /A:-D /B "%OLDDIR%\*"`) DO (
FOR /F "usebackq delims=- tokens=1" %%b IN (`ECHO %%a`) DO (SET "BN=%%b")
SET "EXT=%%~xa"
ECHO COPY /Y "%OLDDIR%\%%~a" "%NEWDIR%\!BN!!EXT!"
)