How do I remove every character after a _ in folder names?
Example
folder names:
307456_ajksndkajsdna_asd_busd to 307456
780451_dsadafg_4565 to 780451
Edit: also remove the _ itself.
I am using Windows 7.
In PowerShell, you can use the -split operator to split the string by _ and keep only the first part:
$number,$null = '307456_ajksndkajsdna_asd_busd' -split '_'
The value of $number is now 307456, the rest has been discarded.
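As a quick check against both folder names from the question, the same split can be tried on plain strings before touching the file system. This is only a sketch; the `$names` and `$prefixes` variables are illustrative and not part of the original answer.

```powershell
# Sample folder names taken from the question
$names = '307456_ajksndkajsdna_asd_busd', '780451_dsadafg_4565'

# -split returns an array of tokens; index 0 is everything before the first underscore
$prefixes = $names | ForEach-Object { ($_ -split '_')[0] }
$prefixes
```

This should print 307456 and 780451, one per line.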
If you want to rename all child folders in a folder to just the first number, use Get-ChildItem to retrieve the folders, and Rename-Item to rename them:
Get-ChildItem -Path C:\folder\name |Where-Object {$_.PSIsContainer} |Rename-Item -NewName { $($_.Name -split '_')[0] }
I need to replace the date/time stamp that is included in the file name after the 2nd underscore (it needs to stay in the same format, yyyyMMddHHmmss).
example file: 123456_123456_20190716163001.xml
Sometimes the file in question gets created with an additional character, which invalidates the file; in this case I need to replace the stamp with the current timestamp.
example: 123456_123456_current Timestamp here.xml
The file name should never exceed 32 characters (including the extension).
I found a script, but it deletes everything after the 1st underscore rather than the 2nd, and I'm struggling to find a way to replace the text with the current timestamp.
Get-ChildItem c:\test -Filter 123456_123456*.xml | Foreach-Object -Process {
$NewName = [Regex]::Match($_.Name,"^[^_]*").Value + '.xml'
$_ | Rename-Item -NewName $NewName
}
Desired result: the timestamp after the 2nd underscore is updated to the current timestamp if the original file name exceeds 32 characters:
123456_123456_current Timestamp here.xml
this takes advantage of the way a [fileinfo] object is structured. the .BaseName is easy to get to & use .Split() on. then one can use -join to put it back into one basename & finally add the extension onto the basename.
# fake reading in a file info object
# in real life, use Get-ChildItem or Get-Item
$FileObject = [System.IO.FileInfo]'123456_123456_current Timestamp here.xml'
$NewName = -join (($FileObject.BaseName.Split('_')[0,1] -join '_'), $FileObject.Extension)
$NewName
output = 123456_123456.xml
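The snippet above stops at the two-token prefix. A minimal sketch of how the current timestamp could then be appended follows; the helper name `Get-StampedName` and its `-Stamp` parameter are hypothetical, and a fixed stamp is passed in the example only so the result is reproducible.

```powershell
# Hypothetical helper: keep the first two '_'-separated tokens of the base name,
# then append a timestamp and the original extension.
function Get-StampedName {
    param(
        [System.IO.FileInfo]$File,
        # Defaults to the current time in the format the question asks for
        [string]$Stamp = (Get-Date -Format 'yyyyMMddHHmmss')
    )
    $prefix = $File.BaseName.Split('_')[0,1] -join '_'
    '{0}_{1}{2}' -f $prefix, $Stamp, $File.Extension
}

# Same fake file object as above; in real life, use Get-ChildItem or Get-Item
$FileObject = [System.IO.FileInfo]'123456_123456_current Timestamp here.xml'
$Stamped = Get-StampedName -File $FileObject -Stamp '20190716163001'
$Stamped
```

With that fixed stamp the result is 123456_123456_20190716163001.xml, which is exactly 32 characters long.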
Sticking with the regex theme, you can do the following:
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
Rename-Item -NewName {$_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"}
If duplicate file names are a concern, you can build in an increment to $CurrentTime.
$CurrentTime = Get-Date -Format 'yyyyMMddHHmmss'
$RegexReplace = "(.*?_.*?_).*(\..*)"
Get-ChildItem c:\test -Filter 123456_123456*.xml |
Rename-Item -NewName {
$NewName = $_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"
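# Look for a clash in the file's own folder rather than in the current location
if (Test-Path (Join-Path $_.DirectoryName $NewName)) {
$CurrentTime = [long]$CurrentTime + 1
$NewName = $_.Name -replace $RegexReplace,"`${1}$CurrentTime`${2}"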
}
$NewName
}
Explanation:
$RegexReplace contains the regular expression that must match for the rename to happen. Its parts are explained below:
.*?_.*?_: Matches a minimal number of characters (lazy matching) followed by an underscore and then another minimal number of characters followed by an underscore.
.*: Greedily matches any characters
\.: Literally matches the dot character (.).
(): The parentheses here represent capture groups with the first set being 1 and the second set being 2. These are later referenced as ${1} and ${2} in the -replace operation.
Since Rename-Item -NewName supports delay-bind script blocks, we can just pipe Get-ChildItem output directly to it. Inside the script block, the current pipeline object is $_.
The replacement string uses the variable $CurrentTime, which PowerShell must expand, so the replacement is enclosed in double quotes. Since we do not want PowerShell to expand ${1} and ${2} as variables (they must reach the regex engine as capture-group references), we backtick-escape their $ characters.
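To see the replacement in isolation, it can be run against a plain string. This is a sketch under two assumptions: the sample name `$BadName` (a timestamp with one stray character) is made up for illustration, and a fixed timestamp stands in for Get-Date so the result is reproducible.

```powershell
# A fixed timestamp is used here; in the real script it comes from Get-Date
$CurrentTime  = '20190716163001'
$RegexReplace = '(.*?_.*?_).*(\..*)'

# Made-up sample: the timestamp is followed by a stray 'X'
$BadName = '123456_123456_20190101120000X.xml'

# Group 1 is '123456_123456_', group 2 is '.xml'; everything in between is replaced
$Fixed = $BadName -replace $RegexReplace, "`${1}$CurrentTime`${2}"
$Fixed
```

This yields 123456_123456_20190716163001.xml. The braces in ${1} matter here: because the timestamp starts with a digit, a bare $1 written directly in front of it would be ambiguous to the regex engine.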
I have a few hundred PDF files that have text in their file names which needs to be removed. Each file name contains several underscores, the number depending on how long the name is. My goal is to remove the text between the last _ and the .pdf file extension.
For example I have:
AB_NAME_NAME_NAME_NAME_DS_123_EN_6.pdf
AC_NAME_NAME_NAME_DS_321_EN_10.pdf
AD_NAME_NAME_DS_321_EN_101.pdf
And would like the part after the last underscore (_6, _10, _101; bold in the original post) to be removed, so that they become:
AB_NAME_NAME_NAME_NAME_DS_123_EN.pdf
AC_NAME_NAME_NAME_DS_321_EN.pdf
AD_NAME_NAME_DS_321_EN.pdf
I am a novice at PowerShell, but I have done some research and found the question "Powershell - Rename filename by removing the last few characters" helpful. It doesn't get me exactly what I need, though, because I cannot hardcode the number of characters to remove: it may differ from file to file (2-4).
Get-ChildItem 'C:\Path\here' -filter *.pdf | rename-item -NewName {$_.name.substring(0,$_.BaseName.length-3) + $_.Extension}
It seems like there may be a way to do this using .split or regex but I was not able to find a solution. Thanks.
You can use the LastIndexOf() method of the [string] class to get the index of the last instance of a character. In your case this should do it:
Get-ChildItem 'C:\Path\here' -filter *.pdf | rename-item -NewName { $_.BaseName.substring(0,$_.BaseName.lastindexof('_')) + $_.Extension }
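One thing to keep in mind with this approach: LastIndexOf() returns -1 when the name contains no underscore, and Substring(0, -1) then throws. A minimal sketch of a guarded version, using a hypothetical helper named `Remove-LastToken` that works on the base name only:

```powershell
# Hypothetical helper: drop the last '_'-prefixed token from a base name
function Remove-LastToken {
    param([string]$BaseName)
    $i = $BaseName.LastIndexOf('_')
    # No underscore at all: leave the name unchanged instead of letting Substring throw
    if ($i -lt 0) { return $BaseName }
    $BaseName.Substring(0, $i)
}

Remove-LastToken 'AD_NAME_NAME_DS_321_EN_101'   # AD_NAME_NAME_DS_321_EN
Remove-LastToken 'NoUnderscore'                 # NoUnderscore
```

Inside the Rename-Item script block this could be called as (Remove-LastToken $_.BaseName) + $_.Extension.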
Using the -replace operator with a regex enables a concise solution:
Get-ChildItem 'C:\Path\here' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace '_[^_]+(?=\.)' } -WhatIf
-WhatIf previews the renaming operation. Remove it to perform actual renaming.
_[^_]+ matches a _ character followed by one or more non-_ characters ([^_])
If you wanted to match more specifically by (decimal) digits only (\d), use _\d+ instead.
(?=\.) is a look-ahead assertion ((?=...)) that matches a literal . (\.), i.e., the start of the filename extension without including it in the match.
By not providing a replacement operand to -replace, it is implicitly the empty string that replaces what was matched, which effectively removes the last _-prefixed token before the filename extension.
You can make the regex more robust by also handling file names with "double" extensions; e.g., the above solution would turn the file name a_bc.d_ef.pdf into a.d.pdf, i.e., perform two replacements. To prevent that, use the following regex instead:
$_.Name -replace '_[^_]+(?=\.[^.]+$)'
The look-ahead assertion now ensures that only the last extension matches: a literal . (\.) followed by one or more (+) characters other than literal . ([^.], a negated character set ([^...])) at the end of the string ($).
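The difference between the two patterns can be checked on plain strings. The sketch below only uses names that already appear in this thread; the variable names are illustrative.

```powershell
$simple = '_[^_]+(?=\.)'          # any _token that is followed by a dot
$robust = '_[^_]+(?=\.[^.]+$)'    # only the _token right before the last extension

# Ordinary name from the question: both patterns behave the same
$r1 = 'AC_NAME_NAME_NAME_DS_321_EN_10.pdf' -replace $robust   # AC_NAME_NAME_NAME_DS_321_EN.pdf

# "Double" extension: the simple pattern removes two tokens, the robust one only the last
$r2 = 'a_bc.d_ef.pdf' -replace $simple   # a.d.pdf
$r3 = 'a_bc.d_ef.pdf' -replace $robust   # a_bc.d.pdf

$r1, $r2, $r3
```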
Just to show another alternative:
the part to remove from the Name is the last element of the BaseName split at _,
which you get with the negative index [-1]:
Get-ChildItem 'C:\Path\here' -Filter *.pdf |%{$_.BaseName.Split('_')[-1]}
6
10
101
Since the split removes the _, it has to be prepended again to build the pattern for -replace:
Get-ChildItem 'C:\Path\here' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace '_'+$_.BaseName.split('_')[-1] } -whatif
EDIT: a modified variant splits the BaseName at the underscore
without removing the splitting character, by using the -split operator with
a regex containing a zero-length look-ahead:
> Get-ChildItem 'C:\Path\here' -Filter *.pdf |%{($_.BaseName -split'(?=_\d+)')[-1]}
_6
_10
_101
Get-ChildItem 'C:\Path\here' -Filter *.pdf |
Rename-Item -NewName { $_.Name -replace ($_.BaseName -split'(?=_)')[-1] } -whatif
I have a batch of files with names like: 78887_16667_MR12_SMITH_JOHN_713_1.pdf
I need to retain the first three sets of numbers and remove everything between the third "_" and "_1.pdf".
So this: 78887_16667_MR12_SMITH_JOHN_713_1.pdf
Becomes this: 78887_16667_MR12_1.pdf
Ideally, I'd like to be able to just use the 3rd "_" as the break as the third set of numbers sometimes includes 3 characters, sometimes 4 characters (like the example) and other times, 5 characters.
If I used something like this:
Get-ChildItem Default_*.pdf | Rename-Item -NewName {$_.name -replace...
...and then I'm stuck: can I state that everything between the 3rd "_" and the 6th "_" should be replaced with "" (nothing)? My understanding is that I'd include ".Extension" to preserve the extension, too.
You can use the -split operator to split your name into _-separated tokens, extract the tokens of interest, and then join them again with the -join operator:
PS> ('78887_16667_MR12_SMITH_JOHN_713_1.pdf' -split '_')[0..2 + -1] -join '_'
78887_16667_MR12_1.pdf
0..2 extracts the first 3 tokens, and -1 the last one (you could write this array of indices as 0, 1, 2, -1 as well).
Applied in the context of renaming files:
Get-ChildItem -Filter *.pdf | Rename-Item -NewName {
($_.Name -split '_')[0..2 + -1] -join '_'
} -WhatIf
Common parameter -WhatIf previews the rename operation; remove it to perform actual renaming.
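If some files in the folder might already be in the short form, a small guard avoids rebuilding their names needlessly. A sketch, assuming a hypothetical helper named `Get-TrimmedName` that takes the full file name as a string:

```powershell
# Hypothetical helper: keep the first three '_'-separated tokens plus the last one
function Get-TrimmedName {
    param([string]$Name)
    $tokens = $Name -split '_'
    # With four or fewer tokens there is nothing between the third and the last to drop
    if ($tokens.Count -le 4) { return $Name }
    $tokens[0..2 + -1] -join '_'
}

Get-TrimmedName '78887_16667_MR12_SMITH_JOHN_713_1.pdf'   # 78887_16667_MR12_1.pdf
Get-TrimmedName '78887_16667_MR12_1.pdf'                  # unchanged
```

In the rename pipeline it would be called as Rename-Item -NewName { Get-TrimmedName $_.Name } -WhatIf.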
mklement0 has given you a good and working answer. Here is another way to do it using a regex.
Get-ChildItem -Filter *.pdf |
ForEach-Object {
if ($_.Name -match '(.*?_.*?_.*?)_.*(_1.*)') {
Rename-Item -Path $_.FullName -NewName $($Matches[1..2] -join '') -WhatIf
}
}
I have over a million files named like this: First_Last_MI_DOB_ followed by lots more information. Is there a way I can run a rename script that removes just the first name, last name, MI, and DOB from the file name, but keeps everything after that? Thank you.
Edited from my answer to this question: Parse and Switch Elements of Folder Names using Powershell
# Path to folder
$Path = '.\'
# Regex to match "ID_000000..."
$Regex = 'ID_\d+.*$'
# Get all objects in path
Get-ChildItem -Path $Path |
# Select only objects that are not directory and name matches regex
Where-Object {!$_.PSIsContainer -and $_.Name -match $Regex} |
# For each such object
ForEach-Object {
# Rename object
Rename-Item -Path $_.FullName -NewName $Matches[0]
}
UPDATE #1: It seems that you need to write a regex that matches the required part of the name and then use it to rename the document.
Assuming that the file name is x-John_Doe_._DOB_01-11-1990_M_ID_000000_TitleofDocument_DateofDocument_Docpagenumber_, here are a couple of examples:
Regex (https://regex101.com/r/gI0fZ2/2): (ID_\d+.*)$ - will match ID_{ONE_OR_MORE_DIGITS}{ANY_CHARACTERS}
Result:ID_000000_TitleofDocument_DateofDocument_Docpagenumber_
Regex (https://regex101.com/r/gI0fZ2/1): \d{4}_(M|F)_(.*)$ - will match {4_DIGITS}_M_{or}_F_ and capture everything after that in capture group.
Result:
1st capture group - M
2nd capture group (the one to use) - ID_000000_TitleofDocument_DateofDocument_Docpagenumber_
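Both patterns can be tried locally against the assumed file name with the -match operator, which fills the automatic $Matches variable. The variable names `$ById`, `$Sex`, and `$Rest` below are only illustrative.

```powershell
$Name = 'x-John_Doe_._DOB_01-11-1990_M_ID_000000_TitleofDocument_DateofDocument_Docpagenumber_'

# First regex: everything from "ID_<digits>" to the end of the name
if ($Name -match '(ID_\d+.*)$') {
    $ById = $Matches[1]
}

# Second regex: group 1 is M or F, group 2 is everything after it
if ($Name -match '\d{4}_(M|F)_(.*)$') {
    $Sex  = $Matches[1]
    $Rest = $Matches[2]
}

$ById
$Sex
$Rest
```

For this name, $ById and $Rest both come out as ID_000000_TitleofDocument_DateofDocument_Docpagenumber_, and $Sex is M.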
UPDATE #2:
All the names in each file are different, along with different IDs.
For example: John_Doe_DOB_01/01/01_ID_000000 and the next file name
could be: John_Smith_DOB_01/02/01_ID_100000 and so on. I am thinking I
would just want to read the file name in as a string, split it by _
and then make the new file name the stuff from [4] and after. Is there
a way to do that?
Sure, you can do that, but I'd recommend the regex approach, because it works for every file name that contains an ID_0xxxx string, no matter what precedes it. I've modified my initial example to use the first regex, so it should work for you.
But if you'd like to try splitting approach, here is how to do it:
# Path to folder
$Path = '.\'
# Filename separator
$Separator = '_'
# Get all objects in path
Get-ChildItem -Path $Path |
# Select only objects that are not directories
Where-Object {!$_.PSIsContainer} |
# For each such object
ForEach-Object {
# Generate new name
$NewName = ($_.Name -split $Separator | Select-Object -Skip 4) -join $Separator
# Rename object
Rename-Item -Path $_.FullName -NewName $NewName
}
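The split-and-skip step can be checked on plain strings first. In this sketch the first sample name is an assumption modelled on the comment above, with hyphens in the date because / is not allowed in Windows file names; the second is the name assumed in UPDATE #1.

```powershell
$Separator = '_'

# Name with exactly four leading tokens (first, last, DOB label, date)
$Name1 = 'John_Doe_DOB_01-01-01_ID_000000'
$New1 = ($Name1 -split $Separator | Select-Object -Skip 4) -join $Separator
$New1   # ID_000000

# Name from UPDATE #1, which has more than four tokens before the ID
$Name2 = 'x-John_Doe_._DOB_01-11-1990_M_ID_000000_TitleofDocument_DateofDocument_Docpagenumber_'
$New2 = ($Name2 -split $Separator | Select-Object -Skip 4) -join $Separator
$New2   # still starts with the date, because the ID is not at index 4 here
```

The second case shows why the regex approach is the more robust of the two: skipping a fixed number of tokens only works when every name has the same number of fields before the ID.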
My media files have the following names:
s01ep01
S01ep02
etc.
I need to remove the letter "p" so my program can properly cross reference the episodes.
Something like this should do:
Get-ChildItem 'C:\your\folder' |
Rename-Item -NewName { $_.Name -replace '^(s\d+e)p', '$1' }
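The pattern can be tried on plain strings before renaming anything. The third name below is a made-up example of a file that should be left alone.

```powershell
# Two names from the question plus one that should not change
$names = 's01ep01', 'S01ep02', 'sample.txt'

# -replace is case-insensitive by default, so 'S01ep02' matches as well
$renamed = $names | ForEach-Object { $_ -replace '^(s\d+e)p', '$1' }
$renamed
```

This should give s01e01, S01e02, and sample.txt. The replacement is in single quotes so that PowerShell passes $1 to the regex engine unchanged.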