How to remove the last X characters from a file name - PowerShell

Looking for help writing a script that will remove a specific number of characters from the end of a file name. In my specific dilemma, I have dozens of files with the following format:
1234567 XKJDFDA.pdf
5413874 KJDFSXZ.pdf
... etc. etc.
I need to remove the last 7 alpha characters to leave the 7 digits standing as the file name. Through another posted question I was able to find a script that would remove the first X number of digits from the beginning of the file name but I'm having an incredibly difficult time modifying it to remove from the end:
get-childitem *.pdf | rename-item -newname { [string]($_.name).substring(x) }
Any and all relevant help would be greatly appreciated.
Respectfully,

$RootFolder = '\\server.domain.local\share\folder'
Get-ChildItem -LiteralPath $RootFolder -Filter '*.pdf' |
Where-Object { $_.psIsContainer -eq $false } | # No folders
ForEach-Object {
if ($_.Name -match '^(?<BeginningDigits>\d{7})\s.+\.pdf$' ) {
$local:newName = "$($Matches['BeginningDigits'])$($_.Extension)"
return Rename-Item -LiteralPath $_.FullName -NewName $local:newName -PassThru
}
} |
ForEach-Object {Write-Host "New name: $($_.Name)"}
If the file name matches "<FilenameBegin><SevenDigits><Space><Something>.pdf<FilenameEnd>", it is renamed to "<SevenDigits>.<KeepExtension>". This uses a regular expression with a named capture group (BeginningDigits is the group name). Regex matching is the most CPU-intensive of the approaches here, but for a one-time run or a small number of files that does not matter. If you have many files, I'd recommend adding Where-Object { $_.BaseName.Length -gt 7 } | before the if (.. -match ..) check, to filter out files whose base name is 7 characters or shorter before the regex runs (a string length check is cheaper than a regex match). You can also remove \.pdf from the pattern to reduce CPU usage, because Get-ChildItem already applies that filter.
If you strictly need to match "<7digits><space><7alpha>.pdf", replace the pattern with '^(?<BeginningDigits>\d{7})\s[A-Z]{7}\.pdf$'. Note that -match is case-insensitive by default; use -cmatch if the letters must be uppercase.
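To see what the pattern captures before touching any files, it can be tried on plain strings. This is a minimal sketch, assuming the two sample names from the question plus a made-up non-matching name (notes.pdf). The variable names $samples, $pattern and $newNames are illustrative only.

```powershell
# Sample names only; nothing on disk is renamed.
$samples = '1234567 XKJDFDA.pdf', '5413874 KJDFSXZ.pdf', 'notes.pdf'   # 'notes.pdf' is a made-up non-match
$pattern = '^(?<BeginningDigits>\d{7})\s[A-Z]{7}\.pdf$'

$newNames = foreach ($name in $samples) {
    if ($name -match $pattern) {
        # $Matches holds the named group captured by the last successful -match
        "$($Matches['BeginningDigits']).pdf"
    }
}
$newNames   # 1234567.pdf and 5413874.pdf; notes.pdf is skipped
```

Once the names look right, the same pattern can go back into the pipeline above. Adding -WhatIf to Rename-Item previews the renames without performing them.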
$RootFolder = '\\server.domain.local\share\folder'
@( Get-ChildItem -LiteralPath $RootFolder -Filter '*.pdf' ) |
Where-Object { $_.psIsContainer -eq $false } | # No folders
Where-Object { $_.BaseName.Length -gt 7 } | # Only files whose base name (name without extension) is longer than 7 characters
ForEach-Object {
$local:newName = [string]::Join('', $_.BaseName.ToCharArray()[0..6]) + $_.Extension # first 7 characters, extension kept
return Rename-Item -LiteralPath $_.FullName -NewName $local:newName -PassThru
} |
ForEach-Object {Write-Host "New name: $($_.Name)"}
Alternative: using string split-join. Renames every file whose name without extension is longer than 7 characters to its first 7 characters (whether or not they are digits), keeping the extension.
This is a silly algorithm, because Substring is faster. It is only useful for learning subarray selection with [x..y].
Note that Where-Object { $_.BaseName.Length -gt 7 } checks the length before [x..y] is used, so names with 7 or fewer characters, which have nothing to trim, are skipped.
$RootFolder = '\\server.domain.local\share\folder'
@( Get-ChildItem -LiteralPath $RootFolder -Filter '*.pdf' ) |
Where-Object { $_.psIsContainer -eq $false } | # No folders
Where-Object { $_.BaseName.Length -gt 7 } | # Only files whose base name (name without extension) is longer than 7 characters
ForEach-Object {
$local:newName = $_.BaseName.Substring(0,7) + $_.Extension # first 7 characters, extension kept
return Rename-Item -LiteralPath $_.FullName -NewName $local:newName -PassThru
} |
ForEach-Object {Write-Host "New name: $($_.Name)"}
Alternative: using Substring. Renames every file whose name without extension is longer than 7 characters to its first 7 characters (whether or not they are digits), keeping the extension.
.Substring(0,7) # 0 - position of the first character, 7 - how many characters to take. Note that Where-Object { $_.BaseName.Length -gt 7 } checks the length before Substring is called. Otherwise Substring throws an error when the name is shorter than 7 characters.

A much simpler alternative to PowerShell is using Command Prompt. If your filenames are along the lines of "00001_01.jpg", "00002_01.jpg", "00003_01.jpg", you can use the following command:
ren ?????_01.jpg ?????.jpg
where the number of ? matches the first part of the filename that you want to keep.
You can read more about this and other Command Prompt methods of batch renaming files in Windows at this useful website.

EDIT: edited to preserve the extension
There's another substring() method that takes 2 args, startIndex and length. https://learn.microsoft.com/en-us/dotnet/api/system.string.substring?view=netframework-4.8
'hithere'.substring
OverloadDefinitions
-------------------
string Substring(int startIndex)
string Substring(int startIndex, int length)
Thus, to delete a total of 8 characters from the right of the basename, including the space:
get-childitem *.pdf | rename-item -newname { $_.basename.substring(0,
$_.basename.length-8) + $_.extension } -whatif
What if: Performing the operation "Rename File" on target
"Item: /Users/js/1234567890.pdf
Destination: /Users/js/12.pdf".
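Substring throws if the computed length goes negative, which happens when a base name is shorter than the number of characters to drop. Here is a minimal sketch of a guard around the same idea. Get-ShortenedName is a hypothetical helper name, not something from the answer above, and it works on strings only, so it can be tried without any files.

```powershell
# Hypothetical helper: drop the last $Count characters of the base name, keep the extension.
function Get-ShortenedName {
    param(
        [Parameter(Mandatory)][string]$FileName,
        [Parameter(Mandatory)][int]$Count
    )
    $base = [System.IO.Path]::GetFileNameWithoutExtension($FileName)
    $ext  = [System.IO.Path]::GetExtension($FileName)   # includes the leading dot

    # Too short to trim: return the name unchanged instead of letting Substring throw
    if ($base.Length -le $Count) { return $FileName }

    $base.Substring(0, $base.Length - $Count) + $ext
}

Get-ShortenedName -FileName '1234567 XKJDFDA.pdf' -Count 8   # 1234567.pdf
Get-ShortenedName -FileName 'abc.pdf' -Count 8               # abc.pdf (left as-is)
```

In a pipeline this could be used as Rename-Item -NewName { Get-ShortenedName -FileName $_.Name -Count 8 } -WhatIf.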

You can write a quick bash function for this.
function fileNameShortener() {
mv "$1" "${1:0:-4}"
}
This takes the first argument, which is stored in $1, and creates a substring from index 0 up to 4 characters before the end (a negative length needs Bash 4.2 or later). The result is then used in the move command to rename the original file. To generalise further:
function fileNameShortener2() {
mv "$1" ${1:$2:$3}
}
This version takes the starting point and the length of the substring as the second and third function arguments.
fileNameShortener2 fileName.txt 0 -5
This sample would remove the last 5 characters from the filename.

Related

Remove Sections of File Names w/PowerShell

I'm not super knowledgeable when it comes to coding but I'm trying to use PowerShell to find a way to remove the first X number of characters and Last X number of characters from multiple files. Hence, only keeping the middle section.
Ex)
INV~1105619~43458304~~1913216023~0444857 , where 1913216023 is the invoice #. Anything before and after that needs to be removed from the file name.
I used:
get-childitem *.pdf | rename-item -newname { string.substring(22) } to remove the first 22 characters but cannot manage to create a code to remove the remaining half. All files have the same number of characters but various numbers before and after the invoice number (every file name is different).
Any help/advice is greatly appreciated!
There are several methods of doing this.
If you are sure you won't run into naming collisions (so all files have a different invoice number), here's how with three extra alternatives:
(Get-ChildItem -Path 'D:\Test' -Filter '*~~*~*.pdf' -File) |
Rename-Item -NewName {
# my favorite method
'{0}{1}' -f ($_.BaseName -split '~')[-2], $_.Extension
# or
# '{0}{1}' -f ($_.BaseName -replace '^.*~~(\d{10})~.+$', '$1'), $_.Extension
# or this one
# '{0}{1}' -f ([regex]'~~(\d+)~').Match($_.BaseName).Groups[1].Value, $_.Extension
# or if you are absolutely sure of the position and length of the invoice number
# '{0}{1}' -f $_.BaseName.Substring(22,10), $_.Extension
}
The Get-ChildItem call is wrapped in parentheses to make sure all the FileInfo objects have been gathered before the pipeline carries on. If you don't do that, chances are you will try to rename items multiple times.
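The four expressions above can be compared on the sample name from the question without renaming anything. This is a small sketch; the variable names ($baseName, $bySplit and so on) are illustrative only.

```powershell
$baseName = 'INV~1105619~43458304~~1913216023~0444857'   # sample name from the question

# -split '~' yields: INV, 1105619, 43458304, (empty string), 1913216023, 0444857
$bySplit = ($baseName -split '~')[-2]                     # second-to-last element

# Regex alternatives: capture the digits that follow '~~'
$byReplace = $baseName -replace '^.*~~(\d{10})~.+$', '$1'
$byMatch   = ([regex]'~~(\d+)~').Match($baseName).Groups[1].Value

# Fixed position: 22 characters precede the 10-digit invoice number
$bySubstring = $baseName.Substring(22, 10)

$bySplit, $byReplace, $byMatch, $bySubstring   # each is 1913216023
```

The split and regex variants keep working if the fields before the invoice number change length, while Substring(22,10) depends on every name having exactly the same layout.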
Assuming the target substring always has the same length, there's an overload to substring() that has a length parameter.
'INV~1105619~43458304~~1913216023~0444857'.substring
OverloadDefinitions
-------------------
string Substring(int startIndex)
string Substring(int startIndex, int length)
$startIndex, $length = 22, 10
'INV~1105619~43458304~~1913216023~0444857'.substring($startIndex, $length)
1913216023
dir ('?'*40) | rename-item -newname { $_.name.substring(22,10) } -whatif
What if: Performing the operation "Rename File" on target
"Item: C:\users\admin\foo\INV~1105619~43458304~~1913216023~0444857
Destination: C:\users\admin\foo\1913216023".

PowerShell/Unix Bash - Find missing numbers for a set of files

I'm currently trying to detect some missing files for batches.
This batch generates files with following convention:
File_1_01.txt (first hour, first iteration)
File_1_02.txt
I want to create a script to find out which iteration is missing.
I found this piece of code in powershell
function missingNumbers {
Get-ChildItem -path ./* -include *.txt
| Where{$_.Name -match '\d+'}
| ForEach{$Matches[0]}
| sort {[int]$_} |% {$i = 1}
{while ($i -lt $_){$i;$i++};$i++}}
However when i use this script against
File_1_01.txt
File_1_02.txt
File_1_04.txt
It doesn't return entry 03
Investigations
Removing the hourly occurrence (File_<1>_) helps, and the script then behaves as expected.
If I concatenate hour and occurrence, the script displays all numbers before 101.
I'm open to having something in Unix as well.
I had another approach in mind, removing the common text between all files but have no idea how to do it.
Instead of matching the 1 every time, you can match the 2 digits \d\d at the end. Putting the pipe symbol on the next line like that will only work in powershell 7.
Get-ChildItem -path ./* -include *.txt |
Where Name -match \d\d |
ForEach{ $Matches[0] } |
sort {[int]$_} |
% { $i = 1 } { while ($i -lt $_){ $i;$i++ }; $i++ }
3
You could put all the numbers together like this, but then they'd have to all be right after each other. Adjust the match pattern or get-childitem for your needs.
Get-ChildItem -path ./* -include *.txt |
Where Name -match '(\d).(\d\d)' |
ForEach{ $Matches[1] + $Matches[2] }
101
102
104
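Another way to express the gap check is as a comparison against the full range of expected numbers. This is a minimal sketch, assuming the iteration is always the two digits right before .txt and that numbering starts at 1. It runs on the sample names as strings, and the variable names are illustrative only.

```powershell
$names = 'File_1_01.txt', 'File_1_02.txt', 'File_1_04.txt'   # sample names from the question

# Capture only the two-digit iteration that sits right before the extension
$present = foreach ($name in $names) {
    if ($name -match '_(\d\d)\.txt$') { [int]$Matches[1] }
}

# Every number from 1 up to the highest iteration seen that has no file
$highest = ($present | Measure-Object -Maximum).Maximum
$missing = @(1..$highest | Where-Object { $_ -notin $present })
$missing   # 3
```

This only finds gaps below the highest iteration present. If the expected number of iterations per hour is known, that number could replace $highest.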

Powershell add suffix to filenames, based on prefix

I have a directory that consists of a number of text files that have been named:
1Customer.txt
2Customer.txt
...
99Customer.txt
I am trying to create powershell script that will rename the files to a more logical:
Customer1.txt
Customer2.txt
...
Customer99.txt
The prefix can be anything from 1 digit to 3 digits.
As I am new to powershell, I really don't know how I can achieve this. Any help much appreciated.
The most straightforward way is a gci/ls/dir with a where that matches only BaseNames starting with a number using a regex, piped to Rename-Item, which builds the new name from the submatches.
ls |? BaseName -match '^(\d+)([^0-9].*)$' |ren -new {"{0}{1}{2}" -f $matches[2],$matches[1],$_.extension}
The same code without aliases
Get-ChildItem |Where-Object {$_.BaseName -match '^(\d+)([^0-9].*)$'} |
Rename-Item -NewName {"{0}{1}{2}" -f $matches[2],$matches[1],$_.extension}
Here is one way to do it:
Get-ChildItem .\Docs -File |
ForEach-Object {
if($_.Name -match "^(?<Number>\d+)(?<Type>\w+)\.\w+$")
{
Rename-Item -Path $_.FullName -NewName "$($matches.Type)$($matches.Number)$($_.Extension)"
}
}
The line:
$_.Name -match "^(?<Number>\d+)(?<Type>\w+)\.\w+$"
takes the file name (e.g. '23Suppliers.txt') and performs a pattern match on it, pulling out the number part (23) and the 'type' part ('Suppliers'), naming them 'Number' and 'Type' respectively. These are stored by PowerShell in its automatic variable $matches, which is used when working with regular expressions.
We then reconstruct the new file using details from the original file, such as the file's extension ($_.Extension) and the matched type ($matches.Type) and number ($matches.Number):
"$($matches.Type)$($matches.Number)$($_.Extension)"
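The reordering can be checked on plain strings first. This is a small sketch, assuming made-up sample names with 1 to 3 leading digits. Because no file object is involved here, the extension is taken with [System.IO.Path]::GetExtension instead of $_.Extension.

```powershell
$samples = '1Customer.txt', '99Customer.txt', '123Customer.txt'   # made-up sample names

$renamed = foreach ($name in $samples) {
    if ($name -match '^(?<Number>\d+)(?<Type>\w+)\.\w+$') {
        $extension = [System.IO.Path]::GetExtension($name)   # '.txt', including the dot
        "$($Matches.Type)$($Matches.Number)$extension"
    }
}
$renamed   # Customer1.txt, Customer99.txt, Customer123.txt
```

\d+ is greedy, so it takes all the leading digits before \w+ picks up the rest of the base name.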
I'm sure there's a nicer way to do this with regex, but the following is a quick first go at it:
$prefix = "Customer"
Get-ChildItem C:\folder\*$prefix.txt | Rename-Item -NewName {$prefix + ($_.Name -replace $prefix,'')}

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = @{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name]=$i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select @{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I get basically the same answer, except his is long-form and robust with regex character escaping and setup variables, mine is terse because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs |?{($_.Group.FullName -notmatch $inc)-and($_.Group.FullName -match $inc)}
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}}
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
handles triplicates, n-tuplicates, as well.
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"
foreach ($incFile in $includeFiles) {
foreach ($oFile in $otherFiles) {
if ($incFile.Name -ieq $oFile.Name) {
write "$($incFile.Name) clash"
write "* $($incFile.FullName)"
write "* $($oFile.FullName)"
write "`n"
}
}
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=@{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=@($_.FullName)};$h.Values|?{$_.Count-gt1-and$_-like'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = {
'fileA.cs': @('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs'),
'file2.cs': @('c:\cs\somewhere\file2.cs'),
'file3.cs': @('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files, iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename, otherwise it would overwrite each of the hashtable lists and they would be one item long for only the most recently seen file.
It uses @() because when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs to put an array into the hashtable at first, not a string. If it was +=$_.FullName then the first file would go into the hashtable as a string and the += next time would do string concatenation, and that's no use to me. This forces the first file in the hashtable to start an array by forcing every file to be a one-item array. The least-code way to get this result is with +=@(..) but that churn of creating throwaway arrays for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
Changing the section
%{$h[$_.Name]+=@($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
After that, using $h.Values isn't going over every file for a second time, it's going over every array in the hashtable - one per unique filename. That's got to happen to check the array size and prune the not-duplicates, but the -and operation short circuits - when the Count -gt 1 test fails, the bit on the right checking the path name doesn't run.
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
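The difference between += with an array and += with a string can be seen on a few in-memory paths. This is a minimal sketch with made-up paths. The names $asArrays and $asStrings are illustrative, and the file name is taken by splitting on the backslash so nothing on disk is needed.

```powershell
$paths = 'c:\s\includes\fileA.cs', 'c:\s\lib\fileA.cs', 'c:\s\lib\file2.cs'   # made-up paths

$asArrays  = @{}
$asStrings = @{}
foreach ($path in $paths) {
    $name = ($path -split '\\')[-1]    # last path segment, e.g. fileA.cs

    $asArrays[$name]  += @($path)      # $null + array gives an array; array + array gives a longer array
    $asStrings[$name] += $path         # $null + string gives a string; string + string concatenates
}

$asArrays['fileA.cs'].Count    # 2 (two separate paths)
$asArrays['file2.cs'].Count    # 1
$asStrings['fileA.cs']         # c:\s\includes\fileA.csc:\s\lib\fileA.cs (one merged string)
```

With arrays, Count tells how many files share a name. With plain strings, the paths run together and that information is lost.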
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=@{};$s=@{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=@()}$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_]-like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit branch of topic; I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"
$includeList = @{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
% { $includeList[$_.Name] = $_.DirectoryName }
Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
% { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like but I thought that Group-Object might have a place in this question so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"
Get-ChildItem -Path $base -Filter $extension -Recurse |
Group-Object Name |
Where-Object{($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique))} |
ForEach-Object {
$filename = $_.Name
($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object{
"'{0}' has file with same name as '{1}'" -f (Split-Path $_),$filename
}
}
Collect all the files matching the extension filter $extension and group them by name. Of those groups, keep every group that contains more than one file and has at least one member in the directory $unique. For those groups, print all the files that are not in the unique directory.
From Comment
For what it's worth, this is what I used for testing to create a bunch of files. (I know the folder 9 is empty)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | %{[void](New-Item -ItemType directory "$base\$_")}
1..1000 | %{
$number = Get-Random -Minimum 1 -Maximum 100
$folder = Get-Random -Minimum 0 -Maximum 9
[void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"
# First script
Measure-Command {
[string[]]$filter = ls $includes -Filter *.cs -Recurse | % name
ls $root -include $filter -Recurse -Filter *.cs |
Where-object{$_.FullName -notlike "$includes*"}
}
# Second Script
Measure-Command {
$filter2 = ls $includes -Filter *.cs -Recurse
ls $root -Recurse -Filter *.cs |
Where-object{$filter2.name -eq $_.name -and $_.FullName -notlike "$includes*"}
}
In my first script, I get all the include file names into a string array. Then I use that string array as the -Include parameter of Get-ChildItem. In the end, I filter the include folder out of the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove the measure-command to see the results. I was using that to check the speed. With my dataset, the first one was 40% faster.
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select -ExpandProperty Name
Get-ChildItem -Recurse C:\S -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.DirectoryName -notmatch '^c:\\s\\includes' } | Select Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from
Print their name and directory

Renaming a new folder file to the next incremental number with powershell script

I would really appreciate your help with this
I should first mention that I have been unable to find any specific solutions and I am very new to programming with powershell, hence my request
I wish to write (and later schedule) a script in PowerShell that looks for a file with a specific name - RFUNNEL - and then renames it to R0000001. There will only be one such 'RFUNNEL' file in the folder at any time. However, the next time the script runs and finds a new RFUNNEL file, I would like this to be renamed to R0000002, and so on and so forth.
I have struggled with this for some weeks now and the seemingly similar solutions that I have come across have not been of much help - perhaps because of my admittedly limited experience with powershell.
Others might be able to do this with less syntax, but try this:
$rootpath = "C:\derp"
if (Test-Path "$rootpath\RFUNNEL.txt")
{ $maxfile = Get-ChildItem $rootpath | ?{$_.BaseName -like "R[0-9][0-9][0-9][0-9][0-9][0-9][0-9]"} | Sort BaseName -Descending | Select -First 1 -Expand BaseName;
if (!$maxfile) { $maxfile = "R0000000" }
[int32]$filenumberint = $maxfile.substring(1); $filenumberint++
[string]$filenumberstring = ($filenumberint).ToString("0000000");
[string]$newName = ("R" + $filenumberstring + ".txt");
Rename-Item "$rootpath\RFUNNEL.txt" $newName;
}
Here's an alternative using regex:
[cmdletbinding()]
param()
$triggerFile = "RFUNNEL.txt"
$searchPattern = "R*.txt"
$nextAvailable = 0
# If the trigger file exists
if (Test-Path -Path $triggerFile)
{
# Get a list of files matching search pattern
$files = Get-ChildItem "$searchPattern" -exclude "$triggerFile"
if ($files)
{
# store the filenames in a simple array
$files = $files | select -expandProperty Name
$files | Write-Verbose
# Get next available file by carrying out a
# regex replace to extract the numeric part of the file and get the maximum number
$nextAvailable = ($files -replace '([a-z])(.*).txt', '$2' | measure-object -max).Maximum
}
# Add one to either the max or zero
$nextAvailable++
# Format the resulting string with leading zeros
$nextAvailableFileName = 'R{0:000000#}.txt' -f $nextAvailable
Write-Verbose "Next Available File: $nextAvailableFileName"
# rename the file
Rename-Item -Path $triggerFile -NewName $nextAvailableFileName
}
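The numbering step on its own can be tried without any files. This is a minimal sketch, assuming a made-up list of existing names and the R plus seven digits convention from the question. It uses a stricter pattern than the script above so that other files starting with R are ignored, and the variable names are illustrative.

```powershell
$existing = 'R0000001.txt', 'R0000002.txt', 'R0000010.txt', 'README.txt'   # made-up directory listing

# Highest number among names shaped exactly like R + 7 digits + .txt
$max = 0
foreach ($name in $existing) {
    if ($name -match '^R(\d{7})\.txt$') {
        $number = [int]$Matches[1]          # '0000010' becomes 10
        if ($number -gt $max) { $max = $number }
    }
}

# D7 pads the integer with leading zeros to seven digits
$nextName = 'R{0:D7}.txt' -f ($max + 1)
$nextName   # R0000011.txt
```

With an empty list $max stays 0, so the first file becomes R0000001.txt, matching the behaviour asked for in the question.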