Rename and Increment Integer File Names - powershell

I have a bunch of files with integers that range from 30.pdf to 133.pdf as filenames. What I am trying to do is increment each filename, so 30.pdf should become 31.pdf, ..., and 133.pdf should become 134.pdf.
Does anybody know how I can achieve this?
I know I can loop through the directory with foreach ($f in dir) or display and even sort the filenames with get-childitem | sort-object, but this latter method obviously has issues sorting numerically.
No idea why something so simple is so difficult to figure out. Cannot find this anywhere online...

This should work, assuming the BaseName of each file contains only digits, as shown in your question. First sort the files in descending order to avoid name collisions, then pipe them to Rename-Item using a -NewName script block:
Get-ChildItem path/to/files -File | Sort-Object { [int]$_.BaseName } -Descending |
Rename-Item -NewName {
[int]$n = $_.BaseName; '{0}{1}' -f (++$n), $_.Extension
}
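To see why the descending sort matters, here is a minimal sketch that runs the same pipeline against a throwaway folder. The scratch folder and the four sample files are assumptions made for illustration; only the pipeline itself mirrors the answer above.

```powershell
# Hypothetical scratch folder so the sketch never touches real files.
$demo = Join-Path ([System.IO.Path]::GetTempPath()) ("rename-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $demo | Out-Null
30..33 | ForEach-Object { New-Item -ItemType File -Path (Join-Path $demo "$_.pdf") | Out-Null }

# Highest number first: 33 -> 34 frees the name 33.pdf before 32 needs it, and so on.
Get-ChildItem -LiteralPath $demo -File |
    Sort-Object { [int]$_.BaseName } -Descending |
    Rename-Item -NewName { '{0}{1}' -f ([int]$_.BaseName + 1), $_.Extension }

# Collect the resulting names in numeric order, then clean up.
$result = Get-ChildItem -LiteralPath $demo -File |
    Sort-Object { [int]$_.BaseName } |
    ForEach-Object Name
$result
Remove-Item -LiteralPath $demo -Recurse -Force
```

Sorting ascending instead would make the first rename (30.pdf to 31.pdf) fail, because 31.pdf still exists at that point.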

We can take some shortcuts to make this easier since your file names are literally just integers. Start by using this statement to get a collection of integers that represents your files ($files is the output of Get-ChildItem run against the parent directory):
$integersFromNames = $files | Select -ExpandProperty BaseName | foreach { $_ -as [int] } | sort -Descending
Now you have all of the existing files sorted from largest to smallest. You can use Rename-Item in a foreach loop to get you the rest of the way:
foreach ($oldNumber in $integersFromNames) {
$newNumber = $oldNumber + 1
Rename-Item -Path "$oldNumber.pdf" -NewName "$newNumber.pdf"
}
Please excuse any typos. I'm on my phone. Hopefully the concept comes across clearly.

Related

Find all files with a particular extension, sorting them by creation time and copy them with a new name

I am attempting to find recursively all files with the extension .raw and then sort them in ascending order of CreationTime. After that, I would like to copy each file to a new directory where the names are IMG_001_0001.jpg ... IMG_001_0099.jpg where I am using 4 digits in ascending order. It is important that the file name IMG_001_0001.jpg is the first one created and if there are 99 files, IMG_001_0099.jpg is the last file created.
I tried this:
Get-ChildItem 'F:\Downloads\raw-20221121T200702Z-001.zip' -Recurse -include *.raw | Sort-Object CreationTime | ForEach-Object {copy $_.FullName F:\Downloads\raw-20221121T200702Z-001.zip/test/IMG_001_$($_.ReadCount).jpg}
If I understand correctly you could do it like this:
$count = @{ Value = 0 }
Get-ChildItem 'F:\Downloads\raw-20221121T200702Z-001.zip' -Recurse -Filter *.raw |
Sort-Object CreationTime | Copy-Item -Destination {
'F:\Downloads\raw-20221121T200702Z-001.zip/test/IMG_001_{0:D4}.jpg' -f
$count['Value']++
}
Using D4 for the format string ensures your integers would be represented with 4 digits. See Custom numeric format strings for details.
As you can note, instead of using ForEach-Object to enumerate each file, this uses a delay-bind script block to generate the new names for the destination files and each source object is bound from pipeline.
Worth noting that the forward slashes in /test/ might bring problems and likely should be changed to backslashes: \test\.
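A small self-contained sketch of the same idea, run against a throwaway folder. The folder, the three sample files, and sorting by Name rather than CreationTime (to keep the demo deterministic) are all assumptions for illustration. As far as I know, a delay-bind script block runs in a child scope, which is why the counter lives in a hashtable: the block can mutate the shared hashtable, whereas incrementing a plain integer would only change a local copy. The sketch pre-increments so numbering starts at 0001.

```powershell
# Hypothetical scratch folders for the demo.
$root = Join-Path ([System.IO.Path]::GetTempPath()) ("copy-demo-" + [guid]::NewGuid())
$out  = Join-Path $root 'out'
New-Item -ItemType Directory -Path $out -Force | Out-Null
'a', 'b', 'c' | ForEach-Object { Set-Content -Path (Join-Path $root "$_.raw") -Value $_ }

# Counter kept in a hashtable so the script block can update it.
$count = @{ Value = 0 }
Get-ChildItem -LiteralPath $root -Filter *.raw -File |
    Sort-Object Name |
    Copy-Item -Destination {
        # D4 pads the number to four digits: 1 -> 0001
        Join-Path $out ('IMG_001_{0:D4}.jpg' -f (++$count['Value']))
    }

$copied = Get-ChildItem -LiteralPath $out | Sort-Object Name | ForEach-Object Name
$copied
Remove-Item -LiteralPath $root -Recurse -Force
```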
You don't need a hashtable to iterate... just use [int].
For the sake of clarity please don't use paths here that can easily be mistaken for a file rather than a directory name.
Get-Childitem does not work on files and if it does it's not portable.
Also that script block for -Destination is not likely to work as parameters defined outside it are not available inside. Nor is there any need to delay anything.
Something like this should be perfectly sufficient:
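$ziproot = 'F:\input_folder'
$count = 0
# enumerate below $ziproot rather than the current directory
$candidates = Get-ChildItem -Path $ziproot -Recurse -Filter '*.raw' |
Sort-Object CreationTime
ForEach($file in $candidates)
{
# inside a foreach statement the current item is $file, not $_
Copy-Item -Path $file.FullName -Destination ('{0}/test/IMG_001_{1:D4}{2}' -f $ziproot, (++$count), $file.Extension)
}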
(Try using foreach($var in $list) {commands} where you can, it's faster than foreach-object by about a factor of 10.)

Using Variables with Directories & Filtering

I'm new to PowerShell, and trying to do something pretty simple (I think). I'm trying to filter down the results of a folder, where I only look at files that start with e02. I tried creating a variable for my folder path, and a variable for the filtered down version. When I get-ChildItem for that filtered down version, it brings back all results. I'm trying to run a loop where I'd rename these files.
File names will be something like e021234, e021235, e021236, I get new files every month with a weird extension I convert to txt. They're always the same couple names, and each file has its own name I'd rename it to. Like e021234 might be Program Alpha.
set-location "C:\MYPATH\SAMPLE\"
$dir = "C:\MYPATH\SAMPLE\"
$dirFiltered= get-childItem $dir | where-Object { $_.baseName -like "e02*" }
get-childItem $dirFiltered |
Foreach-Object {
$name = if ($_.BaseName -eq "e024") {"Four"}
elseif ($_.BaseName -eq "e023") {"Three"}
get-childitem $dirFiltered | rename-item -newname { $name + ".txt"}
}
There are a few things I can see that could use some adjustment.
My first thought on this is to reduce the number of places a script has to be edited when changes are needed. I suggest assigning the working directory variable first.
Next, reduce the number of times information is pulled. The Get-ChildItem cmdlet offers an integrated -Filter parameter which is usually more efficient than gathering all the results and filtering afterward. Since we can grab the filtered list right off the bat, the results can be piped directly to the ForEach block without going through the variable assignment and secondary filtering.
Then, make sure to initialize $name inside the loop so it doesn't accidentally cause issues. This is because $name remains set to the last value it matched in the if/elseif statements after the script runs.
Next, make use of the fact that $name is null so that files that don't match your criteria won't be renamed to ".txt".
Finally, perform the rename operation using the $_ automatic variable representing the current object instead of pulling the information with Get-ChildItem again. The curly braces have also been replaced with parentheses because of the change in the Rename-Item syntax.
Updated script:
$dir = "C:\MYPATH\SAMPLE\"
Set-Location $dir
Get-ChildItem $dir -Filter "e02*" |
Foreach-Object {
$name = $null #initialize name to prevent interference from previous runs
$name = if ($_.BaseName -eq "e024") {"Four"}
elseif ($_.BaseName -eq "e023") {"Three"}
if ($null -ne $name) {
Rename-Item -LiteralPath $_.FullName -NewName ($name + ".txt")
}
}
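If the list of known names grows, a lookup table may be easier to maintain than a chain of if/elseif. The sketch below is one way to do that, not the answerer's method; the scratch folder, the sample files, and the $map hashtable are hypothetical names introduced for illustration.

```powershell
# Hypothetical scratch folder with a few sample files.
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ("map-demo-" + [guid]::NewGuid())
New-Item -ItemType Directory -Path $dir | Out-Null
'e023', 'e024', 'e029', 'other' | ForEach-Object {
    New-Item -ItemType File -Path (Join-Path $dir $_) | Out-Null
}

# Old base name -> new base name. Names missing from the table are left alone.
$map = @{ e023 = 'Three'; e024 = 'Four' }

Get-ChildItem -LiteralPath $dir -Filter 'e02*' -File | ForEach-Object {
    $name = $map[$_.BaseName]          # $null when there is no entry
    if ($null -ne $name) {
        Rename-Item -LiteralPath $_.FullName -NewName "$name.txt"
    }
}

$names = Get-ChildItem -LiteralPath $dir -File | ForEach-Object Name
$names
Remove-Item -LiteralPath $dir -Recurse -Force
```

Here e029 matches the filter but has no entry in the table, so it keeps its name.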

PowerShell script to write out recursive list of files without the root directory

Is there an easy way of stripping the root directory when retrieving a recursive list of files in a directory?
e.g.
Get-ChildItem -Recurse "C:\Test\Folder" | Where { ! $_.PSIsContainer } | Select FullName | Out-File "C:\Test\Folder\items.txt"
But instead of this giving me:
C:\Test\Folder\shiv.png
C:\Test\Folder\Another\kendall.png
C:\Test\Folder\YetAnother\roman.png
I would like it to give me:
Folder\shiv.png
Folder\Another\kendall.png
Folder\YetAnother\roman.png
Is this possible in one line or do I have to write a loop?
Also is there an output method that doesn't create gaps between lines or also spit out the name of the select property at the top? (In this case 'FullName'), or would I just have to loop manually and ignore them?
The problem you're having with the "gap" is because of PowerShell's formatting when displaying objects. If you want to get the plain values, you could use the -ExpandProperty parameter:
select -Expand FullName
As for your problem though, you could do something like this:
$dir = "C:\Test\Folder"
Get-ChildItem $dir -Recurse -File | foreach {
$_.FullName.Replace("$dir\", "")
} | Out-File "C:\Test\Folder\items.txt"
Note that this would be case sensitive. If that's an issue, you could use regex:
$dir = "d:\tmp"
$pattern = [Regex]::Escape($dir) + "\\?"
Get-ChildItem $dir -Recurse -File | foreach {
$_.FullName -replace $pattern, ""
}
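Another option that avoids both the case-sensitivity question and regex escaping is to cut the root off by length. This is a sketch under assumptions: the scratch folder and file names are made up for the demo, and it relies on every FullName starting with the root path returned by New-Item.

```powershell
# Hypothetical scratch tree: <root>/shiv.png and <root>/Another/kendall.png
$dir = (New-Item -ItemType Directory -Path (
        Join-Path ([System.IO.Path]::GetTempPath()) ("rel-demo-" + [guid]::NewGuid()))).FullName
$sub = Join-Path $dir 'Another'
New-Item -ItemType Directory -Path $sub | Out-Null
Set-Content -Path (Join-Path $dir 'shiv.png') -Value ''
Set-Content -Path (Join-Path $sub 'kendall.png') -Value ''

# Root length plus one for the path separator that follows it.
$prefixLength = $dir.Length + 1

$relative = Get-ChildItem -LiteralPath $dir -Recurse -File |
    ForEach-Object { $_.FullName.Substring($prefixLength) }
$relative
Remove-Item -LiteralPath $dir -Recurse -Force
```

To keep the last folder of the root in the output (Folder\shiv.png rather than shiv.png), compute the prefix length from the parent of the root instead.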
This does what you ask:
Get-ChildItem -Recurse "C:\Test\Folder" | Where { ! $_.PSIsContainer } | Select -ExpandProperty FullName | % { $_.Split("\")[2..1000] -Join "\" }
Output:
Folder\shiv.png
Folder\Another\kendall.png
Folder\YetAnother\roman.png
Explanation:
The result of Get-ChildItem is split on the \ character which returns an array. The next bit [2..1000] slices the array from element 2 (and the rest of the array up until the specified number, so just take a sufficiently large number). Finally we join the array into a new string with the -Join operator.

Renaming pdf files in sequential order using powershell

I have a large number of pdf's that need renamed in sequential order. These were originally scanned into a single document, then extracted as separate files. When extracted, the name becomes "444026-444050 1", "444026-444050 2", etc. I am trying to rename all the files to match the document number ("444026-444050 1" would become "444026").
I found the following line of code that I can use in Powershell, but it seems that anything over 9 files there is a problem! Once I try it with 10 files, only the first file is saved correctly. The rest become jumbled (file 444027 has the contents of file 444035, then file 444028 has 444027, and 444029 has 444028, etc.)
I imagine there is some sort of problem with a loop, but am having difficulty fixing it.
Can someone help?
thanks
Dir *.pdf | ForEach-Object -begin { $count=26 } -process { rename-item $_ -NewName "4440$count.pdf"; $count++ }
Alright. Let's see if this makes everybody happy. Maybe you should try this with a backup copy of the files.
# make some test files in a new folder
# 1..20 | foreach {
# if (! (test-path "44026-44050 $_.pdf")) {
# echo "44026-44050 $_" > "44026-44050 $_.pdf" }
# }
# rename and pad the first 9 old filenames with a 0 before the last digit for sorting
# is it less than 100 files?
1..9 | foreach {
ren "44026-44050 $_.pdf" "44026-44050 0$_.pdf" -whatif
}
# make dir complete first with parentheses for powershell 5
# pad $count to 2 digits
# take off the -whatif if it looks ok
(dir *.pdf) | foreach { $count = 1 } {
$padded = $count | foreach tostring 00
rename-item $_ -newname 4440$padded.pdf -whatif; $count++ }
The order in which Dir (which is an alias for Get-ChildItem) retrieves the items does not appear to be strictly guaranteed. Furthermore, if it is sorting, it's probably sorting them as strings, and "444026-444050 10" comes before "444026-444050 2" as strings. It might be worth inserting Sort-Object into your pipeline and using Split to get at the sequence number you care about:
Dir *.pdf | Sort-Object -Property {[int]$_.Name.Split()[1].Split(".")[0]} | ForEach-Object -begin { $count=26 } -process { rename-item $_ -NewName "4440$count.pdf"; $count++ }
The key part is this new pipeline stage inserted after Dir and before ForEach-Object:
Sort-Object -Property {[int]$_.Name.Split()[1].Split(".")[0]}
This says to sort the output of Dir according to whatever comes between the first space and the subsequent period, comparing those things as integers (not strings). This ensures that your results will be ordered and that you'll get them in numeric order, not lexicographic order.
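The difference between the two orderings is easy to check without touching any files. The sketch below uses three made-up names in the same shape as the question's files and sorts them both ways.

```powershell
# Sample names only; no files are created or renamed.
$names = '444026-444050 10.pdf', '444026-444050 2.pdf', '444026-444050 1.pdf'

# Default sort compares the names as strings, so "10" lands before "2".
$asStrings = @($names | Sort-Object)

# Same key as the answer: text between the space and the period, cast to [int].
$asNumbers = @($names | Sort-Object { [int]$_.Split()[1].Split('.')[0] })

'String order:';  $asStrings
'Numeric order:'; $asNumbers
```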

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = @{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name]=$i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select @{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I get basically the same answer, except his is long-form and robust with regex character escaping and setup variables, mine is terse because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs |?{($_.Group.FullName -notmatch $inc)-and($_.Group.FullName -match $inc)}
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}}
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
handles triplicates, n-tuplicates, as well.
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"
foreach ($incFile in $includeFiles) {
foreach ($oFile in $otherFiles) {
if ($incFile.Name -ieq $oFile.Name) {
write "$($incFile.Name) clash"
write "* $($incFile.FullName)"
write "* $($oFile.FullName)"
write "`n"
}
}
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=@{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=@($_.FullName)};$h.Values|?{$_.Count-gt1-and$_-like'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = {
'fileA.cs': @('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs'),
'file2.cs': @('c:\cs\somewhere\file2.cs'),
'file3.cs': @('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files, iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename, otherwise it would overwrite each of the hashtable lists and they would be one item long for only the most recently seen file.
It uses @() because when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs to put an array into the hashtable at first, not a string. If it was +=$_.FullName then the first file would go into the hashtable as a string and the += next time would do string concatenation and that's no use to me. This forces the first file in the hashtable to start an array by forcing every file to be a one item array. The least-code way to get this result is with +=@(..) but that churn of creating throwaway arrays for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
Changing the section
%{$h[$_.Name]+=@($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
After that, using $h.Values isn't going over every file for a second time, it's going over every array in the hashtable - one per unique filename. That's got to happen to check the array size and prune the not-duplicates, but the -and operation short circuits - when the Count -gt 1 fails, the bit on the right checking the path name doesn't run.
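The string-versus-array behaviour of += described above can be checked in isolation. This is a minimal sketch with made-up keys and values; it says nothing about speed, only about what ends up in the hashtable.

```powershell
# += with bare strings: the first assignment stores a string, the second concatenates.
$asString = @{}
$asString['a.cs'] += 'x'
$asString['a.cs'] += 'y'

# += with one-item arrays: the first assignment stores an array, the second appends.
$asArray = @{}
$asArray['a.cs'] += @('x')
$asArray['a.cs'] += @('y')

# Pre-initialised empty array: plain strings can then be appended directly.
$preInit = @{}
if (-not $preInit.ContainsKey('a.cs')) { $preInit['a.cs'] = @() }
$preInit['a.cs'] += 'x'
$preInit['a.cs'] += 'y'

"string version : $($asString['a.cs'])"
"array version  : $($asArray['a.cs'].Count) items"
"pre-init       : $($preInit['a.cs'].Count) items"
```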
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=@{};$s=@{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=@()}$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_]-like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit branch of topic; I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"
$includeList = @{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
% { $includeList[$_.Name] = $_.DirectoryName }
Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
% { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like but I thought that Group-Object might have a place in this question so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"
Get-ChildItem -Path $base -Filter $extension -Recurse |
Group-Object Name |
Where-Object{($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique))} |
ForEach-Object {
$filename = $_.Name
($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object{
"'{0}' has file with same name as '{1}'" -f (Split-Path $_),$filename
}
}
Collect all the files with the extension filter $extension. Group the files based on their names. Then of those groups find every group where there are more than one of that particular file and one of the group members is at least in the directory $unique. Take those groups and print out all the files that are not from the unique directory.
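The grouping steps described above can be tried on a small throwaway tree. This is a sketch under assumptions, not the answerer's exact code: the folder layout and file names are invented, and it uses a case-insensitive StartsWith prefix test in place of the regex match so that it needs no escaping.

```powershell
# Hypothetical tree: a.cs is in includes and lib, b.cs only in includes,
# c.cs is duplicated but only outside includes.
$root = (New-Item -ItemType Directory -Path (
        Join-Path ([System.IO.Path]::GetTempPath()) ("dup-demo-" + [guid]::NewGuid()))).FullName
$layout = @{ includes = 'a.cs', 'b.cs'; lib = 'a.cs', 'c.cs'; contrib = 'c.cs' }
foreach ($folder in $layout.Keys) {
    $folderPath = Join-Path $root $folder
    New-Item -ItemType Directory -Path $folderPath | Out-Null
    foreach ($file in $layout[$folder]) {
        New-Item -ItemType File -Path (Join-Path $folderPath $file) | Out-Null
    }
}

# Prefix that every path inside the includes folder starts with.
$incPrefix = (Join-Path $root 'includes') + [System.IO.Path]::DirectorySeparatorChar
$ignoreCase = [System.StringComparison]::OrdinalIgnoreCase

$clashes = Get-ChildItem -LiteralPath $root -Filter *.cs -Recurse -File |
    Group-Object Name |
    Where-Object {
        # Keep a group only if it has members on both sides of the divide.
        $inside  = @($_.Group | Where-Object { $_.FullName.StartsWith($incPrefix, $ignoreCase) })
        $outside = @($_.Group | Where-Object { -not $_.FullName.StartsWith($incPrefix, $ignoreCase) })
        $inside.Count -gt 0 -and $outside.Count -gt 0
    }

$clashNames = @($clashes | ForEach-Object Name)
$clashNames
Remove-Item -LiteralPath $root -Recurse -Force
```

Only a.cs is reported: b.cs exists once, and c.cs is duplicated without touching the includes folder.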
From Comment
For what it's worth, this is what I used for testing to create a bunch of files. (I know the folder 9 is empty)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | %{[void](New-Item -ItemType directory "$base\$_")}
1..1000 | %{
$number = Get-Random -Minimum 1 -Maximum 100
$folder = Get-Random -Minimum 0 -Maximum 9
[void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"
# First script
Measure-Command {
[string[]]$filter = ls $includes -Filter *.cs -Recurse | % name
ls $root -include $filter -Recurse -Filter *.cs |
Where-object{$_.FullName -notlike "$includes*"}
}
# Second Script
Measure-Command {
$filter2 = ls $includes -Filter *.cs -Recurse
ls $root -Recurse -Filter *.cs |
Where-object{$filter2.name -eq $_.name -and $_.FullName -notlike "$includes*"}
}
In my first script, I get all the include file names into a string array. Then I use that string array as the -Include parameter on Get-ChildItem. In the end, I filter out the include folder from the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove the measure-command to see the results. I was using that to check the speed. With my dataset, the first one was 40% faster.
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select -ExpandProperty Name
Get-ChildItem -Recurse C:\S -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.Directory -notmatch '^c:\\s\\includes' } | Select Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from
Print their name and directory
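One detail that matters for the "in the list" test is whether the list holds plain strings or objects with a Name property. The sketch below uses two made-up in-memory objects instead of real files to show the difference between Select Name and Select -ExpandProperty Name when the result is used with -in.

```powershell
# Stand-ins for file objects; only the Name property matters here.
$items = [pscustomobject]@{ Name = 'a.cs' }, [pscustomobject]@{ Name = 'b.cs' }

# Select-Object Name keeps wrapper objects that each carry a Name property.
$wrapped = $items | Select-Object Name

# -ExpandProperty unwraps them into plain strings.
$plain = $items | Select-Object -ExpandProperty Name

$inWrapped = 'a.cs' -in $wrapped   # compares a string against objects
$inPlain   = 'a.cs' -in $plain     # compares a string against strings

"in wrapped list: $inWrapped"
"in plain list  : $inPlain"
```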