Removing part of a file name before copy using PowerShell

I need to rename files during the copy process and strip out part of the file name. What I have been doing before users added to the file name was simple:
dir $PROCDIR\$PDFTYPE\holding_pattern\*.pdf -recurse | sort -property lastwritetime | select -first 1 | move-item -destination $PROCDIR\$PDFTYPE\begin_processing
The file name format that I am working with is now xxx_xxx_xxx_xxx_xxx.pdf, where the _ splits the information apart. The x's are just an example; the file could be named LakeTahoe_February15_Airplane_0115201457_baseball.pdf. When I perform the copy I need to keep the first three parts, so aaa_aaa_aaa_aaa_aaa.pdf becomes aaa_aaa_aaa.pdf, basically stripping out the last two. Further, if there is nothing beyond LakeTahoe_February15_Airplane_.pdf, I want to get rid of the last "_" as well.
I am still very new to PowerShell but learning. It is good stuff, though it frustrates me from time to time :). Ideas?
Thanks!

Here is a regex solution that might help you out:
dir $PROCDIR\$PDFTYPE\holding_pattern\*.pdf | sort -property lastwritetime | select -first 1 | % { $null = $_.Name -match '.*?_.*?_.*?(?=_)'; $Target = "$PROCDIR\$PDFTYPE\begin_processing\{0}.pdf" -f $matches[0]; Move-Item -Path $_.FullName -Destination $Target -WhatIf }
The results I got during my test seem to indicate that the move/rename operation was successful:
What if: Performing the operation "Move File" on target "Item: C:\test\asdf_blah_asdf_qwer_trew_ytui - Copy.pdf Destination: C:\Windows\System32\WindowsPowerShell\v1.0\$PROCDIR\$PDFTYPE\begin_processing\asdf_blah_asdf.pdf".
You can safely ignore the phony destination path in my example, since I did not have the $PROCDIR and $PDFTYPE variables defined when I ran it. (Note that the destination string must be double-quoted so those variables expand; with single quotes they stay literal, as in the output above.)
Here's a version that's a bit more readable, on multiple lines.
Get-ChildItem -Path c:\test\*.pdf |
Sort-Object -Property lastwritetime | Select-Object -First 1 |
ForEach-Object -Process { $null = $_.Name -match '.*?_.*?_.*?(?=_)'; $Target = 'c:\test\subtest\{0}.pdf' -f $matches[0]; Move-Item -Path $_.FullName -Destination $Target -WhatIf }
Result:
What if: Performing the operation "Move File" on target "Item: C:\test\asdf_blah_asdf_qwer_trew_ytui - Copy.pdf Destination: C:\test\subtest\asdf_blah_asdf.pdf".

Another way to do it, inline and more succinct (this is the last stage of your pipeline; everything that precedes it stays the same):
| mi -Des $PROCDIR\$PDFTYPE\begin_processing\$(($_.Name -split '_')[0..2] -join '_').pdf
(mi -Des is the same as Move-Item -Destination; I used the short version to fit it into one line without scroll bars.)
What this does is split the file name into an array of the underscore-separated parts, select the first three elements of the array (i.e., the first three parts of the file name), glue them back together with underscores, and tack the .pdf extension back on.
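To see each step in isolation, here is a quick interactive sketch using the sample name from the question:
$name = 'LakeTahoe_February15_Airplane_0115201457_baseball.pdf'
$name -split '_'                                    # LakeTahoe, February15, Airplane, 0115201457, baseball.pdf
($name -split '_')[0..2] -join '_'                  # LakeTahoe_February15_Airplane
'{0}.pdf' -f (($name -split '_')[0..2] -join '_')   # LakeTahoe_February15_Airplane.pdf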
Another approach is to use the -replace operator to match the part you want to get rid of and replace it with an empty string:
| mi -Des $PROCDIR\$PDFTYPE\begin_processing\$($_.Name -replace '(_[^_]+){2}(?=\.)', '')
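For what it's worth, here is how that pattern behaves on the two sample names from the question (a quick sketch run interactively). Note that the trailing-underscore case is not matched by this pattern, so the split/join version above is the safer choice for it:
'LakeTahoe_February15_Airplane_0115201457_baseball.pdf' -replace '(_[^_]+){2}(?=\.)', ''
# LakeTahoe_February15_Airplane.pdf
'LakeTahoe_February15_Airplane_.pdf' -replace '(_[^_]+){2}(?=\.)', ''
# LakeTahoe_February15_Airplane_.pdf  (unchanged: (_[^_]+){2} needs two non-empty parts before the dot)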

Related

Find all files with a particular extension, sorting them by creation time and copy them with a new name

I am attempting to recursively find all files with the extension .raw and then sort them in ascending order of CreationTime. After that, I would like to copy each file to a new directory with names IMG_001_0001.jpg ... IMG_001_0099.jpg, using 4 digits in ascending order. It is important that IMG_001_0001.jpg is the first file created and, if there are 99 files, IMG_001_0099.jpg is the last one created.
I tried this:
Get-ChildItem 'F:\Downloads\raw-20221121T200702Z-001.zip' -Recurse -include *.raw | Sort-Object CreationTime | ForEach-Object {copy $_.FullName F:\Downloads\raw-20221121T200702Z-001.zip/test/IMG_001_$($_.ReadCount).jpg}
If I understand correctly you could do it like this:
$count = @{ Value = 0 }
Get-ChildItem 'F:\Downloads\raw-20221121T200702Z-001.zip' -Recurse -Filter *.raw |
Sort-Object CreationTime | Copy-Item -Destination {
'F:\Downloads\raw-20221121T200702Z-001.zip/test/IMG_001_{0:D4}.jpg' -f
$count['Value']++
}
Using D4 in the format string ensures the integers are represented with 4 digits. See Custom numeric format strings for details.
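For example, run interactively:
'{0:D4}' -f 7     # 0007
'{0:D4}' -f 123   # 0123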
As you can note, instead of using ForEach-Object to enumerate each file, this uses a delay-bind script block to generate the new names for the destination files, and each source object is bound from the pipeline.
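Here is a minimal sketch of the delay-bind idea on its own (hypothetical paths, and -WhatIf keeps it harmless); the script block is evaluated once per piped-in file, with $_ bound to the current file:
Get-ChildItem 'C:\temp\*.log' |
Copy-Item -Destination { 'C:\backup\{0}.bak' -f $_.BaseName } -WhatIf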
Worth noting that the forward slashes in /test/ might bring problems and likely should be changed to backslashes: \test\.
You don't need a hashtable to iterate... just use [int].
For the sake of clarity please don't use paths here that can easily be mistaken for a file rather than a directory name.
Get-ChildItem does not work on files (the path used here points at a .zip file, not a folder), and if it appears to, it's not portable.
Also, that script block for -Destination is not likely to work, as variables defined outside it are not available inside. Nor is there any need to delay anything.
Something like this should be perfectly sufficient:
$ziproot = 'F:\input_folder'
$count = 0
$candidates = Get-ChildItem -Path $ziproot -Recurse -Filter '*.raw' |
Sort-Object CreationTime
ForEach ($file in $candidates)
{
Copy-Item -Path $file.FullName -Destination ('{0}\test\IMG_001_{1:D4}{2}' -f $ziproot, ++$count, $file.Extension)
}
(Try using foreach($var in $list) {commands} where you can, it's faster than foreach-object by about a factor of 10.)
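You can check that claim on your own machine with something like this (the numbers will vary from system to system):
$items = 1..100000
(Measure-Command { foreach ($i in $items) { $null = $i } }).TotalMilliseconds
(Measure-Command { $items | ForEach-Object { $null = $_ } }).TotalMilliseconds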

Powershell: How to batch rename files in chronological order and sorted by Date

I have more than 1000 photos that I want to rename to "ngimg-", which should therefore yield "ngimg-1, ngimg-2, ngimg-3..." and so on until the last photo.
I would like to use PowerShell for this task instead of installing third party applications.
I've tried this code here but it's not the proper one for this task
Dir *.jpg | sort Date | ForEach-Object -begin { $count=1 } -process { rename-item $_ -NewName "image$count.jpg"; $count++ }
I'm not sure if that code above is right. I just made it up from this:
Dir *.jpg | ForEach-Object -begin { $count=1 } -process { rename-item $_ -NewName "image$count.jpg"; $count++ }
I would appreciate any help. Thank you
Update: To add to my question, I want to have the files sorted by date first and then be renamed, therefore, it should be like this:
01/01/2020 = ngimg-1
01/02/2020 = ngimg-2
01/03/2020 = ngimg-3
...and so on
I would format the count number with 4 digits, since you have (more than) 1000 images, so they will sort correctly after renaming them.
Something like this:
$count = 1
Get-ChildItem -Path 'D:\Test' -Filter '*.jpg' -File |
Sort-Object LastWriteTime |
Rename-Item -NewName { 'ngimg-{0:D4}{1}' -f $script:count++, $_.Extension } -WhatIf
Take off the -WhatIf safety switch once you have confirmed the code would rename all the files as you like. With that switch on, no file is actually renamed.
The $script:count++ is needed here because the script block for Rename-Item runs in its own scope; without the scope modifier, the increment would happen on a local copy of $count and the index number would not advance for each file.
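A minimal sketch of that scoping behavior, assuming it is saved and run as a .ps1 script:
$count = 1
& { $count++ }                   # child scope: reads 1, but increments a local copy
"after plain ++: $count"         # still 1
& { $script:count++ }            # explicitly targets the script-scoped variable
"after script-scoped ++: $count" # now 2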
Of course, change the path 'D:\Test' to the folder path where your image files are.
To demonstrate, here are two screenshots.
Before:
As you can see in the second column (Date), the files are shown sorted by Name, so the file dates are not in chronological order.
After:
After renaming, the dates ARE in chronological order while the list is still shown sorted by Name.
As suggested by @js2010, I think you need to make sure your Sort operation finishes before pipelining it to the rename. This is easily done with parentheses like below.
$count = 1
(Get-ChildItem -Path 'D:\Test' -Filter '*.jpg' -File | Sort CreationTime) | Rename-Item -NewName { 'ngimg-{0:D4}{1}' -f $script:count++, $_.Extension } -WhatIf
With so many pictures, the pipeline might otherwise start renaming files while it is still enumerating them. The parentheses make sure it finishes sorting before it starts renaming.
That should work, more or less. I would put parentheses around the first part so it finishes first (the sorting probably does the same thing). I assume you want to sort by lastwritetime, since you don't state it in the question.
dir | select name,LastWriteTime
Name LastWriteTime
---- -------------
file1.jpg 7/1/2020 10:19:26 AM
file2.jpg 7/1/2020 10:19:30 AM
file3.jpg 7/1/2020 10:19:33 AM
(Dir *.jpg) | sort LastWriteTime | ForEach { $count=1 } { rename-item $_ -NewName ngimg-$count.jpg -whatif; $count++ }
What if: Performing the operation "Rename File" on target "Item: C:\Users\js\foo\file1.jpg Destination: C:\Users\js\foo\ngimg-1.jpg".
What if: Performing the operation "Rename File" on target "Item: C:\Users\js\foo\file2.jpg Destination: C:\Users\js\foo\ngimg-2.jpg".
What if: Performing the operation "Rename File" on target "Item: C:\Users\js\foo\file3.jpg Destination: C:\Users\js\foo\ngimg-3.jpg".

concatenate columnar output in PowerShell

I want to use PowerShell to generate a list of commands to move files from one location to another. (I'm sure PowerShell could actually do the moving, but I'd like to see the list of commands first ... and yes, I know about -WhatIf).
The files are in a series of subfolders one layer down, and need to be moved to a corresponding series of subfolders on another host. The subfolders have 8-digit identifiers. I need a series of commands like
move c:\certs\40139686\22_05_2018_16_23_Tyre-Calligraphy.jpg \\vcintra2012\images\40139686\Import\22_05_2018_16_23_Tyre-Calligraphy.jpg
move c:\certs\40152609\19_02_2018_11_34_Express.JPG \\vcintra2012\images\40152609\Import\19_02_2018_11_34_Express.JPG
The file needs to go into the \Import subdirectory of the corresponding 8-digit-identifier folder.
The following PowerShell will generate the data that I need:
dir -Directory |
Select -ExpandProperty Name |
dir -File |
Select-Object -Property Name, @{N='Parent';E={$_.Directory -replace 'C:\\certs\\', ''}}
40139686 22_05_2018_16_23_Tyre-Calligraphy.jpg
40152609 19_02_2018_11_34_Express.JPG
40152609 Express.JPG
40180489 27_11_2018_11_09_Appointment tuesday 5th.jpg
but I am stuck on how to take that data and generate the concatenated string which in PHP would look like this
move c:\certs\$Parent\$Name \\vcintra2012\images\$Parent\Import\$Name
(OK, the backslashes would likely need to be escaped, but hopefully it is clear what I want.)
I just don't know how to do this sort of concatenation of columnar output; any SO refs I look at, e.g.
How do I concatenate strings and variables in PowerShell?
are not about how to do this.
I think I need to pipe the output to an expression that effects the concatenation, perhaps using -join, but I don't know how to refer to $Parent and $Name on the far side of the pipe?
Pipe your output into a ForEach-Object loop where you build the command strings using the format operator (-f):
... | ForEach-Object {
'move c:\certs\{0}\{1} \\vcintra2012\images\{0}\Import\{1}' -f $_.Parent, $_.Name
}
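The format operator fills the numbered placeholders from the operands on its right, and an index can be reused as many times as needed. A quick check with the sample values from the question:
'move c:\certs\{0}\{1} \\vcintra2012\images\{0}\Import\{1}' -f '40139686', '22_05_2018_16_23_Tyre-Calligraphy.jpg'
# move c:\certs\40139686\22_05_2018_16_23_Tyre-Calligraphy.jpg \\vcintra2012\images\40139686\Import\22_05_2018_16_23_Tyre-Calligraphy.jpg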
Another approach:
$source = 'C:\certs'
$destination = '\\vcintra2012\images'
Get-ChildItem -Path $source -Depth 1 -Recurse -File | ForEach-Object {
$targetPath = [System.IO.Path]::Combine($destination, $_.Directory.Name , 'Import')
if (!(Test-Path -Path $targetPath -PathType Container)) {
New-Item -Path $targetPath -ItemType Directory | Out-Null
}
$_ | Move-Item -Destination $targetPath
}

Powershell add suffix to filenames, based on prefix

I have a directory that consists of a number of text files that have been named:
1Customer.txt
2Customer.txt
...
99Customer.txt
I am trying to create a PowerShell script that will rename the files to a more logical:
Customer1.txt
Customer2.txt
...
Customer99.txt
The prefix can be anything from 1 digit to 3 digits.
As I am new to powershell, I really don't know how I can achieve this. Any help much appreciated.
The most straightforward way is a gci/ls/dir with a Where-Object matching only BaseNames starting with a number (using a RegEx), piping to Rename-Item and building the new name from the submatches.
ls |? BaseName -match '^(\d+)([^0-9].*)$' |ren -new {"{0}{1}{2}" -f $matches[2],$matches[1],$_.extension}
The same code without aliases
Get-ChildItem |Where-Object {$_.BaseName -match '^(\d+)([^0-9].*)$'} |
Rename-Item -NewName {"{0}{1}{2}" -f $matches[2],$matches[1],$_.extension}
Here is one way to do it:
Get-ChildItem .\Docs -File |
ForEach-Object {
if($_.Name -match "^(?<Number>\d+)(?<Type>\w+)\.\w+$")
{
Rename-Item -Path $_.FullName -NewName "$($matches.Type)$($matches.Number)$($_.Extension)"
}
}
The line:
$_.Name -match "^(?<Number>\d+)(?<Type>\w+)\.\w+$"
takes the file name (e.g. '23Suppliers.txt') and performs a pattern match on it, pulling out the number part (23) and the 'type' part ('Suppliers'), naming them 'Number' and 'Type' respectively. These are stored by PowerShell in its automatic variable $matches, which is populated when working with regular expressions.
We then reconstruct the new file using details from the original file, such as the file's extension ($_.Extension) and the matched type ($matches.Type) and number ($matches.Number):
"$($matches.Type)$($matches.Number)$($_.Extension)"
I'm sure there's a nicer way to do this with regex, but the following is a quick first go at it:
$prefix = "Customer"
Get-ChildItem C:\folder\*$prefix.txt | Rename-Item -NewName {$prefix + ($_.Name -replace $prefix,'')}

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = @{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name]=$i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select @{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I get basically the same answer, except his is long-form and robust with regex character escaping and setup variables; mine is terse, because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs |?{($_.Group.FullName -notmatch $inc)-and($_.Group.FullName -match $inc)}
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}} (see the sketch after this list)
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
handles triplicates, n-tuplicates, as well.
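To make step 2 concrete, the grouping stage produces objects shaped roughly like this (a sketch with hypothetical paths):
Get-ChildItem -Recurse 'c:\s' -File -Include *.cs | Group-Object Name |
Select-Object Count, Name, @{N='Paths'; E={$_.Group.FullName}}
# Count Name        Paths
# ----- ----        -----
#     2 someFile.cs {c:\s\includes\wherever\someFile.cs, c:\s\lib\factories\alt\someFile.cs}
#     1 other.cs    {c:\s\other.cs}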
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"
foreach ($incFile in $includeFiles) {
foreach ($oFile in $otherFiles) {
if ($incFile.Name -ieq $oFile.Name) {
write "$($incFile.Name) clash"
write "* $($incFile.FullName)"
write "* $($oFile.FullName)"
write "`n"
}
}
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=@{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=@($_.FullName)};$h.Values|?{$_.Count -gt 1 -and $_ -like 'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = @{
'fileA.cs': @('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs'),
'file2.cs': @('c:\cs\somewhere\file2.cs'),
'file3.cs': @('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files, iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename, otherwise it would overwrite each of the hashtable lists and they would be one item long for only the most recently seen file.
It uses @() - because when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs to put an array into the hashtable at first, not a string. If it was +=$_.FullName then the first file would go into the hashtable as a string, and the += next time would do string concatenation, which is no use here. This forces every file into a one-item array, so the first file for a name starts an array. The least-code way to get this result is with +=@(..), but that churn of creating a throwaway array for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
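A tiny demonstration of that pitfall, with hypothetical names:
$h = @{}
$h['a.cs'] += 'c:\x\a.cs'      # $null + string -> string
$h['a.cs'] += 'c:\y\a.cs'      # string + string -> 'c:\x\a.csc:\y\a.cs', useless concatenation
$g = @{}
$g['a.cs'] += @('c:\x\a.cs')   # $null + array -> one-element array
$g['a.cs'] += 'c:\y\a.cs'      # array + string -> two-element array
$g['a.cs'].Count               # 2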
Changing the section
%{$h[$_.Name]+=@($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
After that, using $h.Values isn't going over every file a second time, it's going over every array in the hashtable - one per unique filename. That's got to happen to check the array size and prune the non-duplicates, but the -and operation short-circuits: when the Count -gt 1 test fails, the bit on the right checking the path name doesn't run.
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
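One way to close that gap (an untested sketch): -like and -notlike filter the collection when the left-hand side is an array, so require a hit on both sides of the divide:
$h.Values | ?{ $_.Count -gt 1 -and ($_ -like 'c:\s\includes*') -and ($_ -notlike 'c:\s\includes*') }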
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=@{};$s=@{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_] -like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit branch of topic; I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"
$includeList = @{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
% { $includeList[$_.Name] = $_.DirectoryName }
Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
% { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like but I thought that Group-Object might have a place in this question so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"
Get-ChildItem -Path $base -Filter $extension -Recurse |
Group-Object Name |
Where-Object{($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique))} |
ForEach-Object {
$filename = $_.Name
($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object{
"'{0}' has file with same name as '{1}'" -f (Split-Path $_),$filename
}
}
Collect all the files with the extension filter $extension. Group the files based on their names. Then, of those groups, find every group that contains more than one file and where at least one of the group members is in the directory $unique. Take those groups and print out all the files that are not from the unique directory.
From Comment
For what it's worth, this is what I used for testing to create a bunch of files. (I know the folder 9 is empty.)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | %{[void](New-Item -ItemType directory "$base\$_")}
1..1000 | %{
$number = Get-Random -Minimum 1 -Maximum 100
$folder = Get-Random -Minimum 0 -Maximum 9
[void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"
# First script
Measure-Command {
[string[]]$filter = ls $includes -Filter *.cs -Recurse | % name
ls $root -include $filter -Recurse -Filter *.cs |
Where-object{$_.FullName -notlike "$includes*"}
}
# Second Script
Measure-Command {
$filter2 = ls $includes -Filter *.cs -Recurse
ls $root -Recurse -Filter *.cs |
Where-object{$filter2.name -eq $_.name -and $_.FullName -notlike "$includes*"}
}
In my first script, I get all the include file names into a string array. Then I use that string array as the -Include parameter on Get-ChildItem. In the end, I filter out the include folder from the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove the measure-command to see the results. I was using that to check the speed. With my dataset, the first one was 40% faster.
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select-Object -ExpandProperty Name
Get-ChildItem -Recurse C:\S -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.Directory -notmatch '^c:\\s\\includes' } | Select-Object Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from
Print their name and directory