Match string with specific numbers from array - powershell

I want to create a script that searches through a directory for specific ".txt" files with the Get-ChildItem cmdlet and then copies them to a location I choose. The hard part for me is extracting the specific .txt file names from the array, so basically I need help matching specific file names in the array. Here is an example of the array I'm getting back with the following cmdlet:
$arrayObject = (Get-ChildItem -Recurse | Where-Object { $_.Name -like "*.txt" }).Name
The arrayobject variable is something like this:
$arrayobject = "test.2.5.0.txt", "test.1.0.0.txt", "test.1.0.1.txt",
"test.0.1.0.txt", "test.0.1.1.txt", "test.txt"
I want to match my array so it returns the following:
test.2.5.0.txt, test.1.0.0.txt, test.1.0.1.txt
Can someone help me with Regex to match the above file names from the $arrayObject?

Since you already add the -Recurse parameter to Get-ChildItem, you can also use the -Include parameter, like this:
$findThese = "test.2.5.0.txt", "test.1.0.0.txt", "test.1.0.1.txt"
$filesFound = (Get-ChildItem -Path 'YOUR ROOTPATH HERE' -Recurse -File -Include $findThese).Name
P.S. Without the -Recurse parameter, you need to add \* to the end of the root folder path to be able to use -Include.
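For example (the root path is a placeholder):
# With -Recurse, -Include filters matching files anywhere under the root
Get-ChildItem -Path 'C:\Root' -Recurse -File -Include $findThese
# Without -Recurse, append \* so -Include has child items to filter
Get-ChildItem -Path 'C:\Root\*' -File -Include $findThese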

Maybe something like:
$FileList = Get-ChildItem -path C:\TEMP -Include *.txt -Recurse
$TxtFiles = 'test1.txt', 'test3.txt', 'test9.txt'
Foreach ($txt in $TxtFiles) {
if ($FileList.name -contains $txt) {Write-Host File: $Txt is present}
}
A general rule: filter as far left as possible! Fewer objects to be processed means fewer resources used and faster processing!
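For example, a provider-side -Filter avoids creating objects that a later Where-Object would only throw away:
# Provider-side filter: only .txt files are emitted at all
Get-ChildItem -Path C:\TEMP -Filter *.txt -Recurse
# Post-pipe filter: every file object is created first, then discarded
Get-ChildItem -Path C:\TEMP -Recurse | Where-Object { $_.Name -like '*.txt' }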
Hope it helps!

Please try to clarify what the pattern should match.
I created a wildcard pattern which matches, out of the given filenames, only the files you wanted to retrieve:
"*.[1-9].[0-9].[0-9].txt"
You can try out the small check I wrote.
ForEach ($file in $arrayobject) {
    if ($file -like "*.[1-9].[0-9].[0-9].txt") {
        Write-Host $file
    }
}
Note that -like uses wildcard patterns, not regular expressions; to check a string against a real regex, use the -match operator instead.
Let me know if this helps.
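For comparison, a true regular expression checked with -match could look like this (a sketch; it assumes single-digit version parts, as in the sample names):
# Anchored regex with escaped dots; \d matches any single digit
$arrayobject | Where-Object { $_ -match '^test\.[1-9]\.\d\.\d\.txt$' }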

Sorry for the late reply, just got back in the office today. My question was misinterpreted, but that's my fault; I wasn't clear about what I really want to do.
What I want to do is search through a directory and retrieve/extract the (major) version from a filename. So in my case, file "test.2.5.0.txt" would be version 2.5.0. From that I get the major version, which is 2. Then, in an if statement, I check whether it's greater than or equal to 1 and, if so, copy the file to a specific destination. To add some context: they're actually .nupkg files, not .txt files. But I figured it out. This is the code:
$sourceShare = "\\server1name\Share\txtfilesFolder"
$destinationShare = "\\server2name\Share\txtfilesFolder"
Get-ChildItem -Path $sourceShare `
    -Recurse `
    -Include "*.txt" `
    -Exclude @("*.nuspec", "*.sha512") `
| Foreach-Object {
    $fileName = $_.Name
    # Grab the dotted version number and take the first segment as the major version
    [Int]$majorVersion = (([regex]::Match($fileName, "(\d+(\.\d+){1,})")).Value).Split(".")[0]
    if ($majorVersion -ge 1)
    {
        Copy-Item -Path $_.FullName `
            -Destination $destinationShare `
            -Force
    }
}
If you have any more advice, let me know. It would be great to extract the major version without using the .Split method.
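Edit: one way to skip .Split is to let the regex capture the major version in its own group (a sketch against the sample file name):
$fileName = 'test.2.5.0.txt'
# Group 1 grabs only the digits before the first dot of the version
if ($fileName -match '(\d+)(?:\.\d+)+') {
    [int]$majorVersion = $Matches[1]
}
$majorVersion   # -> 2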
Grtz

Related

Find files with partial name match and remove desired file

I have a little over 12000 files that I need to sort through.
18-100-00000-LOD-H.pdf
18-100-00000-LOD-H-1C.pdf
21-200-21197-LOD-H.pdf
21-200-21197-LOD-H-1C.pdf
21-200-21198-LOD-H.pdf
21-200-21198-LOD-H-1C.pdf
I need a way to go through all the files and delete the LOD-H version of the files.
EX:
21-200-21198-LOD-H.pdf
21-200-21198-LOD-H-1C.pdf
With the partial match being the 5 digit code I need a script that would delete the LOD-H case of the partial match.
So far this is what I have, but it won't work: I need to supply values for the pattern, but since there isn't one set pattern (it's more like multiple patterns) I don't know what to supply it with.
$source = "\\Summerhall\GLUONPREP\Market Centers\~Pen Project\Logos\ALL Office Logos"
$destination = "C:\Users\joshh\Documents\EmptySpace"
$toDelete = "C:\Users\joshh\Documents\toDelete"
$allFiles = @(Get-ChildItem $source -File | Select-Object -ExpandProperty FullName)
foreach($file in $allFiles) {
$content = Get-Content -Path $file
if($content | Select-String -SimpleMatch -Quiet){
$dest = $destination
}
else{
$dest = $toDelete
}
}
Any help would be super appreciated, even links to something similar or even links to documentation so I can start piecing a script of my own would be super helpful.
Thank you!
This should work for what you need:
# Get a list of the files with -1C preceding the extension
$1cFiles = @( ( Get-ChildItem -File "${source}/*-LOD-H-1C.pdf" ).Name )
# Retrieve files that match the same pattern without 1C, and iterate over them
Get-ChildItem -File "${source}/*-LOD-H.pdf" | ForEach-Object {
# Get the name of the file if it had the -1C suffix preceding the .ext
$useName = $_.Name.Insert($_.Name.LastIndexOf('.pdf'), '-1C')
# If the -1C version of the file exists, remove the current (non-1C) file
if( $1cFiles -contains $useName ) {
Remove-Item -Force $_
}
}
Basically, look for the 1C files in $source, then iterate over the non-1C files in $source, removing the non-1C file if adding -1C before the file extension matches an existing file with 1C in the name.
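If you want to preview the deletions before committing, swap the Remove-Item line for a -WhatIf dry run; it prints what would be removed without touching anything:
# Dry run: reports each file that would be deleted, deletes nothing
if( $1cFiles -contains $useName ) {
    Remove-Item -Force -WhatIf $_
}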

How to take a list of partial file names and return a list of the full file names with PowerShell

I’m wondering how to take a list of partial document names and return a list of the full document names with PowerShell.
I have a ton of documents to do this with. We have a naming scheme: HG-xx-xx-###
The full naming scheme for the actual files is: HG-xx-xx-###.x.x_File_Name
I have a lot of different lists of file names like so:
HG-SP-HG-001
HG-WI-BE-005
HG-GD-BB-043
I'm trying to have the program return a list of the full file names, like so:
HG-SP-HG-001.1.6_Example
HG-WI-BE-005.1.0_Example
HG-GD-BB-043.1.1_Example
I've included both methods I've tried. I give it a list or even just one partial file name and I get nothing back.
I've tried two different ways and I'm at the end of my programming and googling capabilities, any ideas?
$myPath = 'P:\'
$_DocList = READ-HOST "Enter list of document ID's"
$_DocList = $_DocList.Split(',').Split(' ')
#Here I'm not sure if I should do it like so:
$output =
ForEach($_Doc in $_DocList)
{
$find = gci $myPath -Recurse |where{$_.name -contains $($_Doc)}
Write-Host "$find"
}
$output | clip
#or like this:
$_DocList | ForEach-Object
{
gci -Path $myPath -Filter $_ -Recurse
$info = Get-ChildItem $_.FullName | Measure-Object
if ($info.Count -ne 0) {
Write-Output "$($_.Name)"
}
} | clip
Doug Maurer's helpful answer shows a solution based on a wildcard pattern passed to the -Filter parameter.
Since this parameter only accepts a single pattern, Get-ChildItem -Recurse must be called multiple times, in a loop.
However, since you're using -Recurse, you can take advantage of the -Include parameter, which accepts multiple patterns, so you can get away with a single Get-ChildItem call.
While -Filter performs better than -Include for a single Get-ChildItem call, a single Get-ChildItem -Include call with an array of patterns is likely to outperform multiple Get-ChildItem -Filter calls, especially with many patterns.
# Sample name prefixes to search for.
$namePrefixes = 'HG-SP-HG-001', 'HG-WI-BE-005', 'HG-GD-BB-043'
# Append '*' to all prefixes to form wildcard patterns: 'HG-SP-HG-001*', ...
$namePatterns = $namePrefixes -replace '$', '*'
# Combine Get-ChildItem -Recurse with -Include and all patterns.
# .Name returns the file name part of all matching files.
$names = (Get-ChildItem $myPath -File -Recurse -Include $namePatterns).Name
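Note that -replace operates element-wise on an array, which is what turns every prefix into a pattern in one statement:
'HG-SP-HG-001', 'HG-WI-BE-005' -replace '$', '*'
# -> HG-SP-HG-001*, HG-WI-BE-005*
And if you want to verify the performance claim against your own data, Measure-Command makes the comparison easy (a sketch using $myPath from the question and the $namePatterns array above):
# One recursive scan with all patterns at once
Measure-Command { Get-ChildItem $myPath -File -Recurse -Include $namePatterns }
# One recursive scan per pattern
Measure-Command { $namePatterns | ForEach-Object { Get-ChildItem $myPath -File -Recurse -Filter $_ } }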
Maybe something like this?
$docList = @('HG-SP-HG-*','HG-WI-BE-*','HG-GD-BB-*')
foreach($item in $docList)
{
$check = Get-ChildItem -Filter $item P:\ -File
if($check)
{
$check
}
}
Maybe something like this?
$docList = @('HG-SP-HG','HG-WI-BE','HG-GD-BB')
$docList | ForEach-Object { Get-ChildItem -File -Filter $_ -Recurse } | select Name
When using the filter with partial names, you'll need to specify a wildcard:
$names = 'HG-SP-HG','HG-WI-BE','HG-GD-BB'
$names | Foreach-Object {
Get-ChildItem -File -Filter $_* -Recurse
}
And if you only want the full path back, simply select it.
$names = 'HG-SP-HG','HG-WI-BE','HG-GD-BB'
$names | Foreach-Object {
Get-ChildItem -File -Filter $_* -Recurse
} | Select-Object -ExpandProperty FullName
If you have an established pattern of what the files look like, why not regex it?
# Use these instead to specify a docID
#$docID = "005"
#$pattern = "^HG(-\w{2}){2}-$docID"
$pattern = "^HG(-\w{2}){2}-\d{3}"
Get-ChildItem -Path "P:\" -Recurse | ?{$_.Name -match $pattern}
Granted, there may be more efficient ways to do this, but it should be quick enough for a few thousand files.
EDIT: This is the breakdown of the regex pattern's hieroglyphics.
^           start at the beginning
HG          literal characters "HG"
(-\w{2}){2}
  (         start of a grouping
  -         literal "-" character (hyphen)
  \w{2}
    \w      any word character
    {2}     exactly 2 times
  )         end of the grouping
  {2}       the whole group, exactly 2 times
-           literal "-" character (hyphen)
\d{3}
  \d        any digit, 0 through 9
  {3}       exactly 3 times
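A quick sanity check of the pattern against one of the sample names:
# Returns True: HG, two -xx groups, then a 3-digit ID
'HG-SP-HG-001.1.6_Example' -match '^HG(-\w{2}){2}-\d{3}'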

concatenate columnar output in PowerShell

I want to use PowerShell to generate a list of commands to move files from one location to another. (I'm sure PowerShell could actually do the moving, but I'd like to see the list of commands first ... and yes, I know about -WhatIf.)
The files are in a series of subfolders one layer down and need to be moved to a corresponding series of subfolders on another host. The subfolders have 8-digit identifiers. I need a series of commands like
move c:\certs\40139686\22_05_2018_16_23_Tyre-Calligraphy.jpg \\vcintra2012\images\40139686\Import\22_05_2018_16_23_Tyre-Calligraphy.jpg
move c:\certs\40152609\19_02_2018_11_34_Express.JPG \\vcintra2012\images\40152609\Import\19_02_2018_11_34_Express.JPG
The file needs to go into the \Import subdirectory of the corresponding 8-digit-identifier folder.
The following PowerShell will generate the data that I need:
dir -Directory |
Select -ExpandProperty Name |
dir -File |
Select-Object -Property Name, #{N='Parent';E={$_.Directory -replace 'C:\\certs\\', ''}}
40139686 22_05_2018_16_23_Tyre-Calligraphy.jpg
40152609 19_02_2018_11_34_Express.JPG
40152609 Express.JPG
40180489 27_11_2018_11_09_Appointment tuesday 5th.jpg
but I am stuck on how to take that data and generate the concatenated string which in PHP would look like this
move c:\certs\$Parent\$Name \\vcintra2012\images\$Parent\Import\$Name
(OK, the backslashes would likely need to be escaped, but hopefully it is clear what I want)
I just don't know how to do this sort of concatenation of columnar output - any SO refs I look at, e.g.
How do I concatenate strings and variables in PowerShell?
are not about how to do this.
I think I need to pipe the output to an expression that effects the concatenation, perhaps using -join, but I don't know how to refer to $Parent and $Name on the far side of the pipe?
Pipe your output into a ForEach-Object loop where you build the command strings using the format operator (-f):
... | ForEach-Object {
'move c:\certs\{0}\{1} \\vcintra2012\images\{0}\Import\{1}' -f $_.Parent, $_.Name
}
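The format operator reuses indexed placeholders, so each value only has to be supplied once; with one of the sample rows this yields:
'move c:\certs\{0}\{1} \\vcintra2012\images\{0}\Import\{1}' -f '40152609', 'Express.JPG'
# -> move c:\certs\40152609\Express.JPG \\vcintra2012\images\40152609\Import\Express.JPG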
Another approach:
$source = 'C:\certs'
$destination = '\\vcintra2012\images'
Get-ChildItem -Path $source -Depth 1 -Recurse -File | ForEach-Object {
$targetPath = [System.IO.Path]::Combine($destination, $_.Directory.Name , 'Import')
if (!(Test-Path -Path $targetPath -PathType Container)) {
New-Item -Path $targetPath -ItemType Directory | Out-Null
}
$_ | Move-Item -Destination $targetPath
}

Powershell change extension

I have a problem with changing a file's extension. I need to write a script which replicates data, but the data have two files. The filename is not a fixed string, so we can't use a normal -replace.
I need to get from
filename.number.extension
this form
filename.number.otherextension
We tried to use a split, but this command shows us things like below:
filename
number
otherextension
Thanks for any ideas,
[System.IO.Path]::ChangeExtension("test.old",".new")
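Applied to the name from the question, it only touches the final extension:
[System.IO.Path]::ChangeExtension('filename.number.extension', '.otherextension')
# -> filename.number.otherextension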
You probably want something like the -replace operator:
'filename.number.extension' -replace 'extension$','otherextension'
The $ is regular expression syntax meaning end of line. This should ensure that the -replace does not match "extension" appearing elsewhere in the filename.
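A quick demonstration of what the anchor buys you:
# Anchored: only the trailing 'extension' is replaced
'extension.number.extension' -replace 'extension$', 'otherextension'
# -> extension.number.otherextension
# Unanchored: every occurrence is replaced
'extension.number.extension' -replace 'extension', 'otherextension'
# -> otherextension.number.otherextension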
A simple Utility Function
<#
# Renames all files under the given path (recursively) whose extension matches $OldExtension.
# Changes the extension to $NewExtension
#>
function ChangeFileExtensions([string] $Path, [string] $OldExtension, [string] $NewExtension) {
Get-ChildItem -Path $Path -Filter "*.$OldExtension" -Recurse | ForEach-Object {
$Destination = Join-Path -Path $_.Directory.FullName -ChildPath $_.Name.Replace($OldExtension, $NewExtension)
Move-Item -Path $_.FullName -Destination $Destination -Force
}
}
Usage
ChangeFileExtensions -Path "c:\myfolder\mysubfolder" -OldExtension "extension" -NewExtension "otherextension"
But it can do more than just this. If you had the following files in the same folder as your script
example.sample.csv
example.txt
mysubfolder/
myfile.sample.csv
myfile.txt
this script would rename all the .sample.csv files to .txt files in the given folder and all subfolders and overwrite any existing files with those names.
# Replaces all .sample.csv files with .txt extensions in c:\myfolder and in c:\myfolder\mysubfolder
ChangeFileExtensions -Path "c:\myfolder" -OldExtension "sample.csv" -NewExtension "txt"
If you don't want it to be recursive (affecting subfolders) just change
"*.$OldExtension" -Recurse | ForEach-Object
to
"*.$OldExtension" | ForEach-Object
This could work:
Get-ChildItem 'C:\Users\Administrator\Downloads\text files\more text\*' *.txt | rename-item -newname { [io.path]::ChangeExtension($_.name, "doc") }
You can split on the dots, drop the last element, and join the rest back together before appending the new extension:
$parts = 'filename.number.extension' -split '\.'
($parts[0..($parts.Count - 2)] -join '.') + '.otherextension'

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = @{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name]=$i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select #{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I get basically the same answer, except his is long-form and robust with regex character escaping and setup variables, mine is terse because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs |?{($_.Group.FullName -notmatch $inc)-and($_.Group.FullName -match $inc)}
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}}
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
handles triplicates, n-tuplicates, as well.
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"
foreach ($incFile in $includeFiles) {
foreach ($oFile in $otherFiles) {
if ($incFile.Name -ieq $oFile.Name) {
write "$($incFile.Name) clash"
write "* $($incFile.FullName)"
write "* $($oFile.FullName)"
write "`n"
}
}
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=@{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=@($_.FullName)};$h.Values|?{$_.Count-gt1-and$_-like'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = {
'fileA.cs': @('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs'),
'file2.cs': @('c:\cs\somewhere\file2.cs'),
'file3.cs': @('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files, iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename, otherwise it would overwrite each of the hashtable lists and they would be one item long for only the most recently seen file.
It uses @() - because when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs to put an array into the hashtable at first, not a string. If it was +=$_.FullName then the first file would go into the hashtable as a string, and the += next time would do string concatenation, and that's no use to me. This forces the first file in the hashtable to start an array by making every file a one-item array. The least-code way to get this result is with +=@(..), but that churn of creating throwaway arrays for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
Changing the section
%{$h[$_.Name]+=@($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
After that, using $h.Values isn't going over every file a second time; it's going over every array in the hashtable - one per unique filename. That's got to happen to check the array size and prune the non-duplicates, but the -and operation short-circuits: when the Count -gt 1 test fails, the bit on the right checking the path name doesn't run.
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=@{};$s=@{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=@()}$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_]-like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit branch of topic; I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"
$includeList = @{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
% { $includeList[$_.Name] = $_.DirectoryName }
Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
% { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like but I thought that Group-Object might have a place in this question so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"
Get-ChildItem -Path $base -Filter $extension -Recurse |
Group-Object Name |
Where-Object{($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique))} |
ForEach-Object {
$filename = $_.Name
($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object{
"'{0}' has file with same name as '{1}'" -f (Split-Path $_),$filename
}
}
Collect all the files with the extension filter $extension. Group the files based on their names. Then of those groups find every group where there are more than one of that particular file and one of the group members is at least in the directory $unique. Take those groups and print out all the files that are not from the unique directory.
From Comment
For what it's worth, this is what I used for testing to create a bunch of files. (I know folder 9 ends up empty: Get-Random's -Maximum is exclusive, so the folder index is drawn from 0-8.)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | %{[void](New-Item -ItemType directory "$base\$_")}
1..1000 | %{
$number = Get-Random -Minimum 1 -Maximum 100
$folder = Get-Random -Minimum 0 -Maximum 9
[void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"
# First script
Measure-Command {
[string[]]$filter = ls $includes -Filter *.cs -Recurse | % name
ls $root -include $filter -Recurse -Filter *.cs |
Where-object{$_.FullName -notlike "$includes*"}
}
# Second Script
Measure-Command {
$filter2 = ls $includes -Filter *.cs -Recurse
ls $root -Recurse -Filter *.cs |
Where-object{$filter2.name -eq $_.name -and $_.FullName -notlike "$includes*"}
}
In my first script, I get all the include file names into a string array. Then I use that string array as an -Include param on the Get-ChildItem. In the end, I filter the include folder out of the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove the measure-command to see the results. I was using that to check the speed. With my dataset, the first one was 40% faster.
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select-Object -ExpandProperty Name
Get-ChildItem -Recurse C:\s -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.Directory -notmatch '^c:\\s\\includes' } | Select-Object Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from
Print their name and directory