Searching for files in powershell - powershell

I've scoured all of the internet for this answer. Maybe it's right here, but alas, I'm out of time and we're on a time schedule from the wonderful boys over in legal.
We have some files which need to be retrieved based on particular names which appear in the directory path.
The person who stored and saved all of these files kept the same naming convention throughout. She's pretty awesome and a++ to her.
The file structure is as below:
Animals
-Dogs
-Folders With Breeds of Dogs
-<Breed of Dog>_MA_etc.pdf
-Cats
-Folders with Breeds of Cats
-<Breed of Cat>_MA_etc.pdf
-ETC
-etc
-etc
The person who saved the files was meticulous about file structure and naming convention, so you can expect c:\animals\dogs\GSD\GSD_MA.PDF or something like that.
While the original author was rather consistent, human error has occurred, so what I'm trying to do is look for "close enough", basically.
We might have:
Client Agreements\Netflix\files
Master Agreements\Netflix,Inc\files
Rental Agreements\Netflix\files
What I want to do is grab the file structure of all of those and move them to my "E:\sorted" directory maintaining the file structure it has.
So stepping away from animals, we've got a client list from legal with names they're interested in. If I look for name:name, I get 27 results. So far not good.
I've tried partial and I get zero results. So here's my terrible code below. Maybe you can make fun of me and show me where I went wrong.
$a = Import-CSV C:\scripts\Clients.csv
$a = @($a.Client)
#$a = $a | %{ $_.SubString(0,6) }
$c = Get-ChildItem E:\Legal\ -include ($a) -recurse # | Where-Object {($_ -match $a)}
ForEach($file in $c){
$dest = Split-Path -path $file.FullName -Parent | Split-path -NoQualifier
#Copy-Item -path $file -recurse -Destination "e:\sorted\11\$dest" -force -Verbose
}

I expect that there is a more PowerShell-ish way to do it, but I used a more procedural-type approach.
Using a HashSet, I create a set of directories which need to be copied. A HashSet holds each entry at most once, so if it contains "C:\A\B", then adding "C:\A\B" again will not add another entry.
The .Contains method used here is the .NET String method (a case-sensitive substring test), not the PowerShell -contains operator, and similarly .Replace is the .NET method rather than the -replace operator.
$src = "C:\temp\a"
$dest = "F:\temp\b"
$CsvFile = Join-Path -Path $src -ChildPath "findthese.csv"
$sought = (Import-Csv $CsvFile).Client
$dirs = Get-ChildItem -Path $src -Directory -Recurse
$set = New-Object System.Collections.Generic.HashSet[string]
# get the directories with a client name in the path anywhere
foreach($dir in $dirs) {
foreach($client in $sought) {
if ($dir.FullName.contains($client)) {
$temp = $set.Add($dir.FullName)
}
}
}
# copy the selected directory structures to the destination
foreach($dir in $set) {
Copy-Item -path $dir -Destination $dir.replace($src, $dest) -Recurse -WhatIf
}
I left the -WhatIf in there so you can quickly check it's going to do the right thing.
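As a small aside on the two .NET pieces used above, here is a minimal sketch (the sample paths are made up) showing that HashSet.Add ignores duplicates and that String.Contains is case-sensitive, which matters when folder names only nearly match the client list. The IndexOf call with OrdinalIgnoreCase is one possible case-insensitive alternative and is an assumption of mine, not part of the code above.

```powershell
# Hypothetical sample paths, for illustration only.
$set = New-Object 'System.Collections.Generic.HashSet[string]'
$addedFirst  = $set.Add('C:\A\B')   # $true: new entry
$addedSecond = $set.Add('C:\A\B')   # $false: already present, set is unchanged

$path = 'E:\Legal\Master Agreements\Netflix,Inc'
# String.Contains compares case-sensitively, so 'netflix' does not hit 'Netflix'.
$caseSensitiveHit = $path.Contains('netflix')
# One possible case-insensitive test using a .NET overload of IndexOf.
$ignoreCaseHit = $path.IndexOf('netflix', [System.StringComparison]::OrdinalIgnoreCase) -ge 0

'{0} entry in set; Contains: {1}; IndexOf ignoring case: {2}' -f $set.Count, $caseSensitiveHit, $ignoreCaseHit
```

If the client list and the folder names differ in case, swapping the Contains test for the IndexOf form is probably the smallest change that catches them.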

If the names in $a don't exactly match the names of files, using that as input to the include parameter won't help you find just those files you want.
I've got a file named clients.csv with the following
client,gender,fun
fred,m,y
barney,m,y
wilma,f,y
navneet,n,y
kumar,f,y
konda,m,y
In my current directory, I've got a directory named clients with the following contents:
C:
├───clients
    ├───losers
    │       barney_loser.txt
    │       kumar_loser.txt
    │
    └───winners
            fred_winner.txt
            konda_winner.txt
            wilma_winner.txt
Case-1:
ls .\clients\ -Filter *.txt -Recurse
Returns all the text files.
Case-2:
$people = import-csv -path .\people.csv
$clients = $people.client
ls .\clients\ -Filter *.txt -Recurse -Include $clients
Returns me nothing.
Case-3:
$people = import-csv -path .\people.csv
$clients = $people.client
$clients += 'kumar_loser.txt'
ls .\clients\ -Filter *.txt -Recurse -Include $clients
Returns me one record for "kumar_loser.txt".
I'm asserting that the patterns in your list ($a) don't match the file names.
If I wanted to fix that in my example, I could do something like this...
$people = import-csv -path .\people.csv
$clients = $people.client
for($i = 0; $i -lt $clients.length; $i++) {
$clients[$i] = '*{0}*' -f $clients[$i]
}
ls .\clients\ -Filter *.txt -Recurse -Include $clients
Hope this helps.
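The same point can be checked without touching the file system. This is a minimal sketch using the -like operator on plain strings (the sample names come from the example above). A bare client name contains no wildcard characters, so it only matches a file name that is exactly equal to it, while wrapping it in * makes it match anywhere in the name.

```powershell
# Sample data mirroring the example directory and CSV above.
$fileNames = 'barney_loser.txt', 'kumar_loser.txt', 'fred_winner.txt', 'konda_winner.txt', 'wilma_winner.txt'
$clients   = 'fred', 'kumar'

# Bare names: a pattern without wildcards must equal the whole file name.
$bareMatches = @($fileNames | Where-Object {
    $name = $_
    @($clients | Where-Object { $name -like $_ }).Count -gt 0
})

# Wrapping each name in * turns it into a "contains" pattern.
$patterns = $clients | ForEach-Object { '*{0}*' -f $_ }
$wildcardMatches = @($fileNames | Where-Object {
    $name = $_
    @($patterns | Where-Object { $name -like $_ }).Count -gt 0
})

"Bare names matched $($bareMatches.Count) file(s); wildcard patterns matched: $($wildcardMatches -join ', ')"
```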

Thanks for the help, guys.
I took a less scripty, more procedural approach, as suggested above. Here's the code I used. It mostly worked; a colleague and I went through and verified the results and some outlier files. I had to double-check the errors that popped up and found a few more files that I wanted. It wasn't perfect, but it definitely cut down on looking through 700 folders and 3000 files. Include is great, but filter is what I really wanted. Furthermore, Include doesn't like index values and Filter especially doesn't, so I had to save the value to a variable and filter on that with a * wildcard, which did work.
Here's what I did:
$people = Import-CSV C:\scripts\HelenClients.csv
$clients = $people.Client| %{$_.SubString(0,5)}
for($i=0; $i -lt $clients.Length; $i++){
$name = $clients[$i]
Write-Host "Searching for $name"
$file = Get-ChildItem 'E:\Legal\' -Include "$name*" -recurse
if($file -ne $null){
$dest = Split-Path -path $file -Parent
$dest1 = $dest | Split-Path -NoQualifier
$from = $dest[0]
$to = $dest1[0]
$too = $file.BaseName[0]
Copy-Item $file -Destination e:\sorted\16\$to\$too\ -force -Verbose
}
else{
Write-Output "No results found"
}
}
I found when you store the results into a variable, if there's more than one, it'll list all of the locations and names, etc. Not pretty. See below:
PS C:\Users\me> $ff
Directory: E:\ParentDir\subfolder\redacted
Mode LastWriteTime Length Name
---- ------------- ------ ----
-a---- 6/4/2018 1:47 PM 50485 redacted.docx
-a---- 6/4/2018 1:47 PM 155579 redacted.pdf
PS C:\Users\me> $ff.Basename
Redacted Basename 0
Redacted Basename 1
PS C:\Users\me> $ff.BaseName[0]
Redacted Basename 0
So I just wanted the first indexed value. I also wanted to maintain the file structure without copying everything over, so I used split-path to kind of take it apart. It's very hodgepodge and not pretty to look at, but it works.
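For the "maintain the file structure" part, a string-only alternative to the Split-Path juggling could look like the sketch below. Get-SortedPath is a hypothetical helper name, and the sample file name msa.pdf is made up. It simply strips the source root from the full path and grafts the remainder onto the destination root.

```powershell
# Hypothetical helper: maps a path under $SourceRoot to the same relative path under $DestinationRoot.
function Get-SortedPath {
    param(
        [string]$FullName,         # full path of the file that was found
        [string]$SourceRoot,       # root that was searched, e.g. E:\Legal
        [string]$DestinationRoot   # root to copy into, e.g. E:\sorted
    )
    if (-not $FullName.StartsWith($SourceRoot, [System.StringComparison]::OrdinalIgnoreCase)) {
        throw "'$FullName' is not under '$SourceRoot'"
    }
    # Keep everything after the source root and append it to the destination root.
    $relative = $FullName.Substring($SourceRoot.Length).TrimStart('\')
    return $DestinationRoot.TrimEnd('\') + '\' + $relative
}

# Made-up example path based on the folder layout described in the question.
$target = Get-SortedPath -FullName 'E:\Legal\Master Agreements\Netflix,Inc\files\msa.pdf' -SourceRoot 'E:\Legal' -DestinationRoot 'E:\sorted'
$target
```

Copy-Item does not create missing parent folders for a single file, so in a real run the parent of $target would need to be created first (for example with New-Item -ItemType Directory -Force) before copying.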

Related

Powershell dropping characters while creating folder names

I am having a strange problem in PowerShell (Version 2021.8.0) while creating folders and naming them. I start with a number of individual ebook files in a folder that I set using Set-Location. I use the file name minus the extension to create a new folder with the same name as the e-book file. The code works fine the majority of the time with the various file extensions I have stored in an array at the beginning of the code.
What's happening is that the code creates the proper folder name the majority of the time and moves the source file into the folder after it's created.
The problem is that, for files with the extension ".epub", if the source file name ends in an "e", then the "e" is missing from the end of the created folder name. I thought that I saw it also drop "r" and "p", but I have been unable to replicate that error recently.
Below is my code. It is set up to run against file extensions for e-books and audiobooks. Please ignore the error messages that are being generated when files of a specific type don't exist in the working folder. I am just using the array for testing and it will be filled automatically later by reading the folder contents.
This Code Creates a Folder for Each File and moves the file into that Folder:
Clear-Host
$SourceFileFolder = 'N:\- Books\- - BMS\- Books Needing Folders'
Set-Location $SourceFileFolder
$MyArray = ( "*.azw3", "*.cbz", "*.doc", "*.docx", "*.djvu", "*.epub", "*.mobi", "*.mp3", "*.pdf", "*.txt" )
Foreach ($FileExtension in $MyArray) {
Get-ChildItem -Include $FileExtension -Name -Recurse | Sort-Object | ForEach-Object { $SourceFileName = $_
$NewDirectoryName = $SourceFileName.TrimEnd($FileExtension)
New-Item -Name $NewDirectoryName -ItemType "directory"
$OriginalFileName = Join-Path -Path $SourceFileFolder -ChildPath $SourceFileName
$DestinationFilename = Join-Path -Path $NewDirectoryName -ChildPath $SourceFileName
$DestinationFilename = Join-Path -Path $SourceFileFolder -ChildPath $DestinationFilename
Move-Item $OriginalFileName -Destination $DestinationFilename
}
}
Thanks for any help you can give. Driving me nuts and I am pretty sure it's something that I am doing wrong, like always.
String.TrimEnd()
Removes all the trailing occurrences of a set of characters specified in an array from the current string.
The TrimEnd method removes every trailing character that matches any character in the array you provide. It does not check whether .epub is at the end of the string; it strips any of the supplied characters from the end of the string. In your case, every dot, e, p, u and b is removed from the end until a character outside that set is reached, so you end up removing more than you intended.
I'd suggest using EndsWith to match your extensions and then taking a substring, as below. If you only deal with single extensions (e.g. not .tar.gz or other double extensions), you can also use the .NET [System.IO.Path]::GetFileNameWithoutExtension($MyFileName) method.
$MyFileName = "Teste.epub"
$FileExt = '.epub'
# Wrong approach
$output = $MyFileName.TrimEnd($FileExt)
write-host $output -ForegroundColor Yellow
#Output returns Test
# Proper method
if ($MyFileName.EndsWith($FileExt)) {
$output = $MyFileName.Substring(0,$MyFileName.Length - $FileExt.Length)
Write-Host $output -ForegroundColor Cyan
}
# Returns Teste
#Alternative method. Won't work if you want to trim out double extensions (eg. tar.gz)
if ($MyFileName.EndsWith($FileExt)) {
$Output = [System.IO.Path]::GetFileNameWithoutExtension($MyFileName)
Write-Host $output -ForegroundColor Cyan
}
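To see why letters other than "e" can disappear as well, here is a small sketch with made-up book titles. It passes the same '*.epub' value that $FileExtension holds in the question's loop, converted to a character array explicitly so the set-of-characters behaviour is visible, and compares the result with GetFileNameWithoutExtension.

```powershell
$pattern = '*.epub'   # the value $FileExtension holds in the question's loop
$names   = 'Dracula.epub', 'Dune.epub', 'Soup.epub'   # made-up titles

$results = foreach ($name in $names) {
    [pscustomobject]@{
        File     = $name
        # Strips any trailing run of the characters * . e p u b, not the literal suffix.
        Trimmed  = $name.TrimEnd($pattern.ToCharArray())
        # Removes only the final extension.
        BaseName = [System.IO.Path]::GetFileNameWithoutExtension($name)
    }
}
$results | Format-Table -AutoSize
```

"Dune" loses its final "e" and "Soup" loses "up", because those letters all belong to the character set being trimmed, while "Dracula" survives only because "a" is not in the set.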
You're making this too hard on yourself. Use the .BaseName to get the filename without extension.
Your code simplified:
$SourceFileFolder = 'N:\- Books\- - BMS\- Books Needing Folders'
$MyArray = "*.azw3", "*.cbz", "*.doc", "*.docx", "*.djvu", "*.epub", "*.mobi", "*.mp3", "*.pdf", "*.txt"
(Get-ChildItem -Path $SourceFileFolder -Include $MyArray -File -Recurse) | Sort-Object Name | ForEach-Object {
# BaseName is the filename without extension
$NewDirectory = Join-Path -Path $SourceFileFolder -ChildPath $_.BaseName
$null = New-Item -Path $NewDirectory -ItemType Directory -Force
$_ | Move-Item -Destination $NewDirectory
}

Find files with partial name match and remove desired file

I have a little over 12000 files that I need to sort through.
18-100-00000-LOD-H.pdf
18-100-00000-LOD-H-1C.pdf
21-200-21197-LOD-H.pdf
21-200-21197-LOD-H-1C.pdf
21-200-21198-LOD-H.pdf
21-200-21198-LOD-H-1C.pdf
I need a way to go through all the files and delete the LOD-H version of the files.
EX:
21-200-21198-LOD-H.pdf
21-200-21198-LOD-H-1C.pdf
With the partial match being the 5 digit code I need a script that would delete the LOD-H case of the partial match.
So far this is what I have, but it won't work because I need to supply a value for the pattern, and since there isn't one set pattern but rather multiple patterns, I don't know what to supply it with.
$source = "\\Summerhall\GLUONPREP\Market Centers\~Pen Project\Logos\ALL Office Logos"
$destination = "C:\Users\joshh\Documents\EmptySpace"
$toDelete = "C:\Users\joshh\Documents\toDelete"
$allFiles = @(Get-ChildItem $source -File | Select-Object -ExpandProperty FullName)
foreach($file in $allFiles) {
$content = Get-Content -Path $file
if($content | Select-String -SimpleMatch -Quiet){
$dest = $destination
}
else{
$dest = $toDelete
}
}
Any help would be super appreciated, even links to something similar or even links to documentation so I can start piecing a script of my own would be super helpful.
Thank you!
This should work for what you need:
# Get a list of the files with -1C preceding the extension
$1cFiles = @( ( Get-ChildItem -File "${source}/*-LOD-H-1C.pdf" ).Name )
# Retrieve files that match the same pattern without 1C, and iterate over them
Get-ChildItem -File "${source}/*-LOD-H.pdf" | ForEach-Object {
# Get the name of the file if it had the -1C suffix preceding the .ext
$useName = $_.Name.Insert($_.Name.LastIndexOf('.pdf'), '-1C')
# If the -1C version of the file exists, remove the current (non-1C) file
if( $1cFiles -contains $useName ) {
Remove-Item -Force -LiteralPath $_.FullName
}
}
Basically, collect the names of the 1C files in $source, then iterate over the non-1C files in $source, removing each non-1C file when its name with -1C inserted before the file extension matches an existing 1C file.
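Before deleting anything from 12000 files, the pairing logic can be dry-run on plain names. This is a minimal sketch using the sample names from the question plus one made-up file that has no -1C partner. It only computes which names would be removed and deletes nothing.

```powershell
$names = @(
    '18-100-00000-LOD-H.pdf', '18-100-00000-LOD-H-1C.pdf',
    '21-200-21197-LOD-H.pdf', '21-200-21197-LOD-H-1C.pdf',
    '21-200-21199-LOD-H.pdf'    # hypothetical file with no -1C partner; must be kept
)

# Names that already carry the -1C suffix.
$with1C = @($names | Where-Object { $_ -like '*-LOD-H-1C.pdf' })

# A plain LOD-H file is a deletion candidate only if its -1C twin exists.
$toDelete = @($names | Where-Object {
    $_ -like '*-LOD-H.pdf' -and $with1C -contains ($_ -replace '\.pdf$', '-1C.pdf')
})
$toDelete
```

The same check can be applied to real files by feeding in (Get-ChildItem -File $source).Name, and running Remove-Item with -WhatIf first is a cheap safety net.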

Compress File per file, same name

I hope you are all safe in this time of COVID-19.
I'm trying to generate a script that goes to the directory and compresses each file to .zip with the same name as the file, for example:
sample.txt -> sample.zip
sample2.txt -> sample2.zip
But I'm having difficulties. I'm not that used to PowerShell; I'm learning and improving this script as I go. In the end it will be a script that deletes files older than X days, compresses files, and uploads them via FTP. I've already managed the part that deletes files older than X days; now I'm a bit stuck on this one.
Last try at moment.
param
(
#Future accept input
[string] $InputFolder,
[string] $OutputFolder
)
#test folder
$InputFolder= "C:\Temp\teste"
$OutputFolder="C:\Temp\teste"
$Name2 = Get-ChildItem $InputFolder -Filter '*.csv'| select Name
Set-Variable SET_SIZE -option Constant -value 1
$i = 0
$zipSet = 0
Get-ChildItem $InputFolder | ForEach-Object {
$zipSetName = ($Name2[1]) + ".zip "
Compress-Archive -Path $_.FullName -DestinationPath "$OutputFolder\$zipSetName"
$i++;
$Name2++
if ($i -eq $SET_SIZE) {
$i = 0;
$zipSet++;
}
}
You can simplify things a bit, and it looks like most of the issues are because in your script example $Name2 will contain a different set of items than the Get-ChildItem $InputFolder will return in the loop (i.e. may have other objects other than .csv files).
The best way to deal with things is to use variables with the full file object (i.e. you don't need to use |select name). So I get all the CSV file objects right away and store in the variable $CsvFiles.
We can additionally use the special variable $_ inside the ForEach-Object which represents the current object. We also can use $_.BaseName to give us the name without the extension (assuming that's what you want, otherwise use $_.Name to get a zip with the name like xyz.csv).
So a simplified version of the code can be:
$InputFolder= "C:\Temp\teste"
$OutputFolder="C:\Temp\teste"
#Get files to process
$CsvFiles = Get-ChildItem $InputFolder -Filter '*.csv'
#loop through all files to zip
$CsvFiles | ForEach-Object {
$zipSetName = $_.BaseName + ".zip"
Compress-Archive -Path $_.FullName -DestinationPath "$OutputFolder\$zipSetName"
}

Match string with specific numbers from array

I want to create a script that searches through a directory for specific ".txt" files with the Get-ChildItem cmdlet and after that copies the ".txt" files to a location I want. The hard part for me is to extract specific .txt file names from the array. So basically I need help matching specific file names in the array. Here is an example of the array I'm getting back with the following cmdlet:
$arrayObject = (Get-ChildItem -recurse | Where-Object { $_.Name -like "*.txt"}).Name
The arrayobject variable is something like this:
$arrayobject = "test.2.5.0.txt", "test.1.0.0.txt", "test.1.0.1.txt",
"test.0.1.0.txt", "test.0.1.1.txt", "test.txt"
I want to match my array so it returns the following:
test.2.5.0.txt, test.1.0.0.txt, test.1.0.1.txt
Can someone help me with Regex to match the above file names from the $arrayObject?
As you already add the -Recurse parameter to Get-ChildItem, you can also use the -Include parameter like this:
$findThese = "test.2.5.0.txt", "test.1.0.0.txt", "test.1.0.1.txt"
$filesFound = (Get-ChildItem -Path 'YOUR ROOTPATH HERE' -Recurse -File -Include $findThese).Name
P.S. without the -Recurse parameter you need to add \* to the end of the rootfolder path to be able to use -Include
Maybe something like:
$FileList = Get-ChildItem -path C:\TEMP -Include *.txt -Recurse
$TxtFiles = 'test1.txt', 'test3.txt', 'test9.txt'
Foreach ($txt in $TxtFiles) {
if ($FileList.name -contains $txt) {Write-Host File: $Txt is present}
}
A general rule: Filter as far left as possible! Fewer objects to be processed, fewer resources to be used, faster to be processed!
Hope it helps!
Please try to clarify what the regex should match.
I created a wildcard pattern which, out of the given filenames, matches only the files you wanted to retrieve:
"*.[1-9].[0-9].[0-9].txt"
You can try out the small check I wrote.
ForEach($file in $arrayobject){
if($file -LIKE "*.[1-9].[0-9].[0-9].txt"){
Write-Host $file
}}
Note that this is a wildcard pattern rather than a regex, so it is checked with the "-LIKE" operator; "-MATCH" is the operator for regular expressions.
Let me know if this helps.
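To make the wildcard-versus-regex distinction concrete, here is a short sketch that runs both operators over the sample array from the question. The regular expression is my own assumption of an equivalent pattern (major version starting with 1-9, followed by two more numeric parts); -like takes the wildcard pattern and -match takes the regex.

```powershell
$arrayObject = 'test.2.5.0.txt', 'test.1.0.0.txt', 'test.1.0.1.txt',
               'test.0.1.0.txt', 'test.0.1.1.txt', 'test.txt'

# Wildcard pattern: * is any run of characters, [1-9] is one character from that range.
$byWildcard = @($arrayObject | Where-Object { $_ -like '*.[1-9].[0-9].[0-9].txt' })

# Regular expression: dots must be escaped, \d+ allows multi-digit parts.
$byRegex = @($arrayObject | Where-Object { $_ -match '\.[1-9]\d*\.\d+\.\d+\.txt$' })

"Wildcard: $($byWildcard -join ', ')"
"Regex:    $($byRegex -join ', ')"
```

On this sample both select the same three names; the regex form additionally accepts multi-digit parts such as 10.12.3, which the single-character wildcard ranges would not.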
Sorry for the late reply. Just got back in the office today. My question has been misinterpreted but that's my fault. I wasn't clear what I really want to do.
What I want to do is search through a directory and extract the (major) version from each filename. In my case the file "test.2.5.0.txt" would be version 2.5.0, so the major version is 2. Then in an if statement I check whether it's greater than or equal to 1 and, if so, copy the file to a specific destination. To add some context: they're actually nupkg files, not txt. But I figured it out. This is the code:
$sourceShare = "\\server1name\Share\txtfilesFolder"
$destinationShare = "\\server2name\Share\txtfilesFolder"
Get-ChildItem -Path $sourceShare `
-Recurse `
-Include "*.txt" `
-Exclude @("*.nuspec", "*.sha512") `
| Foreach-Object {
$fileName = [System.IO.Path]::GetFileName($_)
[Int]$majorVersion = (([regex]::Match($fileName,"(\d+(\.\d+){1,})" )).Value).Split(".")[0]
if ($majorVersion -ge 1)
{
Copy-Item -Path $_.FullName `
-Destination $destinationShare `
-Force
}
}
If you have any more advice, let me know. It would be great to extract the major version without using the .Split method.
Grtz
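One way to get the major version without .Split is to let the regular expression capture it directly. This is only a sketch under the assumption that the version is the first dotted run of numbers in the file name; Get-MajorVersion is a hypothetical helper name, and the .nupkg sample name is made up.

```powershell
# Hypothetical helper: returns the first number of the first dotted version in a file name, or $null.
function Get-MajorVersion {
    param([string]$FileName)
    # (?<major>\d+) captures the leading number; (\.\d+)+ requires at least one more dotted part.
    $m = [regex]::Match($FileName, '(?<major>\d+)(\.\d+)+')
    if ($m.Success) { return [int]$m.Groups['major'].Value }
    return $null
}

$major2  = Get-MajorVersion 'test.2.5.0.txt'
$major0  = Get-MajorVersion 'test.0.1.0.txt'
$major10 = Get-MajorVersion 'pkg.10.2.1.nupkg'   # made-up name with a two-digit major version
$none    = Get-MajorVersion 'test.txt'

"Majors: $major2, $major0, $major10; no version found: $($null -eq $none)"
```

The named group replaces the Value/Split pair, and the $null result for names without a version lets the caller skip them instead of treating them as version 0.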

Move Files to another folder based on part of file name

Hello Power Shell Champs,
I have a situation.
I have around 100 files with country name examples in C:\reports as:
Report File-USA.ppt
Report File-Canada.ppt
Report File-Brazil.ppt
Report File-Chile.ppt
I have folders with country names also in the folder C:\Countries
What I want to do is move files based on Country name to respective folders based on the name of country.
I'm unable to create a loop that works.
Note: Destination folders are already created, just files need to be moved
As Maigi mentioned, here is working code and it actually uses Move-Item as you requested in your original post rather than Copy-Item.
This code has been tested and works.
$list = (Get-ChildItem -Path C:\reports\ -Name -File).Replace('Report File-','').Replace('.ppt','')
foreach ($item in $list)
{
Move-Item -Path "C:\reports\Report File-$($item).ppt" -Destination "C:\Countries\$($item)\"
}
There is always more than one method to accomplish a task in PowerShell:
This script
splits the BaseName at the hyphen, and takes the second part(zero based)
if the destination path doesn't exist it is created.
Get-ChildItem 'C:\reports\Report File-*.ppt' -File | ForEach-Object{
$Country = Join-Path "C:\Countries" $($_.BaseName.split('-')[1])
If (!(Test-path $Country)) {mkdir $Country|Out-Null}
$_|Move-Item -Destination $Country -whatif
}
If the output looks OK, remove the -whatif
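One caveat with splitting at the hyphen is a country name that itself contains a hyphen. The sketch below uses plain strings to compare Split('-')[1] with stripping the fixed "Report File-" prefix via -replace. The Guinea-Bissau entry is a made-up edge case, not one of the files listed in the question.

```powershell
# BaseName values as they would come from the report files; the last one is a made-up edge case.
$baseNames = 'Report File-USA', 'Report File-Canada', 'Report File-Guinea-Bissau'

# Taking the second piece after splitting on every hyphen.
$bySplit = $baseNames | ForEach-Object { $_.Split('-')[1] }

# Removing only the known prefix keeps any hyphens in the country name.
$byReplace = $baseNames | ForEach-Object { $_ -replace '^Report File-', '' }

"Split:   $($bySplit -join ', ')"
"Replace: $($byReplace -join ', ')"
```

If none of the roughly 100 files has a hyphenated country name, both approaches give the same folders, so this only matters as a safeguard.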
Although some code from your side to see how far you've come would have been helpful, I imagine something like this should do the trick:
(edited)
$list = (Get-ChildItem -Path C:\reports\ -Name -File).Replace('Report File-','').Replace('.ppt','')
foreach ($item in $list) {
Move-Item -Path "C:\reports\Report File-$($item).ppt" -Destination "C:\Countries\$($item)\"
}