Counting files that start with the same characters - PowerShell

I'm trying to get a count of all files in a directory whose first few characters match a string. Here is the function:
function CountMatchingID($d1,$s){
$d = Get-ChildItem -filter $s -path $d1 | Measure-Object
return $d.count
}
I'm passing in the directory and a variable with a wildcard. It is returning the total count of files in the directory. However, if I replace $s inside the function with the literal value of that variable, as below, it returns the correct number of files that have that beginning.
function CountMatchingID($d1,$s){
$d = Get-ChildItem -filter "201*" -path $d1 | Measure-Object
return $d.count
}
The files are employee ID photos with employee numbers at the beginning of the file names, and I'm trying to automate searching a CSV file to rename them to usernames so we can drop them into a directory for Jabber to display the photos.
I'm open to suggestions about a better way to accomplish this as well. I'm fairly new to PowerShell, so any feedback is welcome.

The issue might be what you are passing as $s, which you don't show an example of. This works as expected for me:
function CountMatchingID($d1,$s){
Get-ChildItem -filter "$s*" -path $d1 | Measure-Object | Select -ExpandProperty Count
}
I added the asterisk to the filter inside the function so that it can just be assumed.
A sample call would look like this:
CountMatchingID "c:\temp" "test"

Related

Find all files with a particular extension, sort them by creation time and copy them with a new name

I am attempting to recursively find all files with the extension .raw and then sort them in ascending order of CreationTime. After that, I would like to copy each file to a new directory where the names are IMG_001_0001.jpg ... IMG_001_0099.jpg, using 4 digits in ascending order. It is important that the file named IMG_001_0001.jpg is the first one created and, if there are 99 files, IMG_001_0099.jpg is the last file created.
I tried this:
Get-ChildItem 'F:\Downloads\raw-20221121T200702Z-001.zip' -Recurse -include *.raw | Sort-Object CreationTime | ForEach-Object {copy $_.FullName F:\Downloads\raw-20221121T200702Z-001.zip/test/IMG_001_$($_.ReadCount).jpg}
If I understand correctly you could do it like this:
$count = @{ Value = 0 }
Get-ChildItem 'F:\Downloads\raw-20221121T200702Z-001.zip' -Recurse -Filter *.raw |
Sort-Object CreationTime | Copy-Item -Destination {
'F:\Downloads\raw-20221121T200702Z-001.zip/test/IMG_001_{0:D4}.jpg' -f
$count['Value']++
}
Using D4 for the format string ensures your integers would be represented with 4 digits. See Custom numeric format strings for details.
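For example, the format operator pads an integer to four digits:
'IMG_001_{0:D4}.jpg' -f 7    # gives IMG_001_0007.jpg
'{0:D4}' -f 123              # gives 0123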
As you can note, instead of using ForEach-Object to enumerate each file, this uses a delay-bind script block to generate the new names for the destination files, and each source object is bound from the pipeline.
Worth noting that the forward slashes in /test/ might bring problems and likely should be changed to backslashes: \test\.
You don't need a hashtable to iterate... just use [int].
For the sake of clarity, please don't use paths here that can easily be mistaken for a file rather than a directory name.
Get-ChildItem does not work on files (such as a .zip archive), and if it does, it's not portable.
Also, that script block for -Destination is not likely to work, as variables defined outside it are not available inside it. Nor is there any need to delay anything.
Something like this should be perfectly sufficient:
$ziproot = 'F:\input_folder'
$count = 0
$candidates = Get-ChildItem -Path $ziproot -Recurse -Filter '*.raw' |
    Sort-Object CreationTime
foreach ($file in $candidates)
{
    Copy-Item -Path $file.FullName -Destination ('{0}\test\IMG_001_{1:D4}{2}' -f $ziproot, ++$count, $file.Extension)
}
(Try using foreach ($var in $list) {commands} where you can; it's faster than ForEach-Object by about a factor of 10.)
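If you want to verify that claim on your own machine, a quick (unscientific) comparison might look like this:
$data = 1..100000
(Measure-Command { foreach ($i in $data) { $null = $i * 2 } }).TotalMilliseconds
(Measure-Command { $data | ForEach-Object { $null = $_ * 2 } }).TotalMilliseconds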

How to find a file with part of its name in PowerShell

I'm new to PowerShell and on StackOverflow.
I'm trying to write a script which gives me the full path to a *.pdf file.
My file name is in a variable (I get it earlier in the code), for example: $myVariable.Nom
Then I try to find my file with this:
Get-ChildItem -Path "c:\myPath\" -Filter $myVariable.Nom
But it doesn't work. I think my mistake is in how I filter on the file name in the Get-ChildItem command, but I don't know how to use it correctly.
The goal is (if myVariable.Nom = "TOTO" for example) to have c:\myPath\TOTO.pdf
Can anyone help me?
Thanks.
The -Filter parameter on Get-ChildItem supports a wildcard search using * or ?. Your code is pretty close to what you're looking for; you just need to add *.
Get-ChildItem -Path "c:\myPath\" -Filter "*$($myVariable.Nom)*"
Note that I'm using * at the beginning and end of the variable, meaning that the filter should search for any Item that contains the string in your variable.
In your example, you know that the File Name is TOTO so you could use * only at the end of the Filter:
-Filter "$($myVariable.Nom)*"
Remember that you can add the -Recurse flag on Get-ChildItem if you want to search recursively.
Now, if you want to get the FullName (the full path), you can use any of the examples below:
# Example 1
(Get-ChildItem -Path "c:\myPath\" -Filter "*$($myVariable.Nom)*").FullName
# Example 2
$var = Get-ChildItem -Path "c:\myPath\" -Filter "*$($myVariable.Nom)*"
$var.FullName
# Example 3
Get-ChildItem -Path "c:\myPath\" -Filter "*$($myVariable.Nom)*" |
Select-Object -Expand FullName
Edit
Following your second question, how to deal with a CSV with no headers:
$path = 'C:\path\to\csv.csv'
# Note: I'm using -Delimiter ';' because as you have shown,
# you are using semicolon as delimiter, however, if the delimiter
# are commas there is no need to use it.
$csv = Import-Csv $path -Delimiter ';' -Header 'Name','Qte'
# Now you can loop through it to find the files:
foreach($file in $csv)
{
$var = Get-ChildItem -Path "c:\myPath\" -Filter "*$($file.Name)*"
1..$file.Qte | ForEach-Object {
$var.FullName
}
}
Thank you for your answers: it works fine!
Now I want to improve it a little bit: in my CSV I have 2 columns. In the previous example I called the first column "Name", which is why I used $myVariable.Name, for example:
Name;Qte
TOTO;1
TITI;3
Now how can I select the first column if I don't have a header name on the first line?
A file like this one, for example:
TOTO;1
TITI;3
Thank you

PowerShell: capture first filename from a folder

Newbie to PowerShell.
I need to capture the first file name from a directory. However, my current script captures all file names. Please suggest changes to my code below.
# Storing path of the desired folder
$path = "C:\foo\bar\"
$contents = Get-ChildItem -Path $path -Force -Recurse
$contents.Name
The result is the following:
test-01.eof
test-02.eof
test-03.eof
I just want one file (any) from this list, so the expected result should be:
test-01.eof
You could use Select-Object with the -First parameter and set it to 1:
$path = "C:\foo\bar\"
$contents = Get-ChildItem -Path $path -Force -Recurse -File | Select-Object -First 1
I've added the -File switch to Get-ChildItem as well, since you only want to return files.
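To then output just that one file name (continuing the snippet above):
$contents.Name   # e.g. test-01.eof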
$path = "C:\foo\bar\"
$contents = Get-ChildItem -Path $path -Force -Recurse
$contents # lists out all details of all files
$contents.Name # lists out all files
$contents[0].Name # will return 1st file name
$contents[1].Name # will return 2nd file name
$contents[2].Name # will return 3rd file name
The count starts from 0. $contents here is an array (a list), and the integer you put in [] is the position of the item in that array. So when you type $contents[9], you are telling PowerShell to get the item at index 9 (the 10th item) from the array $contents. This is how you address items in a list. In most programming languages the count begins from 0, not 1. It is a bit confusing for someone who is entering the coding world, but you get used to it.
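As a side note, negative indexes count back from the end of the array, which can also be handy:
$contents[-1].Name    # last file name
$contents[0..2].Name  # first three file names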
Please use the command below, which is simple and helpful. Adding -Recurse will only put a slight extra load on the machine or PowerShell (when the code is large and reused elsewhere).
# Storing path of the desired folder
$path = "C:\foo\bar\"
$contents = Get-ChildItem -Path $path | sort | Select-Object -First 1
$contents.Name
The output will be as expected:
test-01.eof
Select-Object -First selects objects (or rows) from the top of the input and, when you pass 1, outputs only the first one.

Cataloging file names over 256 characters

I am brand new to scripting (or coding of any sort). I had an issue where I wanted to generate csv files to catalog directories and certain file names to aid in my work. I was able to put something together that works for what I need. With one exception, long names return the following error:
ERROR: The specified path, file name, or both are too long. The fully qualified file name must be less than 260 characters, and the directory name must be less than 248 characters.
Here is my script:
Write-Host "Andy's File Lister v2.2"
$drive = Read-Host "R or Q?"
$client = Read-Host "What is the client's name as it appears on the R or Q drive?"
$path= "${drive}:\${client}"
Get-ChildItem $path -Recurse -dir | Select-Object FullName | Export-CSV $home\downloads\"$client directories.csv"
Get-ChildItem $path -Recurse -Include *.pdf, *.jp*, *.xl*, *.doc* | Select-Object FullName | Export-CSV $home\downloads\"$client files.csv"
Write-Host "Check your downloads folder."
Pause
As I said, I am brand new to this. Is there a different command I could use, or a way to tell the script to skip directory names or files over a certain length?
Thanks!
You can check the value of the .Length property of the .FullName property of each item, and if it's greater than 256 characters, use Out-Null:
Ex.
$items = Get-ChildItem -Path C:\users\myusername\desktop\myfolder
foreach($item in $items)
{
if($item.FullName.Length -lt 256)
{
# do some stuff
}
elseif($item.FullName.Length -ge 256)
{
Out-Null
}
}
If you want to check the parent folder's path as well, you could check
$item.Parent.FullName.Length
in your processing.
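Applied to your original script, that length check could be slotted in with Where-Object before exporting; a sketch (using 248 for directories and 260 for files, matching the error message):
Get-ChildItem $path -Recurse -Directory |
    Where-Object { $_.FullName.Length -lt 248 } |
    Select-Object FullName |
    Export-Csv "$home\downloads\$client directories.csv"

Get-ChildItem $path -Recurse -Include *.pdf, *.jp*, *.xl*, *.doc* |
    Where-Object { $_.FullName.Length -lt 260 } |
    Select-Object FullName |
    Export-Csv "$home\downloads\$client files.csv"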
I think you should close your strings on lines 5 and 6.
Instead of using ", you should use \", because currently your script parses the entire line 6 as one string.

How to use Powershell to list duplicate files in a folder structure that exist in one of the folders

I have a source tree, say c:\s, with many sub-folders. One of the sub-folders is called "c:\s\Includes" which can contain one or more .cs files recursively.
I want to make sure that none of the .cs files in the c:\s\Includes... path exist in any other folder under c:\s, recursively.
I wrote the following PowerShell script which works, but I'm not sure if there's an easier way to do it. I've had less than 24 hours experience with PowerShell so I have a feeling there's a better way.
I can assume at least PowerShell 3 being used.
I will accept any answer that improves my script, but I'll wait a few days before accepting the answer. When I say "improve", I mean it makes it shorter, more elegant or with better performance.
Any help from anyone would be greatly appreciated.
The current code:
$excludeFolder = "Includes"
$h = @{}
foreach ($i in ls $pwd.path *.cs -r -file | ? DirectoryName -notlike ("*\" + $excludeFolder + "\*")) { $h[$i.Name]=$i.DirectoryName }
ls ($pwd.path + "\" + $excludeFolder) *.cs -r -file | ? { $h.Contains($_.Name) } | Select #{Name="Duplicate";Expression={$h[$_.Name] + " has file with same name as " + $_.Fullname}}
1
I stared at this for a while, determined to write it without studying the existing answers, but I'd already glanced at the first sentence of Matt's answer mentioning Group-Object. After some different approaches, I get basically the same answer, except his is long-form and robust with regex character escaping and setup variables, while mine is terse, because you asked for shorter answers and because that's more fun.
$inc = '^c:\\s\\includes'
$cs = (gci -R 'c:\s' -File -I *.cs) | group name
$nopes = $cs |?{($_.Group.FullName -notmatch $inc)-and($_.Group.FullName -match $inc)}
$nopes | % {$_.Name; $_.Group.FullName}
Example output:
someFile.cs
c:\s\includes\wherever\someFile.cs
c:\s\lib\factories\alt\someFile.cs
c:\s\contrib\users\aa\testing\someFile.cs
The concept is:
Get all the .cs files in the whole source tree
Split them into groups of {filename: {files which share this filename}}
For each group, keep only those where the set of files contains any file with a path that matches the include folder and contains any file with a path that does not match the includes folder. This step covers
duplicates (if a file only exists once it cannot pass both tests)
duplicates across the {includes/not-includes} divide, instead of being duplicated within one branch
handles triplicates, n-tuplicates, as well.
Edit: I added the ^ to $inc to say it has to match at the start of the string, so the regex engine can fail faster for paths that don't match. Maybe this counts as premature optimization.
2
After that pretty dense attempt, the shape of a cleaner answer is much much easier:
Get all the files, split them into include, not-include arrays.
Nested for-loop testing every file against every other file.
Longer, but enormously quicker to write (it runs slower, though) and I imagine easier to read for someone who doesn't know what it does.
$sourceTree = 'c:\\s'
$allFiles = Get-ChildItem $sourceTree -Include '*.cs' -File -Recurse
$includeFiles = $allFiles | where FullName -imatch "$($sourceTree)\\includes"
$otherFiles = $allFiles | where FullName -inotmatch "$($sourceTree)\\includes"
foreach ($incFile in $includeFiles) {
foreach ($oFile in $otherFiles) {
if ($incFile.Name -ieq $oFile.Name) {
write "$($incFile.Name) clash"
write "* $($incFile.FullName)"
write "* $($oFile.FullName)"
write "`n"
}
}
}
3
Because code-golf is fun. If the hashtables are faster, what about this even less tested one-liner...
$h=@{};gci c:\s -R -file -Filt *.cs|%{$h[$_.Name]+=@($_.FullName)};$h.Values|?{$_.Count -gt 1 -and $_ -like 'c:\s\includes*'}
Edit: explanation of this version: It's doing much the same solution approach as version 1, but the grouping operation happens explicitly in the hashtable. The shape of the hashtable becomes:
$h = {
'fileA.cs': @('c:\cs\wherever\fileA.cs', 'c:\cs\includes\fileA.cs'),
'file2.cs': @('c:\cs\somewhere\file2.cs'),
'file3.cs': @('c:\cs\includes\file3.cs', 'c:\cs\x\file3.cs', 'c:\cs\z\file3.cs')
}
It hits the disk once for all the .cs files, iterates the whole list to build the hashtable. I don't think it can do less work than this for that bit.
It uses +=, so it can add files to the existing array for that filename, otherwise it would overwrite each of the hashtable lists and they would be one item long for only the most recently seen file.
It uses @() - because when it hits a filename for the first time, $h[$_.Name] won't return anything, and the script needs to put an array into the hashtable at first, not a string. If it was +=$_.FullName then the first file would go into the hashtable as a string and the += next time would do string concatenation, and that's no use to me. This forces the first file in the hashtable to start an array, by forcing every file to be a one-item array. The least-code way to get this result is with +=@(..), but that churn of creating throwaway arrays for every single file is needless work. Maybe changing it to longer code which does less array creation would help?
Changing the section
%{$h[$_.Name]+=@($_.FullName)}
to something like
%{if (!$h.ContainsKey($_.Name)){$h[$_.Name]=@()};$h[$_.Name]+=$_.FullName}
(I'm guessing, I don't have much intuition for what's most likely to be slow PowerShell code, and haven't tested).
After that, using $h.Values isn't going over every file for a second time, it's going over every array in the hashtable - one per unique filename. That has to happen to check the array size and prune the non-duplicates, but the -and operation short-circuits: when the Count -gt 1 test fails, the bit on the right checking the path name doesn't run.
If the array has two or more files in it, the -and $_ -like ... executes and pattern matches to see if at least one of the duplicates is in the includes path. (Bug: if all the duplicates are in c:\cs\includes and none anywhere else, it will still show them).
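One possible (untested) tweak for that bug is to also require at least one copy outside the includes path, reusing the array -like / -notlike trick from version 1:
$h.Values | ? { $_.Count -gt 1 -and ($_ -like 'c:\s\includes*') -and ($_ -notlike 'c:\s\includes*') }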
--
4
This is edited version 3 with the hashtable initialization tweak, and now it keeps track of seen files in $s, and then only considers those it's seen more than once.
$h=@{};$s=@{};gci 'c:\s' -R -file -Filt *.cs|%{if($h.ContainsKey($_.Name)){$s[$_.Name]=1}else{$h[$_.Name]=@()}$h[$_.Name]+=$_.FullName};$s.Keys|%{if ($h[$_] -like 'c:\s\includes*'){$h[$_]}}
Assuming it works, that's what it does, anyway.
--
Edit branch of topic; I keep thinking there ought to be a way to do this with the things in the System.Data namespace. Anyone know if you can connect System.Data.DataTable().ReadXML() to gci | ConvertTo-Xml without reams of boilerplate?
I'd do more or less the same, except I'd build the hashtable from the contents of the includes folder and then run over everything else to check for duplicates:
$root = 'C:\s'
$includes = "$root\includes"
$includeList = @{}
Get-ChildItem -Path $includes -Filter '*.cs' -Recurse -File |
% { $includeList[$_.Name] = $_.DirectoryName }
Get-ChildItem -Path $root -Filter '*.cs' -Recurse -File |
? { $_.FullName -notlike "$includes\*" -and $includeList.Contains($_.Name) } |
% { "Duplicate of '{0}': {1}" -f $includeList[$_.Name], $_.FullName }
I'm not as impressed with this as I would like but I thought that Group-Object might have a place in this question so I present the following:
$base = 'C:\s'
$unique = "$base\includes"
$extension = "*.cs"
Get-ChildItem -Path $base -Filter $extension -Recurse |
Group-Object Name |
Where-Object{($_.Count -gt 1) -and (($_.Group).FullName -match [regex]::Escape($unique))} |
ForEach-Object {
$filename = $_.Name
($_.Group).FullName -notmatch [regex]::Escape($unique) | ForEach-Object{
"'{0}' has file with same name as '{1}'" -f (Split-Path $_),$filename
}
}
Collect all the files with the extension filter $extension. Group the files based on their names. Then, of those groups, find every group where there is more than one file with that particular name and at least one of the group members is in the directory $unique. Take those groups and print out all the files that are not from the unique directory.
From Comment
For what it's worth, this is what I used for testing to create a bunch of files. (I know folder 9 is empty.)
$base = "E:\Temp\dev\cs"
Remove-Item "$base\*" -Recurse -Force
0..9 | %{[void](New-Item -ItemType directory "$base\$_")}
1..1000 | %{
$number = Get-Random -Minimum 1 -Maximum 100
$folder = Get-Random -Minimum 0 -Maximum 9
[void](New-Item -Path $base\$folder -ItemType File -Name "$number.txt" -Force)
}
After looking at all the others, I thought I would try a different approach.
$includes = "C:\s\includes"
$root = "C:\s"
# First script
Measure-Command {
[string[]]$filter = ls $includes -Filter *.cs -Recurse | % name
ls $root -include $filter -Recurse -Filter *.cs |
Where-object{$_.FullName -notlike "$includes*"}
}
# Second Script
Measure-Command {
$filter2 = ls $includes -Filter *.cs -Recurse
ls $root -Recurse -Filter *.cs |
Where-object{$filter2.name -eq $_.name -and $_.FullName -notlike "$includes*"}
}
In my first script, I get all the include files into a string array. Then I use that string array as an -Include parameter on Get-ChildItem. In the end, I filter out the include folder from the results.
In my second script, I enumerate everything and then filter after the pipe.
Remove Measure-Command to see the results; I was using it to check the speed. With my dataset, the first one was 40% faster.
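If you'd rather see both timings side by side instead of removing Measure-Command, you could capture them first, something like this (the placeholder comments stand in for the two scripts above):
$t1 = Measure-Command { <# first script here #> }
$t2 = Measure-Command { <# second script here #> }
'First: {0:N0} ms   Second: {1:N0} ms' -f $t1.TotalMilliseconds, $t2.TotalMilliseconds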
$FilesToFind = Get-ChildItem -Recurse 'c:\s\includes' -File -Include *.cs | Select -ExpandProperty Name
Get-ChildItem -Recurse C:\S -File -Include *.cs | ? { $_.Name -in $FilesToFind -and $_.Directory -notmatch '^c:\\s\\includes' } | Select Name, Directory
Create a list of file names to look for.
Find all files that are in the list but not part of the directory the list was generated from.
Print their name and directory.