Count the number of files in each subfolder, ignoring files with a certain name - PowerShell

Consider the following directory tree:

ROOT
    BAR001
        foo_1.txt
        foo_2.txt
        foo_ignore_this_1.txt
    BAR001_a
        foo_3.txt
        foo_4.txt
        foo_ignore_this_2.txt
        foo_ignore_this_3.txt
    BAR001_b
        foo_5.txt
        foo_ignore_this_4.txt
    BAR002
        baz_1.txt
        baz_ignore_this_1.txt
    BAR002_a
        baz_2.txt
        baz_ignore_this_2.txt
    BAR002_b
        baz_3.txt
        baz_4.txt
        baz_5.txt
        baz_ignore_this_3.txt
    BAR002_c
        baz_ignore_this_4.txt
    BAR003
        lor_1.txt
The structure will always be like this; there are no deeper subfolders. I'm working on a script that counts the number of files:
- for each BARXXX folder
- for each BARXXX_Y folder
- text files with "ignore_this" in the name should be ignored in the count
For the example above, this would result in:

Folder    Filecount
--------  ---------
BAR001    2
BAR001_a  2
BAR001_b  1
BAR002    1
BAR002_a  1
BAR002_b  3
BAR002_c  0
BAR003    1
I now have:
Function Filecount {
    param(
        [string]$dir
    )
    $childs = Get-ChildItem $dir | where {$_.Attributes -eq 'Directory'}
    Foreach ($childs in $child) {
        Write-Host (Get-ChildItem $dir | Measure-Object).Count;
    }
}
Filecount -dir "C:\ROOT"
(Not ready yet, still building.) This, however, does not work: $child seems to be empty. Please tell me what I'm doing wrong.

Well, to start, you're running ForEach ($childs in $child); this syntax is backwards, so it will cause you some issues! If you swap it, so that you're running:
ForEach ($child in $childs)
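Spelled out, the corrected loop looks something like this (note the Get-ChildItem inside should also point at $child.FullName rather than $dir, so each subfolder is counted; my complete answer below does the same):

Function Filecount {
    param(
        [string]$dir
    )
    $childs = Get-ChildItem $dir | where {$_.Attributes -eq 'Directory'}
    Foreach ($child in $childs) {
        # Count the items inside each subfolder rather than the root
        Write-Host (Get-ChildItem $child.FullName | Measure-Object).Count
    }
}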
You'll get the following output:
>2
>2
>1
>1
>1
>3
>0
Alright, I'm back now with the completed answer. For one, instead of using Write-Host, I'm using a PowerShell custom object to let PowerShell do the hard work for me. I'm setting FolderName equal to $child.Name, and then running a GCI on $child.FullName to get the file count. I've added an extra parameter called $ignoreme, which should be given a wildcard (asterisk) value for the names you want to ignore.
Here's the complete answer now. Keep in mind that my file structure was a bit different than yours, so my file count is different at the bottom as well.
Function Filecount {
    param(
        [string]$dir = "C:\TEMP\Example",
        [string]$ignoreme = "*_*"
    )
    $childs = Get-ChildItem $dir | where {$_.Attributes -eq 'Directory'}
    Foreach ($child in $childs) {
        [pscustomobject]@{FolderName=$child.Name;ItemCount=(Get-ChildItem $child.FullName | ? Name -notlike $ignoreme | Measure-Object).Count}
    }
}
>Filecount | ft -AutoSize
>
>FolderName ItemCount
>---------- ---------
>BAR001             2
>BAR001_A           1
>BAR001_b           2
>BAR001_C           0
>BAR002             0
>BAR003             0
If you're using PowerShell v 2.0, use this method instead.
Function Filecount {
    param(
        [string]$dir = "C:\TEMP\Example",
        [string]$ignoreme = "*_*"
    )
    $childs = Get-ChildItem $dir | where {$_.Attributes -eq 'Directory'}
    Foreach ($child in $childs) {
        $ObjectProperties = @{
            FolderName = $child.Name
            ItemCount  = (Get-ChildItem $child.FullName | ? Name -notlike $ignoreme | Measure-Object).Count
        }
        New-Object PSObject -Property $ObjectProperties
    }
}
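With either version, a call for the folder layout in the question would presumably look like this (the "*ignore_this*" pattern is my assumption based on the file names shown above):

Filecount -dir "C:\ROOT" -ignoreme "*ignore_this*" | ft -AutoSize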

I like that way of creating an object, 1RedOne; I hadn't seen that before, thanks.
We can improve the performance of the code in a few ways: by using the Filter Left principle (filtering at the provider is inherently more efficient than filtering the results afterwards in PowerShell), by performing fewer loops, and by removing an unnecessary step:
Function Filecount
{
    param
    (
        [string]$dir = ".",
        [parameter(mandatory=$true)]
        [string]$ignoreme
    )
    Get-ChildItem -Recurse -Directory -Path $dir | ForEach-Object `
    {
        [pscustomobject]@{FolderName=$_.Name;ItemCount=(Get-ChildItem -Recurse -Exclude "*$ignoreme*" -Path $_.FullName).count}
    }
}
So, firstly, we can use the -Directory switch of Get-ChildItem on the top-level directory (I know this is available in v3.0 and above, not sure about v2.0; a PSIsContainer filter, sketched below, should cover older versions).
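A rough pre-v3.0 equivalent would presumably be the following (an assumption on my part, not tested against v2.0):

Get-ChildItem -Recurse -Path $dir | Where-Object { $_.PSIsContainer }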
Then we can pipe the output of that Get-ChildItem directly into the next loop, without storing it first.
Then we can replace another Where-Object with a provider -Exclude.
Finally, we can remove the Measure-Object as a simple count of the array will do:
Filecount "ROOT" "ignore_this" | ft -a
FolderName ItemCount
---------- ---------
BAR001             2
BAR001_a           2
BAR001_b           1
BAR002             1
BAR002_a           1
BAR002_b           3
BAR002_c           0
BAR003             1
Cheers Folks!

Related

PowerShell: Find unique values from multiple CSV files

Let's say that I have several CSV files and I need to check a specific column and find values that exist in one file, but not in any of the others. I'm having a bit of trouble coming up with the best way to go about it, as I wanted to use Compare-Object and possibly keep all columns, not just the one that contains the values I'm checking.
So I do indeed have several CSV files, and they all have a Service Code column. I'm trying to create a list for each Service Code that only appears in one file, so I would have "Service Codes only in CSV1", "Service Codes only in CSV2", etc.
Based on some testing and a semi-related question, I've come up with a workable solution, but with all of the nesting and For loops, I'm wondering if there is a more elegant method out there.
Here's what I do have:
$files = Get-ChildItem -LiteralPath "C:\temp\ItemCompare" -Include "*.csv"
$HashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $files.Count; $i++){
    $TempHashSet = [System.Collections.Generic.HashSet[String]]::New([String[]](Import-Csv $files[$i])."Service Code")
    $HashList.Add($TempHashSet)
}
$FinalHashList = [System.Collections.Generic.List[System.Collections.Generic.HashSet[String]]]::New()
For ($i = 0; $i -lt $HashList.Count; $i++){
    $UniqueHS = [System.Collections.Generic.HashSet[String]]::New($HashList[$i])
    For ($j = 0; $j -lt $HashList.Count; $j++){
        #Skip the check when the HashSet would be compared to itself
        If ($j -eq $i){Continue}
        $UniqueHS.ExceptWith($HashList[$j])
    }
    $FinalHashList.Add($UniqueHS)
}
It seems a bit messy to me using so many different .NET references, and I know I could make it cleaner with a tag to say using namespace System.Collections.Generic, but I'm wondering if there is a way to make it work using Compare-Object which was my first attempt, or even just a simpler/more efficient method to filter each file.
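For reference, with that using statement at the top of the script, the first loop would read something like this (shorter type names only, no change in behaviour):

using namespace System.Collections.Generic

$files = Get-ChildItem -LiteralPath "C:\temp\ItemCompare" -Include "*.csv"
$HashList = [List[HashSet[String]]]::New()
For ($i = 0; $i -lt $files.Count; $i++){
    $HashList.Add([HashSet[String]]::New([String[]](Import-Csv $files[$i])."Service Code"))
}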
I believe I found an "elegant" solution based on Group-Object, using only a single pipeline:
# Import all CSV files.
Get-ChildItem $PSScriptRoot\csv\*.csv -File -PipelineVariable file | Import-Csv |
# Add new column "FileName" to distinguish the files.
Select-Object *, @{ label = 'FileName'; expression = { $file.Name } } |
# Group by ServiceCode to get a list of files per distinct value.
Group-Object ServiceCode |
# Filter by ServiceCode values that exist only in a single file.
# Sort-Object -Unique takes care of possible duplicates within a single file.
Where-Object { ( $_.Group.FileName | Sort-Object -Unique ).Count -eq 1 } |
# Expand the groups so we get the original object structure back.
ForEach-Object Group |
# Format-Table requires sorting by FileName, for -GroupBy.
Sort-Object FileName |
# Finally pretty-print the result.
Format-Table -Property ServiceCode, Foo -GroupBy FileName
Test Input
a.csv:
ServiceCode,Foo
1,fop
2,fip
3,fap
b.csv:
ServiceCode,Foo
6,bar
6,baz
3,bam
2,bir
4,biz
c.csv:
ServiceCode,Foo
2,bla
5,blu
1,bli
Output
FileName: b.csv

ServiceCode Foo
----------- ---
4           biz
6           bar
6           baz

FileName: c.csv

ServiceCode Foo
----------- ---
5           blu
Looks correct to me. The values 1, 2 and 3 are duplicated between multiple files, so they are excluded. 4, 5 and 6 exist only in single files, while 6 is a duplicate value only within a single file.
Understanding the code
Maybe it is easier to understand how this code works, by looking at the intermediate output of the pipeline produced by the Group-Object line:
Count Name Group
----- ---- -----
    2 1    {@{ServiceCode=1; Foo=fop; FileName=a.csv}, @{ServiceCode=1; Foo=bli; FileName=c.csv}}
    3 2    {@{ServiceCode=2; Foo=fip; FileName=a.csv}, @{ServiceCode=2; Foo=bir; FileName=b.csv}, @{ServiceCode=2; Foo=bla; FileName=c.csv}}
    2 3    {@{ServiceCode=3; Foo=fap; FileName=a.csv}, @{ServiceCode=3; Foo=bam; FileName=b.csv}}
    1 4    {@{ServiceCode=4; Foo=biz; FileName=b.csv}}
    1 5    {@{ServiceCode=5; Foo=blu; FileName=c.csv}}
    2 6    {@{ServiceCode=6; Foo=bar; FileName=b.csv}, @{ServiceCode=6; Foo=baz; FileName=b.csv}}
Here the Name contains the unique ServiceCode values, while Group "links" the data to the files.
From here it should already be clear how to find values that exist only in single files. If duplicate ServiceCode values within a single file wouldn't be allowed, we could even simplify the filter to Where-Object Count -eq 1. Since it was stated that dupes within single files may exist, we need the Sort-Object -Unique to count multiple equal file names within a group as only one.
It is not completely clear what you expect as an output.
If this is just the ServiceCodes that intersect then this is actually a duplicate with:
Comparing two arrays & get the values which are not common
Union and Intersection in PowerShell?
But taking that you actually want the related object and files, you might use this approach:
$HashTable = @{}
ForEach ($File in Get-ChildItem .\*.csv) {
    ForEach ($Object in (Import-Csv $File)) {
        $HashTable[$Object.ServiceCode] = $Object | Select-Object *,
            @{ n='File';  e={ $File.Name } },
            @{ n='Count'; e={ $HashTable[$Object.ServiceCode].Count + 1 } }
    }
}
$HashTable.Values | Where-Object Count -eq 1
Here is my take on this fun exercise. I'm using a similar approach to yours with the HashSet, but adding [System.StringComparer]::OrdinalIgnoreCase to leverage the .Contains(..) method:
using namespace System.Collections.Generic

# Generate random CSVs:
$charset = 'abABcdCD0123xXyYzZ'
$ran = [random]::new()
$csvs = @{}
foreach($i in 1..50) # Create 50 CSVs for testing
{
    $csvs["csv$i"] = foreach($z in 1..50) # With 50 rows
    {
        $index = (0..2).ForEach({ $ran.Next($charset.Length) })
        [pscustomobject]@{
            ServiceCode = [string]::new($charset[$index])
            Data = $ran.Next()
        }
    }
}

# Get unique 'ServiceCode' per CSV:
$result = @{}
foreach($key in $csvs.Keys)
{
    # Get all unique `ServiceCode` from the other CSVs
    $tempHash = [HashSet[string]]::new(
        [string[]]($csvs[$csvs.Keys -ne $key].ServiceCode),
        [System.StringComparer]::OrdinalIgnoreCase
    )
    # Filter the unique `ServiceCode`
    $result[$key] = foreach($line in $csvs[$key])
    {
        if(-not $tempHash.Contains($line.ServiceCode))
        {
            $line
        }
    }
}

# Test if the code worked;
# if anything is returned here, it didn't work
foreach($key in $result.Keys)
{
    $tmp = $result[$result.Keys -ne $key].ServiceCode
    foreach($val in $result[$key])
    {
        if($val.ServiceCode -in $tmp)
        {
            $val
        }
    }
}
I was able to get unique items as follows:
# Get all items of the CSVs into a single variable, adding the file name as the last column
$CSVs = Get-ChildItem "C:\temp\ItemCompare\*.csv" | ForEach-Object {
    $CSV = Import-CSV -Path $_.FullName
    $FileName = $_.Name
    $CSV | Select-Object *,@{N='Filename';E={$FileName}}
}
Foreach($line in $CSVs){
    $ServiceCode = $line.ServiceCode
    $file = $line.Filename
    if (!($CSVs | where {$_.ServiceCode -eq $ServiceCode -and $_.filename -ne $file})){
        $line
    }
}

Exclude index in powershell

I have a very simple requirement: removing a couple of lines from a file. I found a little help on the net about making use of Index. Suppose I want to select the 5th line; I use
Get-Content file.txt | Select -Index 4
Similarly, what if I don't need the 5th and 6th lines? How would the statement change?
Get-Content file.txt | Select -Index -ne 4
I tried using -ne between -Index and the number. It did not work; neither did "!=".
Also, the code below gives no error, but not the desired output:
$tmp = $test | where {$_.Index -ne 4,5 }
Pipeline elements do not have an Index property by default, but you can add one if you wish:
function Add-Index {
    begin {
        $i = -1
    }
    process {
        Add-Member Index (++$i) -InputObject $_ -PassThru
    }
}
Then you can apply filtering by Index:
Get-Content file.txt | Add-Index | Where-Object Index -notin 4,5
Don't know about the Index property or parameter, but you can also achieve it like this:
$count = 0
$exclude = 4, 5
Get-Content "G:\input\sqlite.txt" | % {
    if($exclude -notcontains $count){ $_ }
    $count++
}
EDIT:
The ReadCount property holds the information you need :)
$exclude = 0, 1
Get-Content "G:\input\sqlite.txt" | Where-Object { $_.ReadCount -NotIn $exclude }
WARNING: as pointed out by @PetSerAl and @Matt, ReadCount starts at 1, not at 0 like arrays.
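So, to drop the 5th and 6th lines of the file (the case asked about in the question), the filter would be:

Get-Content file.txt | Where-Object { $_.ReadCount -notin 5, 6 }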
Try this:
get-content file.txt | select -index (0..3 + 5..10000)
It's a bit of a hack, but it works. The downside is that building the range takes some time. Also, adjust the 10000 to make sure you get the whole file.
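One way to avoid guessing the upper bound (my addition, not part of the original answer) is to read the file once and build the index list from its actual length:

$lines = Get-Content file.txt
$lines | Select-Object -Index (0..($lines.Count - 1) | Where-Object { $_ -notin 4, 5 })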
Convert the content to an ArrayList and use the RemoveRange(int index, int count) method:
[System.Collections.ArrayList]$text = gc C:\file.txt
# Remove 2 elements starting at index 4, i.e. the 5th and 6th lines
$text.RemoveRange(4,2)
$text

Make a PowerShell script take only the first 10000 files it finds for the defined filter

I am trying to list only the first 10000 files this script finds, but I can't find a way to make it work.
The script looks like this :
$totalList = @()
Get-Item 'D:\tempfolder\*\' | % {
    $dir = $_.FullName
    $list = Get-ChildItem $dir -recurse *.zip | sort modifyTime -desc | select -skip 2
    $totalList = $totalList + $list;
}
$totalList = $totalList -take 10000
The -take 10000 doesn't work as expected, so how can I define a maximum number of files it should find?
You can say:
$totalList = $totalList[0..9999]
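Alternatively (not part of the original answer), Select-Object can cap the number of items, which reads a little more clearly than an index range:

$totalList = $totalList | Select-Object -First 10000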

How do I get directory depth in PowerShell 3.0?

I need to find out how far down the directory structure inside a working directory goes. If the layout is something like
Books\
Email\
Notes\
    Note 1.txt
    Note 2.txt
HW.docx
then it should return 1, because the deepest items are 1 level below. But if it looks like
Books\
Photos\
Hello.c
then it should return 0, because there is nothing deeper than the first level.
Something like this should do the trick in V3:
Get-ChildItem . -Recurse -Name |
    Foreach { ($_.ToCharArray() | Where {$_ -eq '\'} | Measure).Count } |
    Measure -Maximum | Foreach Maximum
It's not as pretty, and arguably not as "Posh" as Keith's, but I suspect it might scale better.
$depth_ht = @{}
(cmd /c dir /ad /s) -replace '[^\\]','' |
    foreach {$depth_ht[$_]++}

$max_depth =
    $depth_ht.keys |
    sort length |
    select -last 1 |
    select -ExpandProperty length

$root_depth =
    ($PWD -replace '[^\\]','').length

($max_depth - $root_depth)

How do I return a custom object in Powershell that's formatted as a table?

I'm pretty new to PowerShell, so I won't be surprised at all if I'm going about this all wrong. I'm trying to create a function that, when executed, prints results formatted as a table. Maybe it would even be possible to pipe those results to another function for further analysis.
Here's what I have so far. This is a simple function that iterates through a list of paths and collects the name of the directory and the number of items in that directory, putting the data in a hashtable, and returning an array of hashtables:
function Check-Paths(){
    $paths =
        "C:\code\DirA",
        "C:\code\DirB"
    $dirs = @()
    foreach ($path in $paths){
        if (Test-Path $path){
            $len = (ls -path $path).length
        }
        else{
            $len = 0
        }
        $dirName = ($path -split "\\")[-1]
        $dirInfo = @{DirName = $dirName; NumItems = $len}
        $dirs += $dirInfo
    }
    return $dirs
}
That seems straightforward enough. However, when I run the command, this is what I get:
PS > Check-Paths

Name      Value
----      -----
DirName   DirA
NumItems  0
DirName   DirB
NumItems  0
What I want is this:
DirName NumItems
------- --------
DirA    0
DirB    0
I could just hack my function to use a write statement, but I think there must be a much better way to do this. Is there a way to get the data formatted as a table, and even better, in such a way that it can be piped to another function?
How 'bout using
return new-object psobject -Property $dirs
That would return an object whose properties match the items in the hashtable. Then you can use the built-in PowerShell formatting cmdlets to make it look the way you want. Since you only have 2 properties, it will be formatted as a table by default.
EDIT: Here's how the whole thing would look (after the various suggestions):
function Check-Paths(){
    $paths =
        "C:\code\DirA",
        "C:\code\DirB"
    $dirs = @()
    foreach ($path in $paths){
        if (Test-Path $path){
            $len = (ls -path $path).length
        }
        else{
            $len = 0
        }
        $dirName = ($path -split "\\")[-1]
        new-object psobject -property @{DirName = $dirName; NumItems = $len}
    }
}
Here's a one-liner that will give you the number of children for each folder.
"C:\code\DirA", "C:\code\DirB" | ? {Test-Path $_} | Get-Item | select -property Name, #{ Name="NumOfItems" ; Expression = {$_.GetFileSystemInfos().Count} }
It passes an array of strings to Where-Object to select the ones that exist. The path strings that exist are passed to Get-Item to get the FileSystemObjects which get passed to Select-Object to create PSCustomObject objects. The PSCustomObjects have two properties, the name of the directory and the number of children.
If you want the outputted table columns closer together you can pipe the above to: Format-Table -AutoSize
Example usage and output:
dir | ? {$_.PsIsContainer} | select -property Name, @{ Name="NumOfItems" ; Expression = {$_.GetFileSystemInfos().Count} } | Format-Table -AutoSize

Name         NumOfItems
----         ----------
Desktop              12
Favorites             3
My Documents          3
Start Menu            2