Powershell - count files with same name pattern


I am trying to count the files in a folder that share the same name pattern. In this case, everything before the "_" is the important part (the pattern).
Example (c:\temp)
ct24fe_2016-03-01.txt
ct24fe_2016-03-04.txt
ct24fe_2016-03-08.txt
ct24fe_2016-04-01.txt
ct24fe_2016-04-04.txt
xye4ka_2015-03-04.txt
xye4ka_2015-03-08.txt
xye4ka_2015-03-10.txt
xye4ka_2015-03-15.txt
xye4ka_2015-04-01.txt
xye4ka_2015-04-04.txt
zzztgf_2014-04-16.txt
zzztgf_2014-04-18.txt
zzztgf_2014-04-19.txt
zzztgf_2014-05-15.txt
The result should be:
Name | Count
ct24fe | 5
xye4ka | 6
zzztgf | 4
How could I do this?
Thanks for your support.

Group-Object supports scriptblocks for the -Property argument, so you can pipe the files directly to it:
Get-ChildItem C:\temp |Group-Object {$_.Name -split '_' |Select -First 1} -NoElement

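So I've come up with this:
# $input is an automatic variable in PowerShell, so use a different name
$names = "your stuff"
$array = New-Object System.Collections.ArrayList
# [void] suppresses the index that ArrayList.Add() returns
$names | % { [void]$array.Add($_.Split("_")[0]) }
$array | Group-Object -NoElement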

One possibility:
(Get-ChildItem c:\temp -Name) -replace '_.*' | Group-Object -NoElement
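As a minimal sketch of the grouping idea in these answers, the snippet below runs on a hard-coded list instead of a real folder. The `$sampleNames` array is a hypothetical stand-in for the output of `Get-ChildItem C:\temp -Name`, and `$counts` is only an illustrative variable name.

```powershell
# Hypothetical sample data standing in for: Get-ChildItem C:\temp -Name
$sampleNames = @(
    'ct24fe_2016-03-01.txt'
    'ct24fe_2016-03-04.txt'
    'xye4ka_2015-03-04.txt'
    'xye4ka_2015-03-08.txt'
    'xye4ka_2015-03-10.txt'
    'zzztgf_2014-04-16.txt'
)

# Group on everything before the first underscore; -NoElement skips
# collecting the grouped items because only Name and Count are needed.
$counts = $sampleNames |
    Group-Object -Property { ($_ -split '_')[0] } -NoElement |
    Select-Object -Property Name, Count

$counts | Format-Table -AutoSize
```

Selecting only Name and Count leaves plain objects that can be sorted, filtered, or exported further down the pipeline.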

Related

How to get five files with most lines in the current directory by simplest way?

There is such a shell command in the chapter "transformational programming" of "The Pragmatic Programmer".
Its function is to list the five files with the most lines in the current directory.
$ find . -type f | xargs wc -l | sort -n | tail -6 | head -5
470 ./debug.pml
470 ./test_to_build.pml
487 ./dbc.pml
719 ./domain_languages.pml
727 ./dry.pml
I'm trying to do the same thing with PowerShell, but it seems too wordy:
(Get-ChildItem .\ | ForEach-Object {$_ | Select-Object -Property 'Name', @{label = 'Lines'; expression = {($_ | Get-Content).Length}}} |Sort-Object -Property 'Lines')|Select-Object -Last 5
I believe there will be a simpler way, but I can't think of it.
How to get files with most lines in the current directory by simplest way using PowerShell?
Of course, you don't need to use custom aliases and abbreviations to shorten the length. Although it looks more concise, it loses readability.

Get-Content * | Group-Object PSChildName | Select-Object Count, Name |
Sort-Object Count | Select-Object -Last 5

I finally found an answer I'm satisfied with!
It chains 3 commands where the shell version chains 5.
What's more, the result is a set of objects, which can be reused in further pipeline operations.
To me that feels nicer than the Linux shell.
dir -file | sort {($_ | gc).Length} | select -l 5

Try either File.ReadLines with Linq or File.ReadAllLines with Count property.
File.ReadLines
Get-ChildItem .\ -File |
Select-Object -Property Name, @{n='Lines'; e= {
[System.Linq.Enumerable]::Count([System.IO.File]::ReadLines($_.FullName))
}
} | Sort-Object -Property 'Lines' -Descending | Select-Object -First 5
File.ReadAllLines
Get-ChildItem .\ -File |
Select-Object -Property Name, @{n='Lines'; e= {
[System.IO.File]::ReadAllLines($_.FullName).Count
}
} | Sort-Object -Property 'Lines' -Descending | Select-Object -First 5

A fast approach would be to use switch -File:
$files = (Get-ChildItem -File ).FullName
$result = foreach ($file in $files) {
$lineCount = 0
switch -File $file {
default { $lineCount++ }
}
[PsCustomObject]@{
File = $file
Lines = $lineCount
}
}
$result | Sort-Object Lines | Select-Object -Last 5
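One detail worth checking in the shorter variants: when a file has exactly one line, Get-Content returns a single string, so .Length gives the number of characters in that line rather than 1. The sketch below wraps the call in @(...) to force an array. It builds its own scratch folder, and the folder, the file names, and `$top` are hypothetical, chosen only for illustration.

```powershell
# Hypothetical scratch folder with three small files
$dir = Join-Path ([System.IO.Path]::GetTempPath()) ([System.Guid]::NewGuid().ToString())
New-Item -ItemType Directory -Path $dir | Out-Null
Set-Content -Path (Join-Path $dir 'one.txt')   -Value 'a single, fairly long line'
Set-Content -Path (Join-Path $dir 'two.txt')   -Value 'x', 'y'
Set-Content -Path (Join-Path $dir 'three.txt') -Value 'x', 'y', 'z'

$top = Get-ChildItem -Path $dir -File |
    Select-Object -Property Name, @{
        Name       = 'Lines'
        # @(...) forces an array, so a one-line file counts as 1
        Expression = { @(Get-Content -LiteralPath $_.FullName).Count }
    } |
    Sort-Object -Property Lines -Descending |
    Select-Object -First 2

$top | Format-Table -AutoSize

# Clean up the scratch folder
Remove-Item -LiteralPath $dir -Recurse -Force
```

Without the @(...) wrapper, one.txt would be ranked by its 26 characters and would incorrectly come out on top.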

Output Group-Object Column to File

Using Powershell I created a data set like this.
Count Name Group
----- ---- -----
2 108005 {108005, 108005}
2 114763 {114763, 114763}
2 115826 {115826, 115826}
2 115925 {115925, 115925}
2 117435 {117435, 117435}
2 114152 {114152, 114152}
2 117093 {117093, 117093}
Using this code.
$check = Get-Content $file | Group | Where {$_.count -gt 1}
I used the code to check for duplicates and cannot figure out how to output the Name column to a list in a plain txt file with only the names. There doesn't seem to be much documentation on how to do this. Is this even possible?

Use either ForEach-Object or Select-Object -ExpandProperty:
$duplicates = Get-Content $file | Group | Where {$_.count -gt 1} |Select-Object -ExpandProperty Name
Since you don't need anything other than the name, you can avoid generating the individual sets during grouping, with the -NoElement switch parameter:
$duplicates = Get-Content $file | Group -NoElement | Where {$_.count -gt 1} |Select-Object -ExpandProperty Name
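To finish the job the question asks for, the names still have to be written to a text file. A minimal sketch, assuming a hard-coded `$values` array in place of `Get-Content $file` and a hypothetical output path in the temp folder:

```powershell
# Hypothetical sample values standing in for: Get-Content $file
$values = '108005', '114763', '108005', '115826', '114763'

$duplicates = $values |
    Group-Object -NoElement |
    Where-Object { $_.Count -gt 1 } |
    Select-Object -ExpandProperty Name

# Hypothetical output path; Set-Content writes one name per line
$outFile = Join-Path ([System.IO.Path]::GetTempPath()) 'duplicates.txt'
$duplicates | Set-Content -Path $outFile

Get-Content -Path $outFile
```

Because -ExpandProperty yields plain strings, the file contains only the names, with no header or table formatting.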

Comparing two files: Single column in FirstFile - Multiple columns in SecondFile

I've figured out how to compare single columns in two files, but I can't figure out how to compare two files when the first has one column and the second has multiple columns. Both contain email addresses.
First file.csv (contains a single column of emails)
john@email.com
jack@email.com
jill@email.com
Second file.csv (contains multiple columns of emails)
john@email.nl,john@email.eu,john@email.com
jill@email.se,jill@email.com,jill@email.us
I would like to output the difference from the comparison. This would result in:
Output.csv
jack@email.com
Anyone able to help me? :)
Single columns comparison and output difference
#Line extracts emails from list
$SubscribedMails = import-csv .\subscribed.csv | Select-Object -Property email
#Line extracts emails from list
$ValidEmails = import-csv .\users-emails.csv | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -property email -IncludeEqual | where-object {$_.SideIndicator -eq "<="} | Export-csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv

Since the first file already contains email addresses per column, you can import it right away.
Take the second file and split the strings containing several addresses.
This generates a new array with one address per element.
Judging from your output, you only seek addresses that are within the first csv but not in the second.
Your code could look like this:
$firstFile = Get-Content 'FirstFile.csv'
$secondFile = (Get-Content 'SecondFile.csv').Split(',')
foreach ($item in $firstFile) {
if ($item -notin $secondFile) {
# Export-Csv would export the string's Length property, so append the raw text instead
Add-Content -Path output.csv -Value $item
}
}

If you want to keep your code, consider a script like:
#Line extracts emails from list
$SubscribedMails = import-csv .\subscribed.csv | Select-Object -Property email
Rename-Item .\users-emails.csv users-emails.csv.bk
(Get-Content .\users-emails.csv.bk).replace(',', "`r`n") | Set-Content .\users-emails.csv
#Line extracts emails from list
$ValidEmails = import-csv .\users-emails.csv | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -property email -IncludeEqual | where-object {$_.SideIndicator -eq "<="} | Export-csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Remove-Item .\users-emails.csv
Rename-Item .\users-emails.csv.bk users-emails.csv
or, more simply:
#Line extracts emails from list
$SubscribedMails = import-csv .\subscribed.csv | Select-Object -Property email
(Get-Content .\users-emails.csv).replace(',', "`r`n") | Set-Content .\users-emails.csv.bk
#Line extracts emails from list
$ValidEmails = import-csv .\users-emails.csv.bk | Select-Object -Property email
$compare = Compare-Object $SubscribedMails $ValidEmails -property email -IncludeEqual | where-object {$_.SideIndicator -eq "<="} | Export-csv .\nonvalid-emails.csv -NoTypeInformation
(Get-Content .\nonvalid-emails.csv) | ForEach-Object { $_ -replace ',"<="' } > .\nonvalid-emails.csv
Remove-Item .\users-emails.csv.bk

None of the suggestions so far works :(
Still hoping :)
Will delete comment when happy :p

Can you try this?
$One = (Get-Content .\FirstFile.csv).Split(',')
$Two = (Get-Content .\SecondFile.csv).Split(',')
$CsvPath = '.\Output.csv'
$Diff = @()
(Compare-Object ($One | Sort-Object) ($two | Sort-Object)| `
Where-Object {$_.SideIndicator -eq '<='}).inputobject | `
ForEach-Object {$Diff += New-Object PSObject -Property @{email=$_}}
$Diff | Export-Csv -Path $CsvPath -NoTypeInformation
Output.csv will contain the entries that exist in FirstFile but not in SecondFile.
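The answers above share one idea: flatten the multi-column file into a single list of addresses, then keep the addresses from the first file that are absent from that list. A minimal in-memory sketch, assuming hard-coded arrays in place of the two CSV files (the variable names `$first`, `$secondLines`, `$second`, and `$missing` are hypothetical):

```powershell
# Hypothetical stand-ins for: Get-Content .\FirstFile.csv / .\SecondFile.csv
$first = 'john@email.com', 'jack@email.com', 'jill@email.com'
$secondLines = @(
    'john@email.nl,john@email.eu,john@email.com'
    'jill@email.se,jill@email.com,jill@email.us'
)

# Flatten every comma-separated line into one list of trimmed addresses
$second = $secondLines |
    ForEach-Object { $_ -split ',' } |
    ForEach-Object { $_.Trim() }

# Keep the addresses from the first list that never appear in the second
$missing = $first | Where-Object { $second -notcontains $_ }

$missing
```

-notcontains is used here rather than -notin so the sketch also runs on PowerShell v2; the result can be piped to Set-Content to produce the output file.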

How to parse filenames to determine the newest file in each of multiple folders

I have logs that are getting written from various Linux servers to a central windows NAS server. They're in E:\log in the format:
E:\log\process1\log20140901.txt,
E:\log\process2\20140901.txt,
E:\log\process3\log-process-20140901.txt,
etc.
Multiple files get copied on a weekly basis at the same time, so created date isn't a good way to determine what the newest file is. Therefore I wrote a powershell function to parse the date out, and I'm attempting to iterate through and get the newest file in each folder, using the output of my function as the "date". I'm definitely doing something wrong.
Here's the Powershell I've written so far:
Function ReturnDate ($file)
{
$f = $file
$f = [RegEx]::Matches($f,"(\d{8})") | Select-Object -ExpandProperty Value
$sqlDate = $f.Substring(0,4) + "-" + $f.substring(4,2) + "-" + $f.substring(6,2)
return $sqlDate
}
Get-ChildItem E:\log\* |
Where {$_.PsIsContainer} |
foreach-object { Get-ChildItem $_ -Recurse |
Where {!$_.PsIsContainer} |
ForEach-Object { ReturnDate $_}|
Sort-Object ReturnDate -Descending |
Select-Object -First 1 | Select Name,ReturnDate
}
I seem to be confounding properties and causing "You cannot call a method on null-valued expression errors", but I'm uncertain what to do from here.

I suspect your $f variable is null and you're trying to invoke a method (Substring) on a null value. Try this instead:
Get-ChildItem E:\Log -File -Recurse | Where Name -Match '(\d{8})\.' |
Foreach {Add-Member -Inp $_ NoteProperty ReturnDate ($matches[1]) -PassThru} |
Group DirectoryName |
Foreach {$_.Group | Sort ReturnDate -Desc | Select -First 1}
This does require V3 or higher. If you're on V1 or V2 change it to this:
Get-ChildItem E:\Log -Recurse |
Where {!$_.PSIsContainer -and $_.Name -Match '(\d{8})\.'} |
Foreach {Add-Member -Inp $_ NoteProperty ReturnDate ($matches[1]) -PassThru} |
Group DirectoryName |
Foreach {$_.Group | Sort ReturnDate -Desc | Select -First 1}

Your code worked for me until the final select, where you requested Name and ReturnDate, properties that did not exist. Creating a custom object with those values makes your code work. I also removed some of the logic from your pipeline; the end result should still work (I made some dummy files like your examples to test with).
Working from your original code, you could have something like this. It only works on v3 or higher, mostly because of [pscustomobject]; simple changes would make it work on lower versions if need be.
Function ReturnDate ($file)
{
$f = $file
$f = [RegEx]::Matches($f,"(\d{8})") | Select-Object -ExpandProperty Value
$sqlDate = $f.Substring(0,4) + "-" + $f.substring(4,2) + "-" + $f.substring(6,2)
[pscustomobject] @{
'Name' = $file.FullName
'ReturnDate' = $sqlDate
}
}
Get-ChildItem C:\temp\E\* -Recurse |
Where-Object {!$_.PSIsContainer} |
ForEach-Object{ReturnDate $_} |
Sort-Object ReturnDate -Descending |
Select-Object -First 1

The Sort-Object cmdlet supports sorting by a custom script block and will sort by whatever the script block returns. So, use a regular expression to grab the timestamp and return it.
Get-ChildItem E:\log\* -Directory |
ForEach-Object {
Get-ChildItem $_ -Recurse -File |
Sort-Object -Property {
if( $_.Name -match '(\d{8})' )
{
return $Matches[1]
}
Write-Error ('File ''{0}'' doesn''t contain a timestamp in its name.' -f $_.FullName)
} |
Select-Object -Last 1 |
Select Name,ReturnDate
}
Note that Select-Object -First 1 was changed to Select-Object -Last 1, since dates would be sorted from oldest to newest.
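The group-then-sort idea from these answers can be tried without touching the file system. The sketch below is only an illustration under stated assumptions: `$files` is a hypothetical list of objects with Folder and Name properties standing in for real FileInfo objects, and the 8-digit stamp is parsed as yyyyMMdd with [datetime]::ParseExact so the sort works on real dates.

```powershell
# Hypothetical stand-ins for the files under E:\log
$files = @(
    [pscustomobject]@{ Folder = 'process1'; Name = 'log20140825.txt' }
    [pscustomobject]@{ Folder = 'process1'; Name = 'log20140901.txt' }
    [pscustomobject]@{ Folder = 'process2'; Name = '20140908.txt' }
    [pscustomobject]@{ Folder = 'process2'; Name = '20140901.txt' }
    [pscustomobject]@{ Folder = 'process3'; Name = 'log-process-20140901.txt' }
)

$newest = $files |
    Where-Object { $_.Name -match '\d{8}' } |   # skip names without a date stamp
    Group-Object -Property Folder |
    ForEach-Object {
        $_.Group |
            Sort-Object -Property {
                # Parse the first 8-digit run in the name as a date
                $stamp = [regex]::Match($_.Name, '\d{8}').Value
                [datetime]::ParseExact($stamp, 'yyyyMMdd', [cultureinfo]::InvariantCulture)
            } |
            Select-Object -Last 1               # oldest first, so the last is newest
    }

$newest | Format-Table -AutoSize
```

With real files, the same pipeline should work if Folder is replaced by DirectoryName, as in the Group DirectoryName answer above.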

PowerShell filtering range of file names

I am trying to write a very simple (as far as I know :-) ) script in PowerShell v2.0.
Every morning I need to look at some files to check if they are up to date.
All of the files are in the same folder.
The files are named like so: from1.rar, from2.rar, from13.rar, from14.rar, from27.rar, from29.rar and so on. As you can see, the files are in different ranges. I want to filter the names of the files by a range that I determine. I suppose regex will do the trick, but I don't know how to use it...
What I have for now is just filtering and sorting all of the files by time and name into one table:
Get-ChildItem -filter "*.rar" | sort LastWriteTime -Descending | sort name | Format-Table LastwriteTime, name > C:\Users\user1\Desktop\update.txt
Now I want to break the table to form a number of groups (or smaller tables) from the names of the files.

Something like this should do the trick:
$low = 10
$high = 25
Get-ChildItem -Filter '*.rar' | ? {
$_.Name -match 'from(\d+)\.rar' -and
[int]$matches[1] -gt $low -and
[int]$matches[1] -le $high
} | ...
Demonstration:
PS C:\> $files = 'from1.rar','from13.rar','from14.rar','from27.rar','from29.rar'
PS C:\> $files
from1.rar
from13.rar
from14.rar
from27.rar
from29.rar
PS C:\> $files | ? {
>> $_ -match 'from(\d+)\.rar' -and
>> [int]$matches[1] -gt $low -and
>> [int]$matches[1] -le $high
>> }
>>
from13.rar
from14.rar

Here's one way to use a regex:
$range = 5..10
get-childitem From*.rar |
where {$range -contains ($_.name -replace 'From(\d+)\.rar','$1')}

To break your tables into groups of tables, just add the -GroupBy parameter to the cmdlet Format-Table.
For example, to create a table for each file by its property Name:
Get-ChildItem -filter "*.rar" | sort LastWriteTime -Descending | sort name | Format-Table LastwriteTime, name -GroupBy Name
But that might generate too many groups if you have many files, so you may group the table on the first letter of the Name property, like so:
Get-ChildItem -filter "*.rar" | sort LastWriteTime -Descending | sort name | Format-Table LastwriteTime, name -GroupBy @{name="First Letter";E={ ($_.name).substring(0,1) }}
Or, to group the table on the first two letters of the Name property:
Get-ChildItem -filter "*.rar" | sort LastWriteTime -Descending | sort name | Format-Table LastwriteTime, name -GroupBy @{name="First Two Letters";E={ ($_.name).substring(0,2) }}
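Since the question asks for groups based on number ranges rather than on letters, another option is to derive a range label from the number in each name and group on that. This is a sketch under assumptions: the bucket width of 10, the `$names` sample list, and the `$groups` variable are all hypothetical choices.

```powershell
# Hypothetical sample names standing in for: Get-ChildItem -Filter '*.rar' -Name
$names = 'from1.rar', 'from2.rar', 'from13.rar', 'from14.rar', 'from27.rar', 'from29.rar'

$groups = $names |
    Where-Object { $_ -match '^from\d+\.rar$' } |
    Group-Object -Property {
        # Extract the number and map it to a bucket label such as '10-19'
        $n = [int][regex]::Match($_, '\d+').Value
        $start = [int]([math]::Floor($n / 10) * 10)
        '{0}-{1}' -f $start, ($start + 9)
    }

foreach ($g in $groups) {
    # Each group carries its label in Name and its members in Group
    '{0}: {1}' -f $g.Name, ($g.Group -join ', ')
}
```

The same script block could be passed to Format-Table -GroupBy as a calculated property, provided the input is sorted so that each range is contiguous.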
