How do I get filename and line count per file using PowerShell?

I have the following powershell script to count lines per file in a given directory:
dir -Include *.csv -Recurse | foreach{get-content $_ | measure-object -line}
This is giving me the following output:
Lines Words Characters Property
----- ----- ---------- --------
   27
   90
   11
   95
  449
...
The count per file is fine (I don't require words, characters, or property), but I don't know which file each count is for.
The ideal output would be something like:
Filename      Lines
--------      -----
Filename1.txt    27
Filename1.txt    90
Filename1.txt    11
Filename1.txt    95
Filename1.txt   449
...
How do I add the filename to the output?

try this:
dir -Include *.csv -Recurse |
% { $_ | select name, @{n="lines";e={
get-content $_ |
measure-object -line |
select -expa lines }
}
} | ft -AutoSize
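For reference, the same pipeline spelled out without aliases (a sketch, using $_.FullName so it also works for files found in subdirectories):
Get-ChildItem -Include *.csv -Recurse |
    Select-Object Name, @{ n = 'Lines'; e = { (Get-Content $_.FullName | Measure-Object -Line).Lines } } |
    Format-Table -AutoSize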

I can offer another solution:
Get-ChildItem $testPath | % {
$_ | Select-Object -Property 'Name', @{
label = 'Lines'; expression = {
($_ | Get-Content).Length
}
}
}
I ran this against .txt files; the return value looks like this:
Name  Lines
----  -----
1.txt 1
2.txt 2
3.txt 3
4.txt 4
5.txt 5
6.txt 6
7.txt 7
8.txt 8
9.txt 9
The reason I want to sort like this is that I am rewriting a Unix shell command (from The Pragmatic Programmer: Your Journey to Mastery, page 145).
The purpose of the command is to find the five files with the largest number of lines.
The above is my progress so far; I'm close to success.
However, this is far more complicated than the Unix shell command!
I believe there should be a simpler way, and I'm trying to find it.
find . -type f | xargs wc -l | sort -n | tail -5
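For what it's worth, a reasonably close PowerShell translation of that one-liner might look like this (a sketch, assuming PowerShell 3.0+ for the -File switch and that all files are text):
Get-ChildItem . -File -Recurse |
    Select-Object FullName, @{ n = 'Lines'; e = { (Get-Content $_.FullName | Measure-Object -Line).Lines } } |
    Sort-Object Lines |
    Select-Object -Last 5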

I have used the following script, which gives me the line count for files in all subdirectories of the folder c:\temp\A. The output goes to the lines1.txt file. I have applied a filter to select only files of type ".txt".
Get-ChildItem c:\temp\A -recurse | where {$_.extension -eq ".txt"} | % {
$_ | Select-Object -Property 'Name', @{
label = 'Lines'; expression = {
($_ | Get-Content).Length
}
}
} | out-file C:\temp\lines1.txt
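The filtering can also be pushed into Get-ChildItem itself, which avoids the Where-Object stage (a minor variant, assuming PowerShell 3.0+ for the -File switch):
Get-ChildItem c:\temp\A -Recurse -File -Filter *.txt |
    Select-Object -Property 'Name', @{
        label = 'Lines'; expression = {
            ($_ | Get-Content).Length
        }
    } |
    Out-File C:\temp\lines1.txt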

Related

Get results of ForEach arrays and display in a table with column headers, one line per result

I am trying to get a list of files and a count of the number of rows in each file displayed in a table consisting of two columns, Name and Lines.
I have tried using Format-Table, but I don't think the problem is with the format of the table; it is more that my results are separate objects. See below.
#Get a list of files in the filepath location
$files = Get-ChildItem $filepath
$files | ForEach-Object { $_ ; $_ | Get-Content | Measure-Object -Line} | Format-Table Name,Lines
Expected results
Name    Lines
File A      9
File B     89
Actual results
Name    Lines
File A
            9
File B
           89
Another approach is to make the custom object using PowerShell's calculated properties:
$files | Select-Object -Property @{ N = 'Name' ; E = { $_.Name} },
@{ N = 'Lines'; E = { ($_ | Get-Content | Measure-Object -Line).Lines } }
Name Lines
---- -----
dotNetEnumClass.ps1 232
DotNetVersions.ps1 9
dotNETversionTable.ps1 64
Typically you would make a custom object like this, instead of outputting two different kinds of objects.
$files | ForEach-Object {
$lines = $_ | Get-Content | Measure-Object -Line
[pscustomobject]@{name = $_.name
lines = $lines.lines}
}
name lines
---- -----
rof.ps1 11
rof.ps1~ 7
wai.ps1 2
wai.ps1~ 1

Word frequency elegantly in Powershell

Donald Knuth once got the task to write a literate program computing the word frequency of a file.
Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
Doug McIlroy famously rewrote the 10 pages of Pascal in a few lines of sh:
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
As a little exercise, I converted this to Powershell:
(-split ((Get-Content -Raw test.txt).ToLower() -replace '[^a-zA-Z]',' ')) |
Group-Object |
Sort-Object -Property count -Descending |
Select-Object -First $Args[0] |
Format-Table count, name
I like that Powershell combines sort | uniq -c into a single Group-Object.
The first line looks ugly, so I wonder if it can be written more elegantly? Maybe there is a way to load the file with a regex delimiter somehow?
One obvious way to shorten the code would be to use aliases, but that does not help readability.
I would do it this way.
PS C:\users\me> Get-Content words.txt
One one
two
two
three,three.
two;two
PS C:\users\me> (Get-Content words.txt) -Split '\W' | Group-Object
Count Name Group
----- ---- -----
2 One {One, one}
4 two {two, two, two, two}
2 three {three, three}
1 {}
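Note the stray group with an empty Name: splitting on '\W' yields an empty token wherever two non-word characters are adjacent, or at a trailing one (as in three,three.). Splitting on '\W+' and filtering out empties avoids that (a small sketch):
(Get-Content words.txt) -split '\W+' | Where-Object { $_ } | Group-Object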
EDIT: Some code from Bruce Payette's Windows PowerShell in Action:
# top 10 most frequent words, hash table
$s = gc songlist.txt
$s = [string]::join(" ", $s)
$words = $s.Split(" `t", [stringsplitoptions]::RemoveEmptyEntries)
$uniq = $words | sort -Unique
$words | % {$h=@{}} {$h[$_] += 1}
$frequency = $h.keys | sort {$h[$_]}
-1..-10 | %{ $frequency[$_]+" "+$h[$frequency[$_]]}
# or
$grouped = $words | group | sort count
$grouped[-1..-10]
Thanks js2010 and LotPings for important hints. To document what is probably the best solution:
$Input -split '\W+' |
Group-Object -NoElement |
Sort-Object count -Descending |
Select-Object -First $Args[0]
Things I learned:
$Input contains stdin. This is closer to McIlroy's code than Get-Content on some file.
-split can actually take regex delimiters.
The -NoElement parameter lets me get rid of the Format-Table line.
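Because $Input is the script's stdin, that snippet has to live in a script file to be useful; a hypothetical invocation (assuming it is saved as wf.ps1) would be:
Get-Content .\test.txt | .\wf.ps1 10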
Windows 10 64-bit. PowerShell 5
How to find which whole word ("the", not "-the-" or "weather"), regardless of case, is most frequently used in a text file, and how many times it is used, using PowerShell:
Replace 1.txt with your file.
$z = gc 1.txt -raw
-split $z | group -n | sort c* | select -l 1
Results:
Count Name
----- ----
30 THE
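Spelled out without the aliases, that one-liner reads:
$z = Get-Content 1.txt -Raw
-split $z | Group-Object -NoElement | Sort-Object Count | Select-Object -Last 1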

Collecting Unique Items from large data set over multiple text files

I am using PowerShell to collect lists of names from multiple text files. Many of the names in these files are similar or repeated. I am trying to ensure that PowerShell returns a single text file with all of the unique items. Looking at the data, it appears the script is gathering 271 of 296 unique items. I'm guessing that some of the data is being flagged as duplicate when it shouldn't be; any suggestions?
#Take content of each file (all names) and add unique values to text file
#for each unique value, create a row & check to see which txt files contain
function List {
$nofiles = Read-Host "How many files are we pulling from?"
$data = @()
for ($i = 0;$i -lt $nofiles; $i++)
{
$data += Read-Host "Give me the file name for file # $($i+1)"
}
return $data
}
function Aggregate ($array) {
Get-Content $array | Sort-Object -unique | Out-File newaggregate.txt
}
#SCRIPT BODY
$data = List
aggregate ($data)
I was expecting this code to catch everything, but it's missing some items that look very similar. A list of missing names and their similar matches:
CORPINZUTL16 MISSING FROM OUTFILE
CORPINZTRACE MISSING FROM OUTFILE
CORPINZADMIN Found In File
I have about 20 examples like this one. Apparently Sort-Object -Unique is not checking every character in a line. Can anyone recommend a better way of checking each line, or possibly forcing the comparison to use the full names?
Just for demonstration, this line creates 3 txt files containing ranges of numbers:
for($i=1;$i -lt 4;$i++){set-content -path "$i.txt" -value ($i..$($i+7))}
1.txt | 2.txt | 3.txt | newaggregate.txt
  1   |       |       |  1
  2   |   2   |       |  2
  3   |   3   |   3   |  3
  4   |   4   |   4   |  4
  5   |   5   |   5   |  5
  6   |   6   |   6   |  6
  7   |   7   |   7   |  7
  8   |   8   |   8   |  8
      |   9   |   9   |  9
      |       |  10   | 10
Here, Get-Content is used with a wildcard range [1-3] to read all three files:
Get-Content [1-3].txt | Sort-Object {[int]$_} -Unique | Out-File newaggregate.txt
$All = Get-Content .\newaggregate.txt
foreach ($file in (Get-ChildItem [1-3].txt)){
Compare-Object $All (Get-Content $file.FullName) |
Select-Object @{n='File';e={$File}},
@{n="Missing";e={$_.InputObject}} -ExcludeProperty SideIndicator
}
File Missing
---- -------
Q:\Test\2019\05\07\1.txt 9
Q:\Test\2019\05\07\1.txt 10
Q:\Test\2019\05\07\2.txt 1
Q:\Test\2019\05\07\2.txt 10
Q:\Test\2019\05\07\3.txt 1
Q:\Test\2019\05\07\3.txt 2
There are two ways to achieve this. One is Select-Object -Unique, which works even when the data is not sorted and suits small data sets or lists.
When dealing with large files, we can use the Get-Unique cmdlet, which works with sorted input; if the input data is not sorted, it will give wrong results.
Get-ChildItem *.txt | Get-Content | measure -Line                      # 225949 lines
Get-ChildItem *.txt | Get-Content | sort | Get-Unique | measure -Line  # 119650 lines
Here is my command for multiple files:
Get-ChildItem *.txt | Get-Content | sort | Get-Unique >> Unique.txt
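One caveat: >> appends on every run, so repeated runs will accumulate duplicates in Unique.txt. If the file should be rewritten each time, Out-File is the safer choice:
Get-ChildItem *.txt | Get-Content | Sort-Object | Get-Unique | Out-File Unique.txt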

PowerShell output contents of hash to file

I have code to output a sorted hashtable to the screen:
$h.getenumerator() | sort value -descending
It looks like this:
Name Value
---- -----
10.10.10.10 69566308 151
10.10.10.11 69566308 143
10.10.10.12 69566308 112
10.10.10.13 69566308 99
10.10.10.14 69566308 71
10.10.10.15 69566308 70
But I would like to output this to a file instead.
When I try to output to a file with
$h.getenumerator() | sort value -descending | out-string | Add-Content D:\Script\iis_stats.log
or
$h.getenumerator() | sort value -descending | Add-Content D:\Script\iis_stats.log
All I get is "System.Collections.DictionaryEntry" in D:\Script\iis_stats.log.
How do I fix it?
If you want it in the same format it displays on the screen (folded at the pipes for readability):
$h.getenumerator() |
Sort-Object value -descending |
Format-Table |
Out-String |
Add-Content D:\Script\iis_stats.log
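Alternatively (not from the original answers, just a sketch, with iis_stats.csv as a stand-in filename), you can emit the entries as objects and let Export-Csv handle the serialization, which keeps the file machine-readable:
$h.GetEnumerator() |
    Sort-Object Value -Descending |
    Select-Object Name, Value |
    Export-Csv D:\Script\iis_stats.csv -NoTypeInformation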
You would get similar output on your screen if you did something like this as well:
$h.GetEnumerator() | sort value -Descending | ForEach-Object{
$_.GetType().FullName
}
For every entry you would see System.Collections.DictionaryEntry since it does not have a proper string equivalent.
Then you should just do what mjolinor suggests. PowerShell automatically appends the | Out-Default command to the console so that it displays in a way that is visually pleasing. When you send the output to Add-Content that prettification cannot occur.

How do I get directory depth in PowerShell 3.0?

I need to find out how far down the directory structure inside a working directory goes. If the layout is something like
Books\
Email\
Notes\
Note 1.txt
Note 2.txt
HW.docx
then it should return 1, because the deepest items are 1 level below. But if it looks like
Books\
Photos\
Hello.c
then it should return 0, because there is nothing deeper than the first level.
Something like this should do the trick in V3:
Get-ChildItem . -Recurse -Name |
Foreach { ($_.ToCharArray() | Where { $_ -eq '\' } | Measure).Count } |
Measure -Maximum | Foreach Maximum
It's not as pretty, and arguably not as "Posh" as Keith's, but I suspect it might scale better.
$depth_ht = @{}
(cmd /c dir /ad /s) -replace '[^\\]','' |
foreach {$depth_ht[$_]++}
$max_depth =
$depth_ht.keys |
sort length |
select -last 1 |
select -ExpandProperty length
$root_depth =
($PWD -replace '[^\\]','').length
($max_depth - $root_depth)
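For comparison, Keith's relative-name approach can also be written with a regex count instead of ToCharArray (an untested sketch, same V3 requirement):
Get-ChildItem . -Recurse -Name |
    ForEach-Object { ($_ -replace '[^\\]', '').Length } |
    Measure-Object -Maximum |
    ForEach-Object Maximum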