How to count occurrence of each word on each line? - powershell
I'm stuck as I have to use Powershell at work. I've attached my code and results so far below.
$data = Get-Content "/Users/mikeshobes/Documents/Powershell/nfl.csv"
write-host $data.count total lines read from file
foreach ($line in $data)
{
write-host $line
}
13 total lines read from file
1,Tom Brady,NE,QB,93,142,65.5,47.3,"1,137",8,379,7,3,55,38.7,48,17,3,9,97.7
2,Matt Ryan,ATL,QB,70,98,71.4,32.7,"1,014",10.3,338,9,0,54,55.1,73T,13,2,8,135.3
3,Aaron Rodgers,GB,QB,80,128,62.5,42.7,"1,004",7.8,334.7,9,2,53,41.4,42T,17,1,10,103.8
4,Ben Roethlisberger,PIT,QB,64,96,66.7,32,735,7.7,245,3,4,35,36.5,62T,8,3,2,82.6
5,Russell Wilson,SEA,QB,40,60,66.7,30,449,7.5,224.5,4,2,22,36.7,42,5,2,6,97.2
6,Dak Prescott,DAL,QB,24,38,63.2,38,302,7.9,302,3,1,16,42.1,40T,3,1,2,103.2
7,Eli Manning,NYG,QB,23,44,52.3,44,299,6.8,299,1,1,12,27.3,51,3,2,2,72.1
8,Matt Moore,MIA,QB,29,36,80.6,36,289,8,289,1,1,16,44.4,37,3,0,5,97.8
9,Matthew Stafford,DET,QB,18,32,56.3,32,205,6.4,205,0,0,10,31.3,30,3,0,3,75.7
10,Alex Smith,KC,QB,20,34,58.8,34,172,5.1,172,1,1,9,26.5,24,3,0,1,69.7
11,Brock Osweiler,HOU,QB,14,25,56,25,168,6.7,168,1,0,9,36,38,1,0,0,90.1
12,Connor Cook,OAK,QB,18,45,40,45,161,3.6,161,1,3,11,24.4,20,1,0,3,30
13,Julian Edelman,NE,WR,0,1,0,0.3,0,0,0,0,0,0,0,--,0,0,0,39.6
If you want the number of times each word on a line occurs in that line, first split the line into words and then count how many times each word appears in that line.
I think a snippet of code explains it better:
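For ($i = 0; $i -lt $Data.Length; $i++) {
    $Line = $Data[$i]
    # Split on spaces and commas (regex character class) and skip empty tokens
    Foreach ($Word in ($Line -split '[ ,]' | Where-Object { $_ })) {
        # Escape the word so characters such as "." are matched literally, not as regex
        $Count = [regex]::Matches($Line, [regex]::Escape($Word)).Count
        Write-Host ('Line {0} contains the word: "{1}" {2} time(s)' -f ($i + 1), $Word, $Count)
    }
}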
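A minimal sketch of a variation, assuming whole-token counts are wanted rather than the substring counts the snippet above produces (there, "8" also matches inside "80.6"). The two sample lines and the `$Counts` name are hypothetical stand-ins for the `Get-Content` output; `Group-Object` does the counting.

```powershell
# Hypothetical sample lines standing in for Get-Content output
$Data = @(
    '1,Tom Brady,NE,QB,93'
    '13,Julian Edelman,NE,WR,NE'
)

$Counts = for ($i = 0; $i -lt $Data.Count; $i++) {
    # Split on spaces and commas, drop empty tokens, then count identical tokens
    $Data[$i] -split '[ ,]' | Where-Object { $_ } | Group-Object | ForEach-Object {
        [pscustomobject]@{ Line = $i + 1; Word = $_.Name; Count = $_.Count }
    }
}

$Counts | Format-Table -AutoSize
```

Because the result is a list of objects rather than console text, it can be filtered or exported afterwards, for example with `Where-Object` or `Export-Csv`.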
Related
How to cut specific characters in a string and save it to a new file?
I have a string, and I want to cut some characters and store the result in a new file. I tried this code, but it still gives an error.
$a = ";Code=NB"
$b = $a -split "="
$b[1]
$Save = "[AGM]", "CR=JP", "LOC= $b[1]"| Out-File "C:\Users\Out.txt"
Try something like this:
$a = ";Code=NB"
$null, $b, $null = $a -split '=', 3
$b
$Save = "[AGM]", "CR=JP", "LOC= $b"| Out-File "C:\Users\Out.txt"
Something that would be easier to maintain would be this:
#Words to remove from string
$wordsToCut = "This","is"
#Phrase to remove words from
$phrase = "This is a test"
#Running through all words in words to remove
foreach ($word in $wordsToCut){
    #Replace current word with nothing
    $phrase = $phrase.Replace($word,"")
}
#Output end result
Write-Host $phrase
You could also use Trim() to remove any leading or trailing spaces. Once trimmed, the above code outputs: a test
Concatenate int variable from for loop with string
I'm passing in possibly four file names: FileName1, FileName2, FileName3, FileName4. Some may be empty, some not. Because of this I need to check if they are empty before using them. So instead of using four if statements I thought I would just loop through them. How I wanted to do that was like this:
for ($i = 1; $i -lt 5; $i++) {
    $FileName = $FileName + $i
    Write-Host $FileName
}
So I can get FileName1, FileName2, FileName3, FileName4. Instead I get 1, 1 + 2, etc. I've also tried $FileName$i, $FileName"$($i)", $FileName + "$($i)". Any ideas?
EDIT: FileName1, FileName2, FileName3, FileName4 are variables that are passed to the script. They could be FileName1="Budget2018.xlsx", FileName2="MonthlyExpenses.xlx", FileName3="", FileName4="". Or all four variables can contain values ... or just FileName1 can contain a value, etc. I need to check if they are empty before I continue on processing them. So rather than use 4 if statements to check if they are empty I thought I could loop through them referencing the variables as $Filename$i where $i would be the value 1 to 4. I'm trying to concatenate the two values together to represent the variables that are the parameters.
This produces FileName1, FileName2 etc, but I'm not sure how that squares with "I'm passing in four possible filenames", as there's no list in your script.
For($i = 1; $i -lt 5; $i++){
    $FileName = "FileName$i"
    Write-Host $FileName
}
If I understand what you want then this should do it:
For($i = 1; $i -lt 5; $i++){
    $FileName = 'FileName{0}' -f $i
    Write-Host $FileName
}
If you want to define the root portion of the name as a variable then:
$root = 'FileName'
For($i = 1; $i -lt 5; $i++){
    $FileName = '{0}{1}' -f $root,$i
    Write-Host $FileName
}
Ok, so based on the most recent edit, here is a way to do what I now think you want:
1..4 | %{(ls variable:\$("FileName$_")).value}
If you have a single filename, and you want a list created based off that, you can do something like this:
$FileName = 'test'
$FileList = for ($i = 1; $i -le 4; $i++) {
    "$FileName$i"
}
$FileList
# => test1
# => test2
# => test3
# => test4
But I suspect you have a list of files given your question had "check if null/empty" mentioned in it.
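For the case described in the question's edit, where FileName1 to FileName4 are separate variables and only the non-empty ones should be processed, one possible sketch uses `Get-Variable` to look each variable up by its constructed name. The four assignments are hypothetical stand-ins for the script's parameters, and `$FilesToProcess` is an illustrative name.

```powershell
# Hypothetical values standing in for the four script parameters
$FileName1 = 'Budget2018.xlsx'
$FileName2 = 'MonthlyExpenses.xlx'
$FileName3 = ''
$FileName4 = ''

$FilesToProcess = foreach ($i in 1..4) {
    # Build the variable name as a string and fetch that variable's value
    $Value = Get-Variable -Name "FileName$i" -ValueOnly
    # Emit only values that are neither $null nor empty
    if (-not [string]::IsNullOrEmpty($Value)) { $Value }
}

$FilesToProcess
```

If the script's signature can be changed, accepting a single array parameter would avoid the name lookup altogether.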
How do I edit the last occurrence of a particular string in PowerShell
My text file contains G-Code with the code "G94" appearing 5 times at different line numbers. G94 G94 G94 G94 G94 I need to change the last occurrence of "G94" to G94 /M16 but I keep getting no edit at all. I'm trying this:
$text = get-content C:\cncm\g94.txt
$i = 1
$replace = 5 #Replace the 5th match
ForEach ( $match in ($text | select-String "G94" -AllMatches).matches) {
    $index = $match.Index
    if ( $i -eq $replace ) {
        $text.Remove($index,"G94".length).Insert($index,"G94 n /M16")
    }
    $i++
}
What am I missing?
$text is an array of strings, how are you calling Remove() without getting an exception? First because Remove() only takes one parameter, second because you can't remove from a fixed length array. I'm thinking:
$text = get-content C:\cncm\g94.txt
# Select-String returns one MatchInfo per matching line; [4] is the fifth matching line
$fifthMatch = ($text | select-string "G94" -AllMatches)[4]
# LineNumber is 1-based, array indexes are 0-based
$lineIndex = $fifthMatch.LineNumber - 1
# Position of the last G94 within that line
$index = $fifthMatch.Matches[-1].Index
$text[$lineIndex] = $text[$lineIndex].Remove($index,"G94".length).Insert($index,"G94 `n /M16")
$text | out-file c:\cncm\g942.txt
Use a regex with a negative lookahead on a string that contains the entire file.
Replacing the last occurrence in the entire file - (?s) DOTALL mode allows .* to span across newline characters:
$text = [IO.File]::ReadAllText('C:\cncm\g94.txt')
$text = $text -replace '(?s)G94(?!.*?G94)', "G94`n/M16"
Replacing the last occurrence in every line - (?m) MULTILINE mode; without (?s), . stops at line breaks, so the lookahead only checks the rest of the current line:
$text = [IO.File]::ReadAllText('C:\cncm\g94.txt')
$text = $text -replace '(?m)G94(?!.*?G94)', "G94`n/M16"
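The whole-file variant can be tried without touching any file. This is a small sketch on a hypothetical three-line string; the `X1`..`X3` suffixes and the single-line replacement text are made up for illustration.

```powershell
# Hypothetical file contents; the real script would read C:\cncm\g94.txt
$text = "G94 X1`nG94 X2`nG94 X3"

# (?s) lets . cross line breaks, so only the final G94 has no other G94 after it
$result = $text -replace '(?s)G94(?!.*G94)', 'G94 /M16'
$result
```

Only the third line changes, because the lookahead fails for every G94 that is followed by another one anywhere later in the string.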
Is there a string concatenation shortcut in PowerShell?
Like with numerics? For example,
$string = "Hello"
$string =+ " there"
In Perl you can do
my $string = "hello"
$string .= " there"
It seems a bit verbose to have to do
$string = $string + " there"
or
$string = "$string there"
You actually have the operator backwards. It should be +=, not =+:
$string = "Hello"
$string += " there"
Below is a demonstration:
PS > $string = "Hello"
PS > $string
Hello
PS > $string += " there"
PS > $string
Hello there
PS >
However, that is about the quickest/shortest solution you can get to do what you want.
Avoid += for building strings
As with using the increase assignment operator (+=) to create a collection, strings are also immutable, so using the increase assignment operator (+=) to build a string can become pretty expensive: it concatenates the strings and assigns (copies!) the result back into the variable. Instead I recommend using the PowerShell pipeline with the -Join operator to build your string. Apart from the fact that it is faster, it is actually more DRY as well:
$String = 'Hello', 'there' -Join ' ' # Assigns: 'Hello there'
Or just
-Join @('One', 'Two', 'Three') # Yields: 'OneTwoThree'
For just a few items it might not make much sense, but let me give you a more common example by creating a formatted list of numbers, something like:
[Begin]
000001
000002
000003
[End]
You could do this:
$x = 3
$String = '[Begin]' + [Environment]::NewLine
for ($i = 1; $i -le $x; $i++) {
    $String += '{0:000000}' -f $i + [Environment]::NewLine
}
$String += '[End]' + [Environment]::NewLine
But instead, you might actually do it the PowerShell way:
$x = 3
$String = @(
    '[Begin]'
    for ($i = 1; $i -le $x; $i++) { '{0:000000}' -f $i }
    '[End]'
) -Join [Environment]::NewLine
Performance Testing
To show the performance decrease of using the increase assignment operator (+=), let's increase $x with a factor 1000 up till 20.000:
1..20 | ForEach-Object {
    $x = 1000 * $_
    $Performance = @{x = $x}
    $Performance.Pipeline = (Measure-Command {
        $String1 = @(
            '[Begin]'
            for ($i = 1; $i -le $x; $i++) { '{0:000000}' -f $i }
            '[End]'
        ) -Join [Environment]::NewLine
    }).Ticks
    $Performance.Increase = (Measure-Command {
        $String2 = '[Begin]' + [Environment]::NewLine
        for ($i = 1; $i -le $x; $i++) {
            $String2 += '{0:000000}' -f $i + [Environment]::NewLine
        }
        $String2 += '[End]' + [Environment]::NewLine
    }).Ticks
    [pscustomobject]$Performance
} | Format-Table x, Increase, Pipeline, @{n='Factor'; e={$_.Increase / $_.Pipeline}; f='0.00'} -AutoSize
Results:
    x Increase Pipeline Factor
    - -------- -------- ------
 1000   261337   252704   1,03
 2000   163596    63225   2,59
 3000   432524   127788   3,38
 4000   365581   137370   2,66
 5000   586370   171085   3,43
 6000  1219523   248489   4,91
 7000  2174218   295355   7,36
 8000  3148111   323416   9,73
 9000  4490204   373671  12,02
10000  6181179   414298  14,92
11000  7562741   447367  16,91
12000  9721252   486606  19,98
13000 12137321   551236  22,02
14000 14042550   598368  23,47
15000 16390805   603128  27,18
16000 18884729   636184  29,68
17000 21508541   708300  30,37
18000 24157521   893584  27,03
19000 27209069   766923  35,48
20000 29405984   814260  36,11
See also: PowerShell scripting performance considerations - String addition
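Another common way to avoid repeated string copies, not used in the answer above, is the .NET `System.Text.StringBuilder` class, which appends into an internal buffer. A minimal sketch of the same numbered list, with `$Builder` as an illustrative name; no timing is claimed for it here.

```powershell
$x = 3
$Builder = [System.Text.StringBuilder]::new()

# AppendLine returns the builder itself, so cast to [void] to keep it out of the output
[void]$Builder.AppendLine('[Begin]')
for ($i = 1; $i -le $x; $i++) {
    [void]$Builder.AppendLine(('{0:000000}' -f $i))
}
[void]$Builder.AppendLine('[End]')

$String = $Builder.ToString()
$String
```

Like the += version, this ends with a trailing newline after `[End]`, whereas the -Join version does not.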
Powershell / Perl : Merging multiple CSV files into one?
I have the following CSV files, and I want to merge them into a single CSV.
01.csv
apples,48,12,7
pear,17,16,2
orange,22,6,1
02.csv
apples,51,8,6
grape,87,42,12
pear,22,3,7
03.csv
apples,11,12,13
grape,81,5,8
pear,11,5,6
04.csv
apples,14,12,8
orange,5,7,9
Desired output:
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,87,42,12,81,5,8,,,
pear,17,16,2,22,3,7,11,5,6,,,
orange,22,6,1,,,,,,5,7,9
Can anyone provide guidance on how to achieve this? Preferably using Powershell but open to alternatives like Perl if that's easier.
Thanks Pantik, your code's output is close to what I want:
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,87,42,12,81,5,8
orange,22,6,1,5,7,9
pear,17,16,2,22,3,7,11,5,6
Unfortunately I need "placeholder" commas in place for when the entry is NOT present in a CSV file, e.g. orange,22,6,1,,,,,,5,7,9 rather than orange,22,6,1,5,7,9
UPDATE: I would like these parsed in order of the filenames, e.g.:
$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){
regards ted
Here is my Perl version:
use strict;
use warnings;

my $filenum = 0;
my ( %fruits, %data );

foreach my $file ( sort glob("*.csv") ) {
    $filenum++;
    open my $fh, "<", $file or die $!;
    while ( my $line = <$fh> ) {
        chomp $line;
        my ( $fruit, @values ) = split /,/, $line;
        $fruits{$fruit} = 1;
        $data{$filenum}{$fruit} = \@values;
    }
    close $fh;
}

foreach my $fruit ( sort keys %fruits ) {
    print $fruit, ",", join( ",", map { $data{$_}{$fruit} ? @{ $data{$_}{$fruit} } : ",," } 1 .. $filenum ), "\n";
}
Which gives me:
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
So do you have a typo for grape, or have I misunderstood something?
Ok, gangabass's solution works, and is cooler than mine, but I'll add mine anyway. It is slightly stricter, and preserves a data structure that can be used as well. So, enjoy. ;)
use strict;
use warnings;

opendir my $dir, '.' or die $!;
my @csv = grep (/^\d+\.csv$/i, readdir $dir);
closedir $dir;
# sorting numerically based on leading digits in filename
@csv = sort {($a=~/^(\d+)/)[0] <=> ($b=~/^(\d+)/)[0]} @csv;

my %data;
# To print empty records we first need to know all the names
for my $file (@csv) {
    open my $fh, '<', $file or die $!;
    while (<$fh>) {
        if (m/^([^,]+),/) {
            @{ $data{$1} } = ();
        }
    }
    close $fh;
}
# Now we can fill in values
for my $file (@csv) {
    open my $fh, '<', $file or die $!;
    my %tmp;
    while (<$fh>) {
        chomp;
        next if (/^\s*$/);
        my ($tag,@values) = split (/,/);
        $tmp{$tag} = \@values;
    }
    for my $key (keys %data) {
        unless (defined $tmp{$key}) {
            # Fill in empty values
            @{$tmp{$key}} = ("","","");
        }
        push @{ $data{$key} }, @{ $tmp{$key} };
    }
}
&myreport;

sub myreport {
    for my $key (sort keys %data) {
        print "$key," . (join ',', @{$data{$key}}), "\n";
    }
}
Powershell:
$produce = "apples","grape","orange","pear"
$produce_hash = @{}
$produce | foreach-object {$produce_hash[$_] = @(,$_)}
$myFiles = @(gci *.csv) | sort Name
foreach ($file in $myFiles){
    $file_hash = @{}
    $produce | foreach-object {$file_hash[$_] = @($null,$null,$null)}
    get-content $file | foreach-object{
        $line = $_.split(",")
        $file_hash[$line[0]] = $line[1..3]
    }
    $produce | foreach-object {
        $produce_hash[$_] += $file_hash[$_]
    }
}
$ofs = ","
$out = @()
$produce | foreach-object {
    $out += [string]$produce_hash[$_]
}
$out | out-file "outputfile.csv"
gc outputfile.csv
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
Should be easy to modify for additional items. Just add them to the $produce array.
Second Powershell solution (as requested)
$produce = @()
$produce_hash = @{}
$file_count = -1
$myFiles = @(gci 0*.csv) | sort Name
foreach ($file in $myFiles){
    $file_count ++
    $file_hash = @{}
    get-content $file | foreach-object{
        $line = $_.split(",")
        if ($produce -contains $line[0]){
            $file_hash[$line[0]] += $line[1..3]
        }
        else {
            $produce += $line[0]
            $file_hash[$line[0]] = @(,$line[0]) + (@($null) * 3 * $file_count) + $line[1..3]
        }
    }
    $produce | foreach-object {
        if ($file_hash[$_]){$produce_hash[$_] += $file_hash[$_]}
        else {$produce_hash[$_] += @(,$null) * 3}
    }
}
$ofs = ","
$out = @()
$produce_hash.keys | foreach-object {
    $out += [string]$produce_hash[$_]
}
$out | out-file "outputfile.csv"
gc outputfile.csv
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
You have to parse the files; I don't see an easier way to do it. Solution in PowerShell:
UPDATE: ok, adjusted a bit - hopefully understandable
$items = @{}
$colCount = 0 # total amount of columns
# loop through all files
foreach ($file in (gci *.csv | sort Name)) {
    $content = Get-Content $file
    $itemsToAdd = 0; # columns added by this file
    foreach ($line in $content) {
        if ($line -match "^(?<group>\w+),(?<value>.*)") {
            $group = $matches["group"]
            if (-not $items.ContainsKey($group)) {
                # in case the row doesn't exist, add it and fill with empty columns
                $items.Add($group, @())
                for($i = 0; $i -lt $colCount; $i++) { $items[$group] += "" }
            }
            # add new values to correct row
            $matches["value"].Split(",") | foreach { $items[$group] += $_ }
            $itemsToAdd = ($matches["value"].Split(",") | measure).Count # saves col count
        }
    }
    # in case that file didn't contain some row, add empty cols for those rows
    $colCount += $itemsToAdd
    $toAddEmpty = @()
    $items.Keys | ? { (($items[$_] | measure).Count -lt $colCount) } | foreach { $toAddEmpty += $_ }
    foreach ($key in $toAddEmpty) {
        for($i = 0; $i -lt $itemsToAdd; $i++) { $items[$key] += "" }
    }
}
# output
Remove-Item "output.csv" -ea 0
foreach ($key in $items.Keys) {
    "$key,{0}" -f [string]::Join(",", $items[$key]) | Add-Content "output.csv"
}
Output:
apples,48,12,7,51,8,6,11,12,13,14,12,8
grape,,,,87,42,12,81,5,8,,,
orange,22,6,1,,,,,,,5,7,9
pear,17,16,2,22,3,7,11,5,6,,,
Here is a more concise way to do it. However, it still doesn't add the commas when the item is missing.
Get-ChildItem D:\temp\a\ *.csv |
    Get-Content |
    ForEach-Object -begin { $result=@{} } -process {
        $name, $otherCols = $_ -split '(?<=\w+),'
        if (!$result[$name]) { $result[$name] = @() }
        $result[$name] += $otherCols
    } -end {
        $result.GetEnumerator() | % { "{0},{1}" -f $_.Key, ($_.Value -join ",") }
    } | Sort
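The placeholder commas can be added with a two-pass approach: first collect every row name, then for each file append either that file's values or empty fields. A minimal sketch, assuming a fixed three value columns per file and using hypothetical in-memory data in place of the CSV files; `$Files`, `$Width`, `$Names` and `$Merged` are illustrative names.

```powershell
# Hypothetical in-memory stand-ins for the CSV files, already in file-name order
$Files = [ordered]@{
    '01.csv' = 'apples,48,12,7', 'orange,22,6,1'
    '02.csv' = 'apples,51,8,6', 'grape,87,42,12'
}
$Width = 3   # assumed number of value columns per file

# Pass 1: collect every row name that appears in any file
$Names = $Files.Values | ForEach-Object { $_ } |
    ForEach-Object { ($_ -split ',')[0] } | Sort-Object -Unique

# Pass 2: for each name, append each file's values or empty placeholders
$Merged = foreach ($Name in $Names) {
    $Columns = foreach ($Lines in $Files.Values) {
        $Row = $Lines | Where-Object { ($_ -split ',')[0] -eq $Name } | Select-Object -First 1
        if ($Row) { ($Row -split ',')[1..$Width] } else { @('') * $Width }
    }
    (@($Name) + $Columns) -join ','
}
$Merged
```

With real files, `$Files` could be filled from `Get-ChildItem *.csv | Sort-Object Name` and `Get-Content`; the sketch rescans each file per name, which is fine for small inputs but not tuned for large ones.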