Count characters, excluding specific pages - powershell

Thank you for taking the time to read this and maybe help me!
I was doing an assignment that counts the characters in a document, and now I want to see whether I can count the characters without including the first 3 pages of the document.
I did some research but couldn't find much about it, and I am fairly new to PowerShell.
clear-host
$b = Read-Host 'Enter destination folder' # Asks the user to enter the destination folder
Get-Content -path $b | Measure -Line -Word -Character | Out-File C:\Users\TimHen\Desktop\output.txt # Counts lines, words and characters in the document.
# prints vowels and consonants

Judging from your code, I'd say you're reading a text file. A text file doesn't have pages, but what you could do is skip the first x amount of lines.
Get-Content $b | Select-Object -Skip 160 | Measure -Line -Word -Character | Out-File C:\Users\TimHen\Desktop\output.txt
Another possibility (but not really applicable in your scenario) is to use the Tail parameter of Get-Content. That will give you the x last lines of the file.
Get-Content $b -Tail 3000 | Measure -Line -Word -Character | Out-File C:\Users\TimHen\Desktop\output.txt
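To try the skip approach without touching a real file, here is a minimal sketch. The sample lines and the skip count of 3 are made up for illustration; in practice you would have to work out how many lines correspond to your first three pages.

```powershell
# Hypothetical sample data standing in for the lines returned by Get-Content
$lines = 'page one', 'page two', 'page three', 'fourth line here', 'fifth'

# Drop the first 3 lines, then count what is left
$stats = $lines | Select-Object -Skip 3 | Measure-Object -Line -Word -Character

# Lines, Words and Characters are properties of the returned object
"{0} lines, {1} words" -f $stats.Lines, $stats.Words
```

Only the two remaining lines are measured, so the counts no longer include the skipped part.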

Related

Powershell CSV removing rows and then remove from whole file if A column matches

I've created the following small script to remove rows matching two or more strings from a CSV.
Each row is a log of a given person and an answer they gave.
The CSV has X columns.
The column named FIRST identifies the person.
What I need to do is when I delete a row matching the answer, I also need to delete the person from the whole CSV if it had one of the two strings.
What I've made so far, removes the row of people having the answers but the person is still left in the overall CSV with other answers. I want to remove the person fully if the questions have been answered.
Can somebody help me out with making the addition or changes to make this happen?
INPUT File
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
SCRIPT
$CSVfile = "C:\Temp\Test\Test.csv"
$CSVfile_filtered = "C:\Temp\Test\Test.csv"
$regex001 = "AA"
$regex002 = "BB"
$filterArray = @($regex001,$regex002)
Get-Content $CSVfile | Select-String -pattern $filterArray -notmatch | Set-Content $CSVfile_filtered
The file should then remove 10005, 10011 and both lines of 10007. But my version only removes one of the 10007 since it only matches one of the two patterns.
Using more of PowerShell's built-in cmdlets can make this a little easier to manage.
# Assuming searching only properties ADDR and ADDR2
$filter = 'AA','BB'
# Grouping by First and Last values to easily remove duplicates
# -match uses regex so | is needed for an OR of multiple items
Import-Csv Test.csv | Group-Object First,Last |
Where {!($_.Group.ADDR,$_.Group.ADDR2 -match ($filter -join '|'))} |
Foreach-Object Group |
Export-Csv output.csv -NoType
You would think strictly using text manipulation would be simpler, but it adds other scenarios to consider:
You will need to track users that have duplicate entries and potentially back track to remove them (if not grouping). This could require reading the file contents twice.
Your header row could match the string you want to filter so you will need to add it to the output if filtering removes it.
Keeping the scenarios above in mind, you can still use a grouping concept:
$filter = 'AA','BB'
$file = Get-Content Test.csv
# $file[0] is the header row
# -split string uses regex and splits at the second comma
# -split results' [0] element is First,Last values
$file[0],($file |
Select-Object -Skip 1 |
Group-Object {($_ -split '(?<=^[^,]*,[^,]*),')[0]} |
where {!($_.Group -match ($filter -join '|'))} |
Foreach-Object Group) | Set-Content output.csv
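The question expects 10011 to be removed as well, and its match sits in the HOME column, so a variant that checks every column may be closer to what was asked. The following is only a sketch under that assumption: it groups by FIRST alone and compares whole field values with -in rather than using a regex.

```powershell
# Sample data copied from the question
$rows = @'
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
'@ | ConvertFrom-Csv

$filter = 'AA', 'BB'

$kept = $rows | Group-Object FIRST | Where-Object {
    # Collect every field value of every row in this group that equals a filter value
    $hits = @(
        $_.Group |
            ForEach-Object { $_.PSObject.Properties.Value } |
            Where-Object { $_ -in $filter }
    )
    # Keep the whole group only when nothing matched
    $hits.Count -eq 0
} | ForEach-Object { $_.Group }

$kept | Format-Table -AutoSize
```

With the sample data only the row for person 1 remains. Pipe $kept to Export-Csv instead of Format-Table to write the result to a file.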
If I got it right you could do something like this:
$SearchPattern = 'AA', 'BB'
$INPUTCSV = @'
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
'@ | ConvertFrom-Csv
$ActualSearchPattern =
$INPUTCSV |
Where-Object {
$_.LAST -in $SearchPattern -or
$_.ADDR -in $SearchPattern -or
$_.ADDR2 -in $SearchPattern -or
$_.GENDER -in $SearchPattern -or
$_.HOME -in $SearchPattern -or
$_.Work -in $SearchPattern
} |
Select-Object -ExpandProperty FIRST
$INPUTCSV |
Where-Object -Property FIRST -NotIn -Value $ActualSearchPattern |
Format-Table -AutoSize
There might be more sophisticated or more elegant ways, but I cannot think of one at the moment. ;-)
There is a nice PowerShell module you can use to manipulate the content of a csv or xlsx file: ImportExcel
This gives you a lot of options to manipulate the sheets, columns etc.

Word frequency elegantly in Powershell

Donald Knuth once got the task to write a literate program computing the word frequency of a file.
Read a file of text, determine the n most frequently used words, and print out a sorted list of those words along with their frequencies.
Doug McIlroy famously rewrote the 10 pages of Pascal in a few lines of sh:
tr -cs A-Za-z '\n' |
tr A-Z a-z |
sort |
uniq -c |
sort -rn |
sed ${1}q
As a little exercise, I converted this to Powershell:
(-split ((Get-Content -Raw test.txt).ToLower() -replace '[^a-zA-Z]',' ')) |
Group-Object |
Sort-Object -Property count -Descending |
Select-Object -First $Args[0] |
Format-Table count, name
I like that Powershell combines sort | uniq -c into a single Group-Object.
The first line looks ugly, so I wonder if it can be written more elegantly? Maybe there is a way to load the file with a regex delimiter somehow?
One obvious way to shorten the code would be to use aliases, but that does not help readability.
I would do it this way.
PS C:\users\me> Get-Content words.txt
One one
two
two
three,three.
two;two
PS C:\users\me> (Get-Content words.txt) -Split '\W' | Group-Object
Count Name Group
----- ---- -----
2 One {One, one}
4 two {two, two, two, two}
2 three {three, three}
1 {}
EDIT: Some code from Bruce Payette's Windows Powershell in Action
# top 10 most frequent words, hash table
$s = gc songlist.txt
$s = [string]::join(" ", $s)
$words = $s.Split(" `t", [stringsplitoptions]::RemoveEmptyEntries)
$uniq = $words | sort -Unique
$words | % {$h=@{}} {$h[$_] += 1}
$frequency = $h.keys | sort {$h[$_]}
-1..-10 | %{ $frequency[$_]+" "+$h[$frequency[$_]]}
# or
$grouped = $words | group | sort count
$grouped[-1..-10]
Thanks js2010 and LotPings for important hints. To document what is probably the best solution:
$Input -split '\W+' |
Group-Object -NoElement |
Sort-Object count -Descending |
Select-Object -First $Args[0]
Things I learned:
$Input contains stdin. This is closer to McIlroy's code than Get-Content on some file.
-split can actually take regex delimiters.
The -NoElement parameter lets me get rid of the Format-Table line.
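For anyone who wants to try the pipeline without piping a file in, here is a small self-contained sketch. The sample string and the Where-Object step are additions for illustration; the filter drops the empty token that -split '\W+' produces when the text ends in punctuation.

```powershell
# Hypothetical sample text; in the script above this would come from $Input
$text = 'One one two two three,three. two;two'

$top = $text.ToLower() -split '\W+' |
    Where-Object { $_ } |               # drop empty tokens
    Group-Object -NoElement |           # count occurrences, omit the Group column
    Sort-Object Count -Descending |
    Select-Object -First 2

$top
```

The most frequent word here is "two", with a count of 4.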
Windows 10 64-bit. PowerShell 5
How to find which whole word ("the", not "-the-" or "weather") is used most often in a text file, regardless of case, and how many times it is used, using PowerShell:
Replace 1.txt with your file.
$z = gc 1.txt -raw
-split $z | group -n | sort c* | select -l 1
Results:
Count Name
----- ----
30 THE

Powershell Select-String parse email from .rtf to .csv

Parse .rtf file, output email addresses in .csv file?
I have an .rtf file containing a bunch of email addresses, I need this parsed so that I can compare a .csv file to active users in Active Directory.
Basically I want what is to the left of "@my.domain.com"
$finds = Select-String -Path "path\to\my.rtf" -Pattern "@my.domain.com" | ForEach-Object {$_.Matches}
$finds | Select-Object -First 1 | ft *
This of course gives me one result so that I don't have a lot of output.
I only manage to get matches or the complete line.
I've tried adding something along the line of
$finds = Select-String -Path "path\to\my.rtf" -Pattern "\w.@my.domain.com"
This gives me the very two last letters in the addresses.
If I keep adding dots to the "wildcard"
-Pattern "\w.....@my.domain.com"
I also get a ton of numbers/characters (.rtf formatting) for addresses that contains fewer characters.
How do I do this?
EDIT: I will update the question as soon as I've found a solution. As of now I'm trying with regular expressions.
Example:
-Pattern "\w*?@my.domain.com"
$mPattern = "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4})"
$lines = Get-Content "path\to\your.rtf"
foreach ($line in $lines) {
    # Emit the first address found on the current line (empty if none)
    ([regex]::Match($line, $mPattern, "IgnoreCase")).Value
}
This code worked for me. My initial code but with a new search pattern.
$finds = Select-String -Path "path\to\my.rtf" -Pattern "[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+(\.[a-zA-Z]{2,4})" | ForEach-Object {$_.Matches}
$finds | Select-Object -First 10 | ft *
Thanks!
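Since the original goal was only the part to the left of "@my.domain.com", a lookahead can return just that piece. This is a sketch on a made-up RTF-like string, not on a real file; the sample addresses are hypothetical.

```powershell
# Hypothetical RTF-like text containing two addresses
$sample = '{\rtf1 jane.doe@my.domain.com\par bob@my.domain.com\par}'

# Match the local part only; (?=...) requires the domain to follow without capturing it
$pattern = '[a-zA-Z0-9._%+-]+(?=@my\.domain\.com)'

$names = [regex]::Matches($sample, $pattern) | ForEach-Object { $_.Value }
$names
```

For this sample the result is jane.doe and bob. The list can then be written out with Set-Content or wrapped in objects for Export-Csv.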

Powershell counting same values from csv

Using PowerShell, I can import the CSV file and count how many objects are equal to "a". For example,
@(Import-csv location | where-Object{$_.id -eq "a"}).Count
Is there a way to go through every column and row looking for the same String "a" and adding onto count? Or do I have to do the same command over and over for every column, just with a different keyword?
So I made a dummy file that contains 5 columns of people names. Now to show you how the process will work I will show you how often the text "Ann" appears in any field.
$file = "C:\temp\MOCK_DATA (3).csv"
gc $file | %{$_ -split ","} | Group-Object | Where-Object{$_.Name -like "Ann*"}
Don't focus on the code but the output below.
Count Name Group
----- ---- -----
5 Ann {Ann, Ann, Ann, Ann...}
9 Anne {Anne, Anne, Anne, Anne...}
12 Annie {Annie, Annie, Annie, Annie...}
19 Anna {Anna, Anna, Anna, Anna...}
"Ann" appears 5 times on its own. However, it is part of other names as well. Let's use a simple regex to find all the values that are only "Ann".
(select-string -Path 'C:\temp\MOCK_DATA (3).csv' -Pattern "\bAnn\b" -AllMatches | Select-Object -ExpandProperty Matches).Count
That will return 5 since \b is for a word boundary. In essence it is only looking at what is between commas or beginning or end of each line. This omits results like "Anna" and "Annie" that you might have. Select-Object -ExpandProperty Matches is important to have if you have more than one match on a single line.
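The effect of \b is easy to see on a single made-up line. This sketch uses [regex]::Matches, which is case-sensitive by default, whereas Select-String ignores case unless told otherwise.

```powershell
# Hypothetical CSV line with "Ann" alone and inside longer names
$line = 'Ann,Anna,Annie,Anne,Ann'

$exact = [regex]::Matches($line, '\bAnn\b').Count   # whole-word matches only
$loose = [regex]::Matches($line, 'Ann').Count       # also matches inside Anna, Annie, Anne

"exact: $exact, loose: $loose"
```

Here the whole-word pattern finds 2 matches and the plain pattern finds 5.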
Small Caveat
It should not matter but in trying to keep the code simple it is possible that your header could match with the value you are looking for. Not likely which is why I don't account for it. If that is a possibility then we could use Get-Content instead with a Select -Skip 1.
Try cycling through properties like this:
(Import-Csv location | %{$record = $_; $record | Get-Member -MemberType Properties |
?{$record.$($_.Name) -eq 'a';}}).Count
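The same idea can be written a little more directly by reading the values through PSObject.Properties. The following is a sketch on made-up inline data; the column names are hypothetical.

```powershell
# Hypothetical sample standing in for Import-Csv output
$rows = @'
id,name,code
a,Ann,b
c,a,a
'@ | ConvertFrom-Csv

# Flatten all field values of all rows, then keep those equal to 'a'
$count = @(
    $rows |
        ForEach-Object { $_.PSObject.Properties.Value } |
        Where-Object { $_ -eq 'a' }
).Count

$count
```

For this sample the count is 3. Note that -eq compares whole field values and ignores case, so "Ann" is not counted.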

How to change column position in powershell?

Is there an easy way to change column position? I'm looking for a way to move column 1 from the beginning to the end of each row, and I would also like to add a zero column as the second-to-last column. Please see the txt file example below.
Thank you for any suggestions.
File sample
TEXT1,02/10/2015,55.930,57.005,55.600,56.890,1890
TEXT2,02/10/2015,51.060,52.620,50.850,52.510,4935
TEXT3,02/10/2015,50.014,50.74,55.55,52.55,5551
Output:
02/10/2015,55.930,57.005,55.600,56.890,1890,0,TEXT1
02/10/2015,51.060,52.620,50.850,52.510,4935,0,TEXT2
02/10/2015,50.014,50.74,55.55,52.55,5551,0,TEXT3
Another option:
#Prepare test file
(@'
TEXT1,02/10/2015,55.930,57.005,55.600,56.890,1890
TEXT2,02/10/2015,51.060,52.620,50.850,52.510,4935
TEXT3,02/10/2015,50.014,50.74,55.55,52.55,5551
'@).split("`n") |
foreach {$_.trim()} |
sc testfile.txt
#Script starts here
$file = 'testfile.txt'
(get-content $file -ReadCount 0) |
foreach {
'{1},{2},{3},{4},{5},{6},0,{0}' -f $_.split(',')
} | Set-Content $file
#End of script
#show results
get-content $file
02/10/2015,55.930,57.005,55.600,56.890,1890,0,TEXT1
02/10/2015,51.060,52.620,50.850,52.510,4935,0,TEXT2
02/10/2015,50.014,50.74,55.55,52.55,5551,0,TEXT3
Sure, split on commas, spit the results back minus the first result joined by commas, add a 0, and then add the first result to the end and join the whole thing with commas. Something like:
# $lines is used instead of $Input, which is an automatic variable in PowerShell
$lines = @"
TEXT1,02/10/2015,55.930,57.005,55.600,56.890,1890
TEXT2,02/10/2015,51.060,52.620,50.850,52.510,4935
TEXT3,02/10/2015,50.014,50.74,55.55,52.55,5551
"@ -split "`n"|ForEach{$_.trim()}
$lines|ForEach{
$split = $_.split(',')
($Split[1..($split.count-1)]-join ','),0,$split[0] -join ','
}
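The same split-and-rejoin idea can be sketched with multiple assignment, which puts the first field in one variable and the remaining fields in another. The variable names are made up for illustration.

```powershell
# Sample rows from the question
$lines = 'TEXT1,02/10/2015,55.930,57.005,55.600,56.890,1890',
         'TEXT2,02/10/2015,51.060,52.620,50.850,52.510,4935'

$reordered = $lines | ForEach-Object {
    # $first receives field 1, $rest receives all remaining fields
    $first, $rest = $_ -split ','
    # Append the zero column and then the original first field
    (@($rest) + '0' + $first) -join ','
}

$reordered
```

The first output line is 02/10/2015,55.930,57.005,55.600,56.890,1890,0,TEXT1.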
I created the file test.txt to contain your sample data. I assigned each field a name ("one", "two", "three", etc.) so that I could select them by name, then just selected and exported back to CSV in the order you wanted.
First, add the zero to the end, it will end up as second last.
gc .\test.txt | %{ "$_,0" } | Out-File test1.txt
Then, rearrange order.
Import-Csv .\test1.txt -Header "one","two","three","four","five","six","seven","eight" | Select-Object -Property two,three,four,five,six,seven,eight,one | Export-Csv test2.txt -NoTypeInformation
This will take the output file and get rid of quotes and header line if you would rather not have them.
gc .\test2.txt | %{ $_.replace('"','')} | Select-Object -Skip 1 | out-file test3.txt