Combining like objects in an array

I am attempting to analyze a group of text files (MSFTP logs) and do counts of IP addresses that have submitted bad credentials. I think I have it worked out except I don't think that the array is passing to/from the function correctly. As a result, I get duplicate entries if the same IP appears in multiple log files. What am I doing wrong?
Function LogBadAttempt($FTPLog,$BadPassesArray)
{
$BadPassEx="PASS - 530"
Foreach($Line in $FTPLog)
{
if ($Line -match $BadPassEx)
{
$IP=($Line.Split(' '))[1]
if($BadPassesArray.IP -contains $IP)
{
$CurrentIP=$BadPassesArray | Where-Object {$_.IP -like $IP}
[int]$CurrentCount=$CurrentIP.Count
$CurrentCount++
$CurrentIP.Count=$CurrentCount
}else{
$info=@{"IP"=$IP;"Count"='1'}
$BadPass=New-Object -TypeName PSObject -Property $info
$BadPassesArray += $BadPass
}
}
}
return $BadPassesArray
}
$BadPassesArray=@()
$FTPLogs = Get-Childitem \\ftpserver\MSFTPSVC1\test
$Result = ForEach ($LogFile in $FTPLogs)
{
$FTPLog=Get-Content ($LogFile.fullname)
LogBadAttempt $FTPLog
}
$Result | Export-csv C:\Temp\test.csv -NoTypeInformation
The result looks like...
Count IP
7 209.59.17.20
20 209.240.83.135
18441 209.59.17.20
13059 200.29.3.98
I would like it to combine the entries for 209.59.17.20.

You're making this way too complicated. Process the files in a pipeline and use a hashtable to count the occurrences of each IP address:
$BadPasswords = @{}
Get-ChildItem '\\ftpserver\MSFTPSVC1\test' | Get-Content | ? {
    $_ -like '*PASS - 530*'
} | % {
    $ip = ($_ -split ' ')[1]
    $BadPasswords[$ip]++
}
$BadPasswords.GetEnumerator() |
    select @{n='IP';e={$_.Name}}, @{n='Count';e={$_.Value}} |
    Export-Csv 'C:\Temp\test.csv' -NoType
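To see why a single shared hashtable avoids the duplicate rows, here is a minimal self-contained sketch. The sample log lines and their layout (IP address as the second space-separated field) are made up for illustration and are not taken from a real MSFTP log.

```powershell
# Hypothetical log lines from two different files; the IP is the second field.
$file1 = @(
    '12:00:01 209.59.17.20 [1]PASS - 530',
    '12:00:02 209.240.83.135 [2]PASS - 530',
    '12:00:03 209.59.17.20 [1]USER anonymous 331'
)
$file2 = @(
    '13:00:01 209.59.17.20 [7]PASS - 530'
)

# One hashtable shared across all files, so each IP has exactly one counter.
$BadPasswords = @{}
$file1 + $file2 |
    Where-Object { $_ -like '*PASS - 530*' } |
    ForEach-Object {
        $ip = ($_ -split ' ')[1]
        $BadPasswords[$ip]++   # a missing key starts from $null, which ++ treats as 0
    }

# Turn the hashtable into objects with IP and Count properties.
$counts = $BadPasswords.GetEnumerator() |
    Select-Object @{n='IP';e={$_.Name}}, @{n='Count';e={$_.Value}} |
    Sort-Object IP
$counts
```

Because the counter lives outside any per-file function call, an IP that shows up in several files still ends up in one row.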

Related

For each thing in one CSV check for multiple types of matches in another CSV

Sorry if the description is unclear, but I couldn't think of how else to word it.
I have two CSV files:
LocalAdmins.csv -- ColumnA = PC name; ColumnB = username in local admin group
Exempt.csv -- ColumnA = PC name; ColumnB = username allowed to be a local admin
What I'm trying to do is loop through LocalAdmins.csv, and for each one check to see if the PC name shows up in Exempt.csv (or matches any defined naming patterns in that file), and if a match is found, check to see if the local admin username for that PC in LocalAdmins.csv shows up in the list of AllowedUsers for that PC in Exempt.csv.
If the username is NOT in the AllowedUsers list, or if the PC name is not in Exempt.csv, then output the entry from LocalAdmins.csv. Here is what I have so far:
$admins = Import-Csv .\LocalAdmins.csv
$exempt = Import-Csv .\Exempt.csv
$violations = ".\Violations.csv"
foreach ($admin in $admins) {
foreach ($item in $exempt) {
if ($admin.PC -like $item.PC) {
if ($admin.Name -notin ($item.AllowedUsers -split ",")) {
$admin | Export-Csv $violations -Append -NoTypeInformation
}
}
else {
$admin | Export-Csv $violations -Append -NoTypeInformation
}
}
}
The problem is the nested foreach loop generates duplicates, meaning if there are 3 lines in Exempt.csv then a single entry in LocalAdmins.csv will have 3 duplicate outputs (one for each line in Exempt.csv). So the output looks like this:
When it should look like this:
I'm guessing the problem is somewhere in the structure of the loops, but I just need some help figuring out what to tweak. Any input is greatly appreciated!

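Not optimized (a unique sort on any property removes the duplicate rows, but an entry is still emitted whenever at least one line in Exempt.csv fails to match its PC name):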
$admins = Import-Csv .\LocalAdmins.csv
$exempt = Import-Csv .\Exempt.csv
$violations = ".\Violations.csv"
$(
foreach ($admin in $admins) {
foreach ($item in $exempt) {
if ($admin.PC -like $item.PC) {
if ($admin.Name -notin ($item.AllowedUsers -split ",")) {
$admin
}
}
else {
$admin
}
}
}
) | Sort-Object -Property PC, Name -Unique |
Export-Csv $violations -Append -NoTypeInformation

With a tighter restriction on the inner foreach there are no duplicates, so there is no need for Sort-Object -Unique.
The sample below gets its input from here-strings:
## Q:\Test\2019\02\05\SO_54523868.ps1
$admins = @'
PC,NAME
XYZlaptop,user6
workstationXYZ,user7
computerABC,user8
ABClaptop,user1
'@ | ConvertFrom-Csv # .\LocalAdmins.csv
$exempt = @'
PC,AllowedUsers
*laptop,"user1,user2"
computerXYZ,"user3,user4"
workstation*,"user5"
'@ | ConvertFrom-Csv # .\Exempt.csv
$violationsFile = ".\Violations.csv"
$violations = foreach ($admin in $admins) {
$violation = $True
foreach ($item in ($exempt|Where-Object {$admin.PC -like $_.PC})){
if ($admin.NAME -in ($item.AllowedUsers -split ',')){
$violation = $False
}
}
if ($violation){$admin}
}
$violations
$violations | Export-Csv $violationsFile -NotypeInformation
## With Doug Finke's ImportExcel module installed, you can write the Excel file directly:
#$violations | Export-Excel .\Violations.xlsx -AutoSize -Show

CSV file - count distinct, group by, sum

I have a file that looks like the following;
- Visitor ID,Revenue,Channel,Flight
- 1234,100,Email,BA123
- 2345,200,PPC,BA112
- 456,150,Email,BA456
I need to produce a file that contains;
The count of distinct Visitor IDs (3)
The total revenue (450)
The count of each Channel
Email 2
PPC 1
The count of each Flight
BA123 1
BA112 1
BA456 1
So far I have the following code, however when executing this on the 350MB file, it takes too long and in some cases breaks the memory limit. As I have to run this function on multiple columns, it is going through the file many times. I ideally need to do this in one file pass.
$file = 'log.txt'
function GroupBy($columnName)
{
$objects = Import-Csv -Delimiter "`t" $file | Group-Object $columnName |
Select-Object @{n=$columnName;e={$_.Group[0].$columnName}}, Count
for($i=0;$i -lt $objects.count;$I++) {
$line += $columnName +"|"+$objects[$I]."$columnName" +"|Count|"+ $objects[$I].'Count' + $OFS
}
return $line
}
$finalOutput += GroupBy "Channel"
$finalOutput += GroupBy "Flight"
Write-Host $finalOutput
Any help would be much appreciated.
Thanks,
Craig

The fact that you are importing the CSV again for each column is what is killing your script. Try to do the loading once, then re-use the data. For example:
$data = Import-Csv .\data.csv
$flights = $data | Group-Object Flight -NoElement | ForEach-Object {[PsCustomObject]@{Flight=$_.Name;Count=$_.Count}}
$visitors = ($data | Group-Object "Visitor ID" | Measure-Object).Count
$revenue = ($data | Measure-Object Revenue -Sum).Sum
$channel = $data | Group-Object Channel -NoElement | ForEach-Object {[PsCustomObject]@{Channel=$_.Name;Count=$_.Count}}
You can display the data like this:
"Revenue : $revenue"
"Visitors: $visitors"
$flights | Format-Table -AutoSize
$channel | Format-Table -AutoSize

This will probably work, using hashtables.
Pros: it is faster and uses less memory.
Cons: it is far less readable than Group-Object and requires more code.
To make it even less memory-hungry, read the CSV file line by line.
$data = Import-CSV -Path "C:\temp\data.csv" -Delimiter ","
$DistinctVisitors = @{}
$TotalRevenue = 0
$ChannelCount = @{}
$FlightCount = @{}
$data | ForEach-Object {
$DistinctVisitors[$_.'Visitor ID'] = $true
$TotalRevenue += $_.Revenue
if (-not $ChannelCount.ContainsKey($_.Channel)) {
$ChannelCount[$_.Channel] = 0
}
$ChannelCount[$_.Channel] += 1
if (-not $FlightCount.ContainsKey($_.Flight)) {
$FlightCount[$_.Flight] = 0
}
$FlightCount[$_.Flight] += 1
}
$DistinctVisitorsCount = $DistinctVisitors.Keys | Measure-Object | Select-Object -ExpandProperty Count
Write-Output "The count of distinct Visitor IDs $DistinctVisitorsCount"
Write-Output "The total revenue $TotalRevenue"
Write-Output "The Count of each Channel"
$ChannelCount.Keys | ForEach-Object {
Write-Output "$_ $($ChannelCount[$_])"
}
Write-Output "The count of each Flight"
$FlightCount.Keys | ForEach-Object {
Write-Output "$_ $($FlightCount[$_])"
}
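The "read line by line" suggestion can be sketched by piping Import-Csv straight into ForEach-Object instead of storing its output in a variable first. This is only a sketch under assumptions: the temp-file path and sample rows are illustrative, the file is comma-delimited, and Revenue is assumed to parse as a decimal number.

```powershell
# Write a small sample file; the path and contents are illustrative only.
$csvPath = Join-Path ([IO.Path]::GetTempPath()) 'visitors-sample.csv'
@'
Visitor ID,Revenue,Channel,Flight
1234,100,Email,BA123
2345,200,PPC,BA112
456,150,Email,BA456
'@ | Set-Content -Path $csvPath

$DistinctVisitors = @{}
$TotalRevenue     = [decimal]0
$ChannelCount     = @{}
$FlightCount      = @{}

# Import-Csv emits one row at a time, so piping it directly (rather than
# assigning it to a variable) avoids holding the whole file in memory.
Import-Csv -Path $csvPath | ForEach-Object {
    $DistinctVisitors[$_.'Visitor ID'] = $true
    $TotalRevenue += [decimal]$_.Revenue   # assumes Revenue is numeric
    $ChannelCount[$_.Channel]++            # a missing key counts up from 0
    $FlightCount[$_.Flight]++
}

"Distinct visitors: $($DistinctVisitors.Count)"
"Total revenue: $TotalRevenue"
$ChannelCount.GetEnumerator() | ForEach-Object { "$($_.Key) $($_.Value)" }
$FlightCount.GetEnumerator()  | ForEach-Object { "$($_.Key) $($_.Value)" }
```

Only the four hashtables and the running total stay in memory, so the footprint grows with the number of distinct values rather than with the number of rows.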

Powershell : merge two CSV files with partially duplicate lines

I have scraped two files from a website in order to list the companies in my city.
The first lists : name, city, phone number, email
The second lists : name, city, phone number
If I merge them I will have duplicate lines. For example:
> "Firm1";"Los Angeles";"000000";"info@firm1.lol"
> "Firm1";"Los Angeles";"000000";""
> "Firm2";"Los Angeles";"111111";""
> "Firm3";"Los Angeles";"000000";"contact@firm3.lol"
> "Firm3";"Los Angeles";"000000";""
> ...
Is there a way to merge the two files and keep the most complete information, like this:
> "Firm1";"Los Angeles";"000000";"info@firm1.lol"
> "Firm2";"Los Angeles";"111111";""
> "Firm3";"Los Angeles";"000000";"contact@firm3.lol"
> ...

Assuming you have a file like this called 'firm.csv':
"Firm1";"Los Angeles";"000000";"info@firm1.lol"
"Firm1";"Los Angeles";"000000";""
"Firm2";"Los Angeles";"111111";""
"Firm3";"Los Angeles";"000000";"contact@firm3.lol"
"Firm3";"Los Angeles";"000000";""
You can load it using:
$firms = import-csv C:\temp\firm.csv -Header 'Firm','Town','Tel','Mail' -Delimiter ';'
Then
$firms | Sort-Object -Unique -Property 'Firm'
Following Joey's comment, I improved the solution so that the row with an email address is the one kept for each firm:
$firms | Group-Object -Property 'firm' | % {$_.group | Sort-Object -Property mail -Descending | Select-Object -first 1}

EDIT: just realized the two files don't contain the same headers. Here is an update.
$main = Import-Csv firm1.csv -Header 'Firm','Town','Tel','Mail' -Delimiter ";"
$alt = Import-Csv firm2.csv -Header 'Firm','Town','Tel' -Delimiter ";"
foreach ($f in $alt)
{
$found = $false
foreach($g in $main)
{
if ($g.Firm -eq $f.Firm -and $g.Town -eq $f.Town)
{
$found = $true
if ($g.Tel -eq "")
{
$g.Tel = $f.Tel
}
}
}
if ($found -eq $false)
{
$main += $f
}
}
# Everything is merged into the $main array
$main

There must be a better approach, but this is one costly way to do it.
$firms = import-csv C:\firm.csv -Header 'Firm','Town','Tel','Mail' -Delimiter ';'
$Result = @()
ForEach($i in $firms){
$found = 0;
ForEach($m in $Result){
if($m.Firm -eq $i.Firm){
$found = 1
if( $i.Mail.length -ne 0 )
{
$m.Mail = $i.Mail
}
break;
}
}
if($found -eq 0){
$Result += [pscustomobject] @{Firm=$i.Firm; Town=$i.Town; Tel=$i.Tel; Mail=$i.Mail}
}
}
$Result | export-csv C:\out.csv
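One possible cheaper variant of the nested loop above keeps the merged rows in a dictionary keyed by firm name, so each input row needs a single lookup. This is a sketch, not a drop-in replacement: the inline sample rows stand in for the real file, and the firm name alone is assumed to identify a company.

```powershell
# Sample rows standing in for the merged file; values are illustrative.
$firms = @'
"Firm1";"Los Angeles";"000000";"info@firm1.lol"
"Firm1";"Los Angeles";"000000";""
"Firm2";"Los Angeles";"111111";""
"Firm3";"Los Angeles";"000000";""
"Firm3";"Los Angeles";"000000";"contact@firm3.lol"
'@ | ConvertFrom-Csv -Header 'Firm','Town','Tel','Mail' -Delimiter ';'

# Ordered dictionary keyed by firm name: one lookup per row instead of a nested loop.
$byFirm = [ordered]@{}
foreach ($row in $firms) {
    if (-not $byFirm.Contains($row.Firm)) {
        $byFirm[$row.Firm] = $row              # first time this firm is seen
    }
    elseif (-not $byFirm[$row.Firm].Mail -and $row.Mail) {
        $byFirm[$row.Firm].Mail = $row.Mail    # fill in a missing email
    }
}
$Result = @($byFirm.Values)
$Result
```

The ordered dictionary preserves the order in which firms first appear, which keeps the output comparable with the input file.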

compare two csv using powershell and return matching and non-matching values

I have two CSV files. I want to check whether the users in username.csv match those in userdata.csv and copy the matching rows
to output.csv. If a user does not match, only the name should be written to output.csv.
For Ex: User Data contains 3 columns
UserName,column1,column2
Hari,abc,123
Raj,bca,789
Max,ghi,123
Arul,987,thr
Prasad,bxa,324
username.csv contains usernames
Hari
Rajesh
Output.csv should contain
Hari,abc,123
Rajesh,NA,NA
How to achieve this. Thanks
Sorry for that.
$Path = "C:\PowerShell"
$UserList = Import-Csv -Path "$($path)\UserName.csv"
$UserData = Import-Csv -Path "$($path)\UserData.csv"
foreach ($User in $UserList)
{
ForEach ($Data in $UserData)
{
If($User.Username -eq $Data.UserName)
{
# Process the data
$Data
}
}
}
This returns only matching values. I also need to add the non-matching values in output
file. Thanks.

something like this will work:
$Path = "C:\PowerShell"
$UserList = Import-Csv -Path "$($path)\UserName.csv"
$UserData = Import-Csv -Path "$($path)\UserData.csv"
$UserOutput = @()
ForEach ($name in $UserList)
{
$userMatch = $UserData | where {$_.UserName -eq $name.usernames}
If($userMatch)
{
# Process the data
$UserOutput += New-Object PsObject -Property @{UserName =$name.usernames;column1 =$userMatch.column1;column2 =$userMatch.column2}
}
else
{
$UserOutput += New-Object PsObject -Property @{UserName =$name.usernames;column1 ="NA";column2 ="NA"}
}
}
$UserOutput | ft
It loops through each name in the user list. The Where-Object call searches the UserData CSV for a matching user name. If it finds one, it adds that user's data to the output; if no match is found, it adds the user name to the output with NA in both columns.
I had to change your UserName.csv so that it has a header row:
usernames
Hari
Rajesh
expected output:
UserName column1 column2
-------- ------- -------
Hari abc 123
Rajesh NA NA
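For larger files, the per-name Where-Object search can be swapped for a hashtable lookup. The sketch below is one way to do that under assumptions: the inline sample data replaces the two CSV files, user names are assumed to be unique in the user data, and `$byName` and `$found` are names introduced here for illustration.

```powershell
# Inline sample data standing in for UserData.csv and UserName.csv.
$UserData = @'
UserName,column1,column2
Hari,abc,123
Raj,bca,789
Max,ghi,123
'@ | ConvertFrom-Csv
$UserList = @'
usernames
Hari
Rajesh
'@ | ConvertFrom-Csv

# Index the data once by user name (hashtable string keys are case-insensitive by default).
$byName = @{}
foreach ($row in $UserData) { $byName[$row.UserName] = $row }

# Emit the matching row, or a placeholder row with NA values when there is no match.
$UserOutput = foreach ($name in $UserList) {
    $found = $byName[$name.usernames]
    if ($found) {
        $found
    }
    else {
        [pscustomobject]@{ UserName = $name.usernames; column1 = 'NA'; column2 = 'NA' }
    }
}
$UserOutput | ConvertTo-Csv -NoTypeInformation
```

Replacing the last line with `Export-Csv` would write the same rows to output.csv.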

I had a similar situation, where I needed a "changed record collection" holding the entire record whenever the current record was either new or had any change compared to the previous record. This was my code:
# get current and previous CSV
$current = Import-Csv -Path $current_file
$previous = Import-Csv -Path $previous_file
# collection with new or changed records
$deltaCollection = New-Object Collections.Generic.List[System.Object]
:forEachCurrent foreach ($row in $current) {
    $previousRecord = $previous.Where( { $_.Id -eq $row.Id } )
    $hasPreviousRecord = ($null -ne $previousRecord -and $previousRecord.Count -eq 1)
    if ($hasPreviousRecord -eq $false) {
        # new record: keep it and move on to the next row
        $deltaCollection.Add($row)
        continue forEachCurrent
    }
    # check if the value of any property changed compared to the previous record
    foreach ($property in $row.PSObject.Properties) {
        $columnName = $property.Name
        $currentValue = if ($null -eq $property.Value) { "" } else { $property.Value }
        $previousValue = $previousRecord[0]."$columnName"
        if ($currentValue -ne $previousValue) {
            # changed record: add it once, then skip its remaining properties
            $deltaCollection.Add($row)
            continue forEachCurrent
        }
    }
}

I need help formatting output with PowerShell's Out-File cmdlet

I have a series of documents that are going through the following function, which is designed to count word occurrences in each document. This function works fine when outputting to the console, but now I want to generate a text file containing the information, with the file name prepended to each word in the list.
My current console output is:
"processing document1 with x unique words occuring as follows"
"word1 12"
"word2 8"
"word3 3"
"word4 4"
"word5 1"
I want a delimited file in this format:
document1;word1;12
document1;word2;8
document1;word3;3
document1;word4;4
document1;word5;1
document2;word1;16
document2;word2;11
document2;word3;9
document2;word4;9
document2;word5;13
While the function below gets me the lists of words and occurrences, I'm having a hard time figuring out where or how to insert the filename variable so that it prints at the head of each line. MSDN has been less than helpful, and most of the places I try to insert the variable result in errors (see below).
function Count-Words ($docs) {
$document = get-content $docs
$document = [string]::join(" ", $document)
$words = $document.split(" `t",[stringsplitoptions]::RemoveEmptyEntries)
$uniq = $words | sort -uniq
$words | % {$wordhash=@{}} {$wordhash[$_] += 1}
Write-Host $docs "contains" $wordhash.psbase.keys.count "unique words distributed as follows."
$frequency = $wordhash.psbase.keys | sort {$wordhash[$_]}
-1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File c:\out-file-test.txt -append
$grouped = $words | group | sort count
}
Do I need to create a string to pass to the out-file cmdlet? is this just something I've been putting in the wrong place on the last few tries? I'd like to understand WHY it's going in a particular place as well. Right now I'm just guessing, because I know I have no idea where to put the out-file to achieve my selected results.
I've tried formatting my command per powershell help, using -$docs and -FilePath, but each time I add anything to the out-file above that runs successfully, I get the following error:
Out-File : Cannot validate argument on parameter 'Encoding'. The argument "c:\out-file-test.txt" does not belong to the set "unicode,utf7,utf8,utf32,ascii,bigendianunicode,default,oem" specified by the ValidateSet attribute. Supply an argument that is in the set and then try the command again.
At C:\c.ps1:39 char:71
+ -1..-25 | %{ $frequency[$_]+" "+$wordhash[$frequency[$_]]} | Out-File <<<< -$docs -width 1024 c:\users\x46332\count-test.txt -append
+ CategoryInfo : InvalidData: (:) [Out-File], ParameterBindingValidationException
+ FullyQualifiedErrorId : ParameterArgumentValidationError,Microsoft.PowerShell.Commands.OutFileCommand

I rewrote most of your code. You should utilize objects to make it easier formatting the way you want. This one splits on "space" and groups words together. Try this:
Function Count-Words ($paths) {
$output = @()
foreach ($path in $paths) {
$file = Get-ChildItem $path
((Get-Content $file) -join " ").Split(" ", [System.StringSplitOptions]::RemoveEmptyEntries) | Group-Object | Select-Object -Property @{n="FileName";e={$file.BaseName}}, Name, Count | % {
$output += "$($_.FileName);$($_.Name);$($_.Count)"
}
}
$output | Out-File test-out2.txt -Append
}
$filepaths = ".\test.txt", ".\test2.txt"
Count-Words -paths $filepaths
It produces the output format you asked for (document;word;count). If you want the document name to include the extension, change $file.BaseName to $file.Name. Test output:
test;11;1
test;9;2
test;13;1
test2;word11;5
test2;word1;4
test2;12;1
test2;word2;2

Slightly different approach:
function Get-WordCounts ($doc)
{
$text_ = [IO.File]::ReadAllText($doc.fullname)
$WordHash = @{}
$text_ -split '\b' -match '\w+'|
foreach {$WordHash[$_]++}
$WordHash.GetEnumerator() |
foreach {
New-Object PSObject -Property @{
Word = $_.Key
Count = $_.Value
}
}
}
$docs = gci c:\testfiles\*.txt |
sort name
&{
foreach ($doc in $docs)
{
Get-WordCounts $doc |
sort Count -Descending |
foreach {
(&{$doc.Name;$_.Word;$_.Count}) -join ';'
}
}
} | out-file c:\somedir\wordcounts.txt

Try this:
$docs = @("document1", "document2", ...)
$docs | % {
$doc = $_
Get-Content $doc `
| % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
| Group-Object `
| select @{n="Document";e={$doc}}, Name, Count
} | Export-CSV output.csv -Delimiter ";" -NoTypeInfo
If you want to make this into a function you could do it like this:
function Count-Words($docs) {
foreach ($doc in $docs) {
Get-Content $doc `
| % { $_.split(" `t",[stringsplitoptions]::RemoveEmptyEntries) } `
| Group-Object `
| select @{n="Document";e={$doc}}, Name, Count
}
}
$files = @("document1", "document2", ...)
Count-Words $files | Export-CSV output.csv -Delimiter ";" -NoTypeInfo
