Using Powershell to compare two files and then output only the different string names - powershell

So I am a complete beginner at Powershell but need to write a script that will take a file, compare it against another file, and tell me what strings are different in the first compared to the second. I have had a go at this but I am struggling with the outputs as my script will currently only tell me on which line things are different, but it also seems to count lines that are empty too.
To give some context for what I am trying to achieve, I would like to have a static file of known good Windows processes ($Authorized) and I want my script to pull a list of current running processes, filter by the process name column so to just pull the process name strings, then match anything over 1 character, sort the file by unique values and then compare it against $Authorized, plus finally either outputting the different process strings found in $Processes (to the ISE Output Pane) or just to output the different process names to a file.
I have spent today attempting the following in Powershell ISE and also Googling around to try and find solutions. I heard 'fc' is a better choice instead of Compare-Object but I could not get that to work. I have thus far managed to get it to work but the final part where it compares the two files it seems to compare line by line, for which would always give me false positives as the line position of the process names in the file supplied would change, furthermore I only want to see the changed process names, and not the line numbers which it is reporting ("The process at line 34 is an outlier" is what currently gets outputted).
I hope this makes sense, and any help on this would be very much appreciated.
Get-Process | Format-Table -Wrap -Autosize -Property ProcessName | Outfile c:\users\me\Desktop\Processes.txt
$Processes = 'c:\Users\me\Desktop\Processes.txt'
$Output_file = 'c:\Users\me\Desktop\Extracted.txt'
$Sorted = 'c:\Users\me\Desktop\Sorted.txt'
$Authorized = 'c:\Users\me\Desktop\Authorized.txt'
$regex = '.{1,}'
select-string -Path $Processes -Pattern $regex |% { $_.Matches } |% { $_.Value } > $Output_file
Get-Content $Output_file | Sort-Object -Unique > $Sorted
$dif = Compare-Object -ReferenceObject $(Get-Content $Sorted) -DifferenceObject $(get-content $Authorized) -IncludeEqual
$lineNumber = 1
foreach ($difference in $dif)
{
if ($difference.SideIndicator -ne "==")
{
Write-Output "The Process at Line $linenumber is an Outlier"
}
$lineNumber ++
}
Remove-Item c:\Users\me\Desktop\Processes.txt
Remove-Item c:\Users\me\Desktop\Extracted.txt
Write-Output "The Results are Stored in $Sorted"

From the length and complexity of your script, I feel like I'm missing something, but your description seems clear
Running process names:
$ProcessNames = #(Get-Process | Select-Object -ExpandProperty Name)
.. which aren't blank: $ProcessNames = $ProcessNames | Where-Object {$_ -ne ''}
List of authorised names from a file:
$AuthorizedNames = Get-Content 'c:\Users\me\Desktop\Authorized.txt'
Compare:
$UnAuthorizedNames = $ProcessNames | Where-Object { $_ -notin $AuthorizedNames }
optional output to file:
$UnAuthorizedNames | Set-Content out.txt
or in the shell:
#(gps).Name -ne '' |? { $_ -notin (gc authorized.txt) } | sc out.txt
1 2 3 4 5 6 7 8
1. #() forces something to be an array, even if it only returns one thing
2. gps is a default alias of Get-Process
3. using .Property on an array takes that property value from every item in the array
4. using an operator on an array filters the array by whether the items pass the test
5. ? is an alias of Where-Object
6. -notin tests if one item is not in a collection
7. gc is an alias of Get-Content
8. sc is an alias of Set-Content
You should use Set-Content instead of Out-File and > because it handles character encoding nicely, and they don't. And because Get-Content/Set-Content sounds like a memorable matched pair, and Get-Content/Out-File doesn't.

Related

Powershell CSV removing rows and then remove from whole file if A column matches

I've created the following small script to remove 2++ strings from a CSV.
Each row is a log of a given person and a answer they give.
The CSV has X columns.
The column named FIRST identifies the person.
What I need to do is when I delete a row matching the answer, I also need to delete the person from the whole CSV if it had one of the two strings.
What I've made so far, removes the row of people having the answers but the person is still left in the overall CSV with other answers. I want to remove the person fully if the questions have been answered.
Can somebody help me out with making the addition or changes to make this happen?
INPUT File
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
SCRIPT
$CSVfile = "C:\Temp\Test\Test.csv"
$CSVfile_filtered = "C:\Temp\Test\Test.csv"
$regex001 = "AA"
$regex002 = "BB"
$filterArray = #($regex001,$regex002)
Get-Content $CSVfile | Select-String -pattern $filterArray -notmatch | Set-Content $CSVfile_filtered
The file should then remove 10005, 10011 and both lines of 10007. But my version only removes one of the 10007 since it only matches one of the two patterns.
Using more of PowerShell's built-in cmdlets can make this a little easier to manage.
# Assuming searching only properties ADDR and ADDR2
$filter = 'AA','BB'
# Grouping by First and Last values to easily remove duplicates
# -match uses regex so | is needed for an OR of multiple items
Import-Csv Test.csv | Group-Object First,Last |
Where {!($_.Group.ADDR,$_.Group.ADDR2 -match ($filter -join '|'))} |
Foreach-Object Group |
Export-Csv output.csv -NoType
You would think strictly using text manipulation would be simpler, but it adds other scenarios to consider:
You will need to track users that have duplicate entries and potentially back track to remove them (if not grouping). This could require reading the file contents twice.
Your header row could match the string you want to filter so you will need to add it to the output if filtering removes it.
Keeping the scenarios above in mind, you can still use a grouping concept:
$filter = 'AA','BB'
$file = Get-Content Test.csv
# $file[0] is the header row
# -split string uses regex and splits at the second comma
# -split results' [0] element is First,Last values
$file[0],($file |
Select-Object -Skip 1 |
Group-Object {($_ -split '(?<=^[^,]*,[^,]*),')[0]} |
where {!($_.Group -match ($filter -join '|'))} |
Foreach-Object Group) | Set-Content output.csv
If I got it right you could do something like this:
$SearchPattern = 'AA', 'BB'
$INPUTCSV = #'
FIRST,LAST,ADDR,ADDR2,GENDER,HOME,WORK
1,N/A,N/A,N/A,N/A,BAF,N/A
10005,JAS,AA,N/A,,ZAV,N/A
10007,JADE,BB,N/A,OMA,N/A,N/A
10007,JADE,N/A,RAV,N/A,N/A,N/A
10011,KIAH,N/A,N/A,BALI,BB,N/A
'# | ConvertFrom-Csv
$ActualSearchPattern =
$INPUTCSV |
Where-Object {
$_.LAST -in $SearchPattern -or
$_.ADDR -in $SearchPattern -or
$_.ADDR2 -in $SearchPattern -or
$_.GENDER -in $SearchPattern -or
$_.HOME -in $SearchPattern -or
$_.Work -in $SearchPattern
} |
Select-Object -ExpandProperty FIRST
$INPUTCSV |
Where-Object -Property FIRST -NotIn -Value $ActualSearchPattern |
Format-Table -AutoSize
There might be more sophisticated or more elegant ways but I cannot think about one at the moment. ;-)
There is a nice PowerShell module you can use to manipulate the content of a csv or xlsx file: ImportExcel
This give you a lot of options to manipulate the sheets, columns etc.

Count unique numbers in CSV (PowerShell or Notepad++)

How to find the count of unique numbers in a CSV file? When I use the following command in PowerShell ISE
1,2,3,4,2 | Sort-Object | Get-Unique
I can get the unique numbers but I'm not able to get this to work with CSV files. If for example I use
$A = Import-Csv C:\test.csv | Sort-Object | Get-Unique
$A.Count
it returns 0. I would like to count unique numbers for all the files in a given folder.
My data looks similar to this:
Col1,Col2,Col3,Col4
5,,7,4
0,,9,
3,,5,4
And the result should be 6 unique values (preferably written inside the same CSV file).
Or would it be easier to do it with Notepad++? So far I have found examples only on how to count the unique rows.
You can try the following (PSv3+):
PS> (Import-CSV C:\test.csv |
ForEach-Object { $_.psobject.properties.value -ne '' } |
Sort-Object -Unique).Count
6
The key is to extract all property (column) values from each input object (CSV row), which is what $_.psobject.properties.value does;
-ne '' filters out empty values.
Note that, given that Sort-Object has a -Unique switch, you don't need Get-Unique (you need Get-Unique only if your input already is sorted).
That said, if your CSV file is structured as simply as yours, you can speed up processing by reading it as a text file (PSv2+):
PS> (Get-Content C:\test.csv | Select-Object -Skip 1 |
ForEach-Object { $_ -split ',' -ne '' } |
Sort-Object -Unique).Count
6
Get-Content reads the CSV file as a line of strings.
Select-Object -Skip 1 skips the header line.
$_ -split ',' -ne '' splits each line into values by commas and weeds out empty values.
As for what you tried:
Import-CSV C:\test.csv | Sort-Object | Get-Unique:
Fundamentally, Sort-Object emits the input objects as a whole (just in sorted order), it doesn't extract property values, yet that is what you need.
Because no -Property argument is passed to Sort-Object to base the sorting on, it compares the custom objects that Import-Csv emits as a whole, by their .ToString() values, which happen to be empty[1]
, so they all compare the same, and in effect no sorting happens.
Similarly, Get-Unique also determines uniqueness by .ToString() here, so that, again, all objects are considered the same and only the very first one is output.
[1] This may be surprising, given that using a custom object in an expandable string does yield a value: compare $obj = [pscustomobject] #{ foo ='bar' }; $obj.ToString(); '---'; "$obj". This inconsistency is discussed in this GitHub issue.

Remove String in one text file from another text file

I have two text files. One has all servers on the domain (servers_all.txt) and the other has only the virtual servers (virtual_servers.txt). I want to get the difference between the two so I can find the physical servers. I have been tring compare-object to no avail
Compare-Object (gc servers_all.txt) (gc .\VIRTUAL_SERVERS.TXT) -IncludeEqual
Is there a way I can remove the servers listed in virtual_servers.txt from servers_all.txt.
Both txt files are formatted the same way with only the server names in a single column ie:
ServerA
ServerB
ServerC
Compare-Object is somewhat obtuse (to put it mildly). I avoid it.
$VirtualServers = Get-Content .\VIRTUAL_SERVERS.TXT;
$NonVirtualServers = Get-Content servers_all.txt | Where-Object { $_ -notin $VirtualServers };
The only thing to beware of here is file formatting errors like leading or trailing whitespace. If you want to handle that, you could do something like:
$VirtualServers = Get-Content .\VIRTUAL_SERVERS.TXT | ForEach-Object { $_.Trim() };
$NonVirtualServers = Get-Content servers_all.txt | ForEach-Object { $_.Trim() } | Where-Object { $_ -notin $VirtualServers };
$all = 'a','b','c','d'
$virtual = 'b','c'
Compare-Object $all $virtual -PassThru
->
a
d

Issues with Powershell Import-CSV

Main Script
$Computers = Get-Content .\computers.txt
If ( test-path .\log.txt ) {
$Log_Successful = Import-CSV .\log.txt | Where-Object {$_.Result -eq "Succesful"}
} ELSE {
Add-Content "Computer Name,Is On,Attempts,Result,Time,Date"
}
$Log_Successful | format-table -autosize
Issues:
Log_Successful."Computer Name" works fine, but if i change 4 to read as the following
$Log_Successful = Import-CSV .\log.txt | Where-Object {$_.Result -eq "Failed"}
Log_Successful."Computer Name" no longer works... Any ideas why?
Dataset
Computer Name,Is On,Attempts,Result,Time,Date
52qkkgw-94210jv,False,1,Failed,9:48 AM,10/28/2012
HELLBOMBS-PC,False,1,Successful,9:48 AM,10/28/2012
52qkkgw-94210dv,False,1,Failed,9:48 AM,10/28/2012
In case of "Successful" a single object is returned. It contains the property "Computer Name". In case of "Failed" an array of two objects is returned. It (the array itself) does not contain the property "Computer Name". In PowerShell v3 in some cases it is possible to use notation $array.SomePropertyOfContainedObject but in PowerShell v2 it is an error always. That is what you probably see.
You should iterate through the array of result objects, e.g. foreach($log in $Log_Successful) {...} and access properties of the $log objects.
And the last tip. In order to ensure that the result of Import-Csv call is always an array (not null or a single object) use the #() operator.
The code after fixes would be:
$logs = #(Import-Csv ... | where ...)
# $logs is an array, e.g. you can use $logs.Count
# process logs
foreach($log in $logs) {
# use $log."Computer Name"
}
I'm not sure if this is the problem but you have a typo, in Where-Object you compare against "Succesful" and the value in the file is "Successful" (missing 's').
Anyway, what's not working?

Remove text from a dynamic file based on data in a static file

I have a powershell script that generates data that is sent to the file Dynamic.txt. The script generates a list of servers that meet very specific criteria. The list is then processed. However I have about 20 servers that meet the criteria that I do not want in the list.
This is my static list. I can remove the servers from the list using the Foreach-Object {$_ -replace "xxx", ""} command. However this is messy and I want cleaner code. How can I remove data from Dynamic.txt based on data in Static.txt?
To remove entries from one text file based on entries in another text file,
$dynamic = Get-Content .\Dynamic.txt
$static = Get-Content .\Static.txt
$dynamic| where { $static -notcontains $_ }| Set-Content .\Dynamic.txt
You could use the Compare-Object cmdlet.
The Compare-Object cmdlet compares two sets of objects. One set of
objects is the Reference set, and the other set is the Difference set.
Here's some example code.
Contents of colors.txt:
red
green
blue
pink
Contents of notcolors.txt:
green
Command and output:
compare-object (Get-Content "notcolors.txt") (Get-Content "colors.txt") | FL
InputObject : red
SideIndicator : =>
InputObject : blue
SideIndicator : =>
InputObject : pink
SideIndicator : =>
Simply selecting InputObject from the results should give you the correct list of servers.
This is powershell, there are other ways too. You could use a filter somewhere in the script that might go something like this ( you might have to switch around the $_.Name and Get-Content portions to get the logic right.)
...| Where-Object {$_.Name -notmatch (Get-Content serverlist.txt)} | ...
Say content of diff.txt should be equal to difference of fileX and fileY then use the below code
$fileA = 'fileA.txt'
$fileB = 'fileB.txt'
$diff = 'diff.txt'
$fileAContent = Get-Content $fileA -Encoding UTF8
$fileBContent = Get-Content $fileB -Encoding UTF8
$fileAContent| where { $fileBContent -notcontains $_ }| Set-Content $diff