how to find unique line in a txt file?

I have a LARGE list of hashes. I need to find out which ones only appear once as most are duplicates.
EX: the last line 238db2..... only appears once
ac6b51055fdac5b92934699d5b07db78
ac6b51055fdac5b92934699d5b07db78
7f5417a85a63967d8bba72496faa997a
7f5417a85a63967d8bba72496faa997a
1e78ba685a4919b7cf60a5c60b22ebc2
1e78ba685a4919b7cf60a5c60b22ebc2
238db202693284f7e8838959ba3c80e8
I tried the following, but it just listed one of each of the duplicates instead of identifying the one that appears only once:
foreach ($line in (Get-Content "C:\hashes.txt" | Select-Object -Unique)) {
Write-Host "Line '$line' appears $(($line | Where-Object {$_ -eq $line}).count) time(s)."
}

You could use a Hashtable and a StreamReader.
The StreamReader reads the file line by line, and the Hashtable stores each line as a Key whose Value is $true (if the line is a duplicate) or $false (if it is unique).
$reader = [System.IO.StreamReader]::new('D:\Test\hashes.txt')
$hash = @{}
while($null -ne ($line = $reader.ReadLine())) {
$hash[$line] = $hash.ContainsKey($line)
}
# clean-up the StreamReader
$reader.Dispose()
# get the unique line(s) by filtering for value $false
$result = $hash.Keys | Where-Object {-not $hash[$_]}
Given your example, $result will contain 238db202693284f7e8838959ba3c80e8

Given that you're dealing with a large file, Get-Content is best avoided.
A switch statement with the -File parameter allows efficient line-by-line processing, and given that duplicates appear to be grouped together already, they can be detected by keeping a running count of identical lines.
$count = 0 # keeps track of the count of identical lines occurring in sequence
switch -File 'C:\hashes.txt' {
default {
if ($prevLine -eq $_ -or $count -eq 0) { # duplicate or first line.
if ($count -eq 0) { $prevLine = $_ }
++$count
}
else { # current line differs from the previous one.
if ($count -eq 1) { $prevLine } # non-duplicate -> output
$prevLine = $_
$count = 1
}
}
}
if ($count -eq 1) { $prevLine } # output the last line, if a non-duplicate.

$values = Get-Content .\hashes.txt # Read the values from the hashes.txt file
$groups = $values | Group-Object | Where-Object { $_.Count -eq 1 } # Group the values by their distinct values and filter for groups with a single value
foreach ($group in $groups) {
foreach ($value in $group.Values) {
Write-Host "$value" # Output the value of each group
}
}
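As a compact variant of the Group-Object approach above, the grouping and filtering can be done in a single pipeline. This is only a minimal sketch: the temporary file and its contents are assumptions copied from the question's example, and Group-Object still keeps every distinct line in memory.

```powershell
# Assumption: sample data copied from the question, written to a temporary file.
$path = Join-Path ([System.IO.Path]::GetTempPath()) 'hashes_sample.txt'
@(
    'ac6b51055fdac5b92934699d5b07db78'
    'ac6b51055fdac5b92934699d5b07db78'
    '7f5417a85a63967d8bba72496faa997a'
    '7f5417a85a63967d8bba72496faa997a'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '1e78ba685a4919b7cf60a5c60b22ebc2'
    '238db202693284f7e8838959ba3c80e8'
) | Set-Content -Path $path

# Group identical lines, keep the groups that occur exactly once,
# and emit the line itself (the group's Name).
$unique = Get-Content -Path $path |
    Group-Object -NoElement |
    Where-Object { $_.Count -eq 1 } |
    ForEach-Object { $_.Name }

$unique
Remove-Item -Path $path
```

With the sample data this prints 238db202693284f7e8838959ba3c80e8. The -NoElement switch drops the grouped members from the result, which should reduce memory use somewhat.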
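To handle very large files you could try this. Note that it only finds lines that are unique within each 1000-line chunk, so duplicates that fall into different chunks are reported as unique.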
$chunkSize = 1000 # Set the chunk size to 1000 lines
$lineNumber = 0 # Initialize a line number counter
# Use a do-while loop to read the file in chunks
do {
# Read the next chunk of lines from the file
$values = Get-Content .\hashes.txt | Select-Object -Skip $lineNumber -First $chunkSize
# Group the values by their distinct values and filter for groups with a single value
$groups = $values | Group-Object | Where-Object { $_.Count -eq 1 }
foreach ($group in $groups) {
foreach ($value in $group.Values) {
Write-Host "$value" # Output the value of each group
}
}
# Increment the line number counter by the chunk size
$lineNumber += $chunkSize
} while ($values.Count -eq $chunkSize)
Or this, which counts the occurrences of every line in a hashtable:
# Create an empty dictionary
$dict = New-Object System.Collections.Hashtable
# Read the file line by line
foreach ($line in Get-Content .\hashes.txt) {
# Check if the line is already in the dictionary
if ($dict.ContainsKey($line)) {
# Increment the value of the line in the dictionary
$dict.Item($line) += 1
} else {
# Add the line to the dictionary with a count of 1
$dict.Add($line, 1)
}
}
# Filter the dictionary for values with a count of 1
$singles = $dict.GetEnumerator() | Where-Object { $_.Value -eq 1 }
# Output the values of the single items
foreach ($single in $singles) {
Write-Host $single.Key
}

Related

Comparing two text files and output the differences in Powershell

So I'm new to the PowerShell scripting world and I'm trying to compare a list of IPs in a text file against a database list of IPs. If an IP from (file) does not exist in the (database) file, put it in a new file, let's call it compared.txt. When I tried to run the script, I didn't get any result. What am I missing here?
$file = Get-Content "C:\Users\zack\Desktop\file.txt"
$database = Get-Content "C:\Users\zack\Desktop\database.txt"
foreach($line1 in $file){
$check = 0
foreach($line2 in $database)
{
if($line1 != $line2)
{
$check = 1
}
else
{
$check = 0
break
}
}
if ($check == 1 )
{
$line2 | Out-File "C:\Users\zack\Desktop\compared.txt"
}
}

There is a problem with your use of comparison operators: unlike in C#, the PowerShell equality and inequality operators are -eq and -ne, and since they are case-insensitive by default, there are also the case-sensitive variants -ceq and -cne.
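To make the operator point concrete, here is a small sketch comparing the default and case-sensitive operators. The variable names and sample strings are made up for illustration.

```powershell
# -eq / -ne ignore case by default; -ceq / -cne are the case-sensitive variants.
$a = 'Server01'
$b = 'server01'

$insensitiveEqual = $a -eq $b        # True: case is ignored
$sensitiveEqual   = $a -ceq $b       # False: case differs
$notEqual         = $a -ne 'other'   # True

"{0} {1} {2}" -f $insensitiveEqual, $sensitiveEqual, $notEqual
```

This prints True False True.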
There is also a problem with your code's logic, a simple working version of it would be:
$database = Get-Content "C:\Users\zack\Desktop\database.txt"
# iterate each line in `file.txt`
$result = foreach($line1 in Get-Content "C:\Users\zack\Desktop\file.txt") {
# iterate each line in `database.txt`
# this happens on each iteration of the outer loop
$check = foreach($line2 in $database) {
# if this line of `file.txt` is the same as this line of `database.txt`
if($line1 -eq $line2) {
# we don't need to keep checking, output this boolean
$true
# and break the inner loop
break
}
}
# if above condition was NOT true
if(-not $check) {
# output this line of `file.txt` (it was not found in `database.txt`)
$line1
}
}
$result | Set-Content path\to\comparisonresult.txt
However, there are even more simplified ways you could achieve the same results:
Using containment operators:
$database = Get-Content "C:\Users\zack\Desktop\database.txt"
$result = foreach($line1 in Get-Content "C:\Users\zack\Desktop\file.txt") {
if($line1 -notin $database) {
$line1
}
}
$result | Set-Content path\to\comparisonresult.txt
Using Where-Object:
$database = Get-Content "C:\Users\zack\Desktop\database.txt"
Get-Content "C:\Users\zack\Desktop\file.txt" | Where-Object { $_ -notin $database } |
Set-Content path\to\comparisonresult.txt
Using a HashSet<T> and its ExceptWith method (note that this will also get rid of duplicates in your file.txt):
$file = [System.Collections.Generic.HashSet[string]]@(
Get-Content "C:\Users\zack\Desktop\file.txt"
)
$database = [string[]]@(Get-Content "C:\Users\zack\Desktop\database.txt")
$file.ExceptWith($database)
$file | Set-Content path\to\comparisonresult.txt
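If duplicates and the original order of file.txt should be kept, one option is to load only the database into a HashSet and test each line with Contains. The following is a sketch under assumptions: the sample IP lists are invented and stand in for the two files, and the case-insensitive comparer is a choice made here to mirror the behavior of -notin.

```powershell
# Hypothetical sample data standing in for file.txt and database.txt.
$fileLines     = '10.0.0.1', '10.0.0.2', '10.0.0.2', '10.0.0.3'
$databaseLines = '10.0.0.1', '10.0.0.9'

# Build the set once; Contains is then a hash lookup instead of a linear scan.
$database = [System.Collections.Generic.HashSet[string]]::new([string[]]$databaseLines, [System.StringComparer]::OrdinalIgnoreCase)

# Emit every line of the file that is not in the database, keeping order and duplicates.
$result = foreach ($line in $fileLines) {
    if (-not $database.Contains($line)) { $line }
}

$result
```

With the sample data this prints 10.0.0.2 twice, followed by 10.0.0.3.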

PowerShell : Return keyword inside Foreach block not returning line of command to caller but acts as continue [duplicate]

I have the following code:
$project.PropertyGroup | Foreach-Object {
if($_.GetAttribute('Condition').Trim() -eq $propertyGroupConditionName.Trim()) {
$a = $project.RemoveChild($_);
Write-Host $_.GetAttribute('Condition')"has been removed.";
}
};
Question #1: How do I exit from ForEach-Object? I tried using "break" and "continue", but it doesn't work.
Question #2: I found that I can alter the list within a foreach loop... We can't do it like that in C#... Why does PowerShell allow us to do that?

First of all, Foreach-Object is not an actual loop and calling break in it will cancel the whole script rather than skipping to the statement after it.
Conversely, break and continue will work as you expect in an actual foreach loop.
Item #1. Putting a break within the foreach loop does exit the loop, but it does not stop the pipeline. It sounds like you want something like this:
$todo=$project.PropertyGroup
foreach ($thing in $todo){
if ($thing -eq 'some_condition'){
break
}
}
Item #2. PowerShell lets you modify an array within a foreach loop over that array, but those changes do not take effect until you exit the loop. Try running the code below for an example.
$a=1,2,3
foreach ($value in $a){
Write-Host $value
}
Write-Host $a
I can't comment on why the authors of PowerShell allowed this, but most other scripting languages (Perl, Python and shell) allow similar constructs.

There are differences between foreach and foreach-object.
A very good description you can find here: MS-ScriptingGuy
For testing in PS, here you have scripts to show the difference.
ForEach-Object:
# Omit 5.
1..10 | ForEach-Object {
if ($_ -eq 5) {return}
# if ($_ -ge 5) {return} # Omit from 5.
Write-Host $_
}
write-host "after1"
# Cancels whole script at 15, "after2" not printed.
11..20 | ForEach-Object {
if ($_ -eq 15) {continue}
Write-Host $_
}
write-host "after2"
# Cancels whole script at 25, "after3" not printed.
21..30 | ForEach-Object {
if ($_ -eq 25) {break}
Write-Host $_
}
write-host "after3"
foreach
# Ends foreach at 5.
foreach ($number1 in (1..10)) {
if ($number1 -eq 5) {break}
Write-Host "$number1"
}
write-host "after1"
# Omit 15.
foreach ($number2 in (11..20)) {
if ($number2 -eq 15) {continue}
Write-Host "$number2"
}
write-host "after2"
# Cancels whole script at 25, "after3" not printed.
foreach ($number3 in (21..30)) {
if ($number3 -eq 25) {return}
Write-Host "$number3"
}
write-host "after3"

To stop the pipeline of which ForEach-Object is part just use the statement continue inside the script block under ForEach-Object. continue behaves differently when you use it in foreach(...) {...} and in ForEach-Object {...} and this is why it's possible. If you want to carry on producing objects in the pipeline discarding some of the original objects, then the best way to do it is to filter out using Where-Object.
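As an illustration of the Where-Object suggestion, the sketch below discards one value before it reaches ForEach-Object, so neither continue nor break is needed. The numbers and the doubling step are arbitrary placeholders.

```powershell
# Skip the value 5 by filtering before ForEach-Object instead of using continue.
$seen = 1..10 |
    Where-Object { $_ -ne 5 } |
    ForEach-Object { $_ * 2 }

$seen -join ' '
Write-Output 'after'   # still runs, because nothing terminated the script
```

This prints 2 4 6 8 12 14 16 18 20 and then after.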

Since ForEach-Object is a cmdlet, break and continue will behave differently here than with the foreach keyword. Both will stop the loop but will also terminate the entire script:
break:
0..3 | foreach {
if ($_ -eq 2) { break }
$_
}
echo "Never printed"
# OUTPUT:
# 0
# 1
continue:
0..3 | foreach {
if ($_ -eq 2) { continue }
$_
}
echo "Never printed"
# OUTPUT:
# 0
# 1
So far, I have not found a "good" way to break a foreach script block without breaking the script, except "abusing" exceptions, although powershell core uses this approach:
throw:
class CustomStopUpstreamException : Exception {}
try {
0..3 | foreach {
if ($_ -eq 2) { throw [CustomStopUpstreamException]::new() }
$_
}
} catch [CustomStopUpstreamException] { }
echo "End"
# OUTPUT:
# 0
# 1
# End
The alternative (which is not always possible) would be to use the foreach keyword:
foreach:
foreach ($_ in (0..3)) {
if ($_ -eq 2) { break }
$_
}
echo "End"
# OUTPUT:
# 0
# 1
# End

If you insist on using ForEach-Object, then I would suggest adding a "break condition" like this:
$Break = $False;
1,2,3,4 | Where-Object { $Break -Eq $False } | ForEach-Object {
$Break = $_ -Eq 3;
Write-Host "Current number is $_";
}
The above code must output 1,2,3 and then skip (break before) 4. Expected output:
Current number is 1
Current number is 2
Current number is 3

Below is a suggested approach to Question #1 which I use if I wish to use the ForEach-Object cmdlet.
It does not directly answer the question because it does not EXIT the pipeline.
However, it may achieve the desired effect in Q#1.
The only drawback an amateur like myself can see is when processing large pipeline iterations.
$zStop = $false
(97..122) | Where-Object {$zStop -eq $false} | ForEach-Object {
$zNumeric = $_
$zAlpha = [char]$zNumeric
Write-Host -ForegroundColor Yellow ("{0,4} = {1}" -f ($zNumeric, $zAlpha))
if ($zAlpha -eq "m") {$zStop = $true}
}
Write-Host -ForegroundColor Green "My PSVersion = 5.1.18362.145"
I hope this is of use.
Happy New Year to all.

There is a way to break from ForEach-Object without throwing an exception. It employs a lesser-known feature of Select-Object, using the -First parameter, which actually breaks the pipeline when the specified number of pipeline items have been processed.
Simplified example:
$null = 1..5 | ForEach-Object {
# Do something...
Write-Host $_
# Evaluate "break" condition -> output $true
if( $_ -eq 2 ) { $true }
} | Select-Object -First 1 # Actually breaks the pipeline
Output:
1
2
Note that the assignment to $null is there to hide the output of $true, which is produced by the break condition. The value $true could be replaced by 42, "skip", "foobar", you name it. We just need to pipe something to Select-Object so it breaks the pipeline.

I found this question while looking for a way to have fine grained flow control to break from a specific block of code. The solution I settled on wasn't mentioned...
Using labels with the break keyword
From: about_break
A Break statement can include a label that lets you exit embedded
loops. A label can specify any loop keyword, such as Foreach, For, or
While, in a script.
Here's a simple example
:myLabel for($i = 1; $i -le 2; $i++) {
Write-Host "Iteration: $i"
break myLabel
}
Write-Host "After for loop"
# Results:
# Iteration: 1
# After for loop
And then a more complicated example that shows the results with nested labels and breaking each one.
:outerLabel for($outer = 1; $outer -le 2; $outer++) {
:innerLabel for($inner = 1; $inner -le 2; $inner++) {
Write-Host "Outer: $outer / Inner: $inner"
#break innerLabel
#break outerLabel
}
Write-Host "After Inner Loop"
}
Write-Host "After Outer Loop"
# Both breaks commented out
# Outer: 1 / Inner: 1
# Outer: 1 / Inner: 2
# After Inner Loop
# Outer: 2 / Inner: 1
# Outer: 2 / Inner: 2
# After Inner Loop
# After Outer Loop
# break innerLabel Results
# Outer: 1 / Inner: 1
# After Inner Loop
# Outer: 2 / Inner: 1
# After Inner Loop
# After Outer Loop
# break outerLabel Results
# Outer: 1 / Inner: 1
# After Outer Loop
You can also adapt it to work in other situations by wrapping blocks of code in loops that will only execute once.
:myLabel do {
1..2 | % {
Write-Host "Iteration: $_"
break myLabel
}
} while ($false)
Write-Host "After do while loop"
# Results:
# Iteration: 1
# After do while loop

You have two options to abruptly exit out of ForEach-Object pipeline in PowerShell:
Apply exit logic in Where-Object first, then pass objects to Foreach-Object, or
(where possible) convert Foreach-Object into a standard Foreach looping construct.
Let's see some examples. The following scripts exit the Foreach-Object loop after the 2nd iteration (i.e. the pipeline iterates only 2 times):
Solution-1: use Where-Object filter BEFORE Foreach-Object:
[boolean]$exit = $false;
1..10 | Where-Object {$exit -eq $false} | Foreach-Object {
if($_ -eq 2) {$exit = $true} #OR $exit = ($_ -eq 2);
$_;
}
OR
1..10 | Where-Object {$_ -le 2} | Foreach-Object {
$_;
}
Solution-2: Converted Foreach-Object into standard Foreach looping construct:
Foreach ($i in 1..10) {
if ($i -eq 3) {break;}
$i;
}
PowerShell should really provide a bit more straightforward way to exit or break out from within the body of a Foreach-Object pipeline. Note: return doesn't exit, it only skips specific iteration (similar to continue in most programming languages), here is an example of return:
Write-Host "Following will only skip one iteration (actually iterates all 10 times)";
1..10 | Foreach-Object {
if ($_ -eq 3) {return;} #skips only 3rd iteration.
$_;
}
HTH

Answer for Question #1 -
You could simply have your if statement stop being TRUE
$project.PropertyGroup | Foreach {
if(($_.GetAttribute('Condition').Trim() -eq $propertyGroupConditionName.Trim()) -and !$FinishLoop) {
$a = $project.RemoveChild($_);
Write-Host $_.GetAttribute('Condition')"has been removed.";
$FinishLoop = $true
}
};

Merging counting unique paths in powershell

I am trying to figure out a way to merge all the parent directory paths into one path, so imagine I have this data in a txt file:
\BANANA\APPLE\BERRIES\GRAPES\
\BANANA\APPLE\BERRIES\
\BANANA\APPLE\BERRIES\GRAPES\PEACH\
\BANANA\APPLE\
\BANANA\
\BANANA\APPLE\BERRIES\GRAPES\PEACH\AVOCADO\
I want the output of my loop to be just:
\BANANA\APPLE\BERRIES\GRAPES\PEACH\AVOCADO\
Because it is the longest path containing all the other previous paths.
But I am trying to do a loop for all the unique paths in a file containing all the previous parent folders as follows:
rm UNIQUE_PATHS.txt
#"LINE:"+$line
$count=0
foreach ($line in gc COUNT_DIR.txt){
foreach ($line2 in gc COUNT_DIR.txt){
# $line -contains $checking
if ($line2.contains($line2)) {
"COMPARING:"+$line2+" AND "+$line
$count = $count+1
}
if ($count -eq 1){
$line+$count >> UNIQUE_PATHS.txt
}
}
}
cat UNIQUE_PATHS.txt
So it looks like my count of the unique paths is not working. What would be a better script for this?

like this?
$Content=get-content "C:\temp\COUNT_DIR.txt"
$Content | %{
$Current=$_
$Founded= $Content | where {$_ -ne $Current -and $_.contains($Current)} | select -First 1
if($Founded -eq $null)
{
$Current
}
}

Why Isn't This Counting Correctly | PowerShell

Right now, I have a CSV file which contains 3,800+ records. This file contains a list of server names, followed by an abbreviation stating if the server is a Windows server, Linux server, etc. The file also contains comments or documentation, where each line starts with "#", stating it is a comment. What I have so far is as follows.
$file = Get-Content .\allsystems.csv
$arraysplit = @()
$arrayfinal = @()
[int]$windows = 0
foreach ($thing in $file){
if ($thing.StartsWith("#")) {
continue
}
else {
$arraysplit = $thing.Split(":")
$arrayfinal = @($arraysplit[0], $arraysplit[1])
}
}
foreach ($item in $arrayfinal){
if ($item[1] -contains 'NT'){
$windows++
}
else {
continue
}
}
$windows
The goal of this script is to count the total number of Windows servers. My issue is that the first "foreach" block works fine, but the second one results in "$Windows" being 0. I'm honestly not sure why this isn't working. Two example lines of data are as follows:
example:LNX
example2:NT

if the goal is to count the windows servers, why do you need the array?
can't you just say something like
foreach ($thing in $file)
{
if ($thing -notmatch "^#" -and $thing -match "NT") { $windows++ }
}

$arrayfinal = @($arraysplit[0], $arraysplit[1])
This replaces the array for every run.
Changing it to += gave another issue. It simply appended each individual element. I used this post's info to fix it, sort of forcing a 2d array: How to create array of arrays in powershell?.
$file = Get-Content .\allsystems.csv
$arraysplit = @()
$arrayfinal = @()
[int]$windows = 0
foreach ($thing in $file){
if ($thing.StartsWith("#")) {
continue
}
else {
$arraysplit = $thing.Split(":")
$arrayfinal += ,$arraysplit
}
}
foreach ($item in $arrayfinal){
if ($item[1] -contains 'NT'){
$windows++
}
else {
continue
}
}
$windows
1
I also changed the file around and added more instances of both NT and other random garbage. Seems it works fine.
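The difference between += with and without the leading comma can be seen in isolation with this small sketch. The sample pair is taken from the question's example data, and the variable names are hypothetical.

```powershell
$pair = 'example2', 'NT'

$flat = @()
$flat += $pair        # appends the two strings individually

$nested = @()
$nested += ,$pair     # the unary comma wraps the pair, so it is appended as one element

"flat count:   $($flat.Count)"
"nested count: $($nested.Count)"
"nested[0][1]: $($nested[0][1])"
```

Here $flat ends up with two elements, while $nested holds a single element that is itself the two-item array, so $nested[0][1] is NT.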

I'd avoid making another ForEach loop just to bump the count of occurrences. Your $arrayfinal is also overwritten every time, so I used an ArrayList.
$file = Get-Content "E:\Code\PS\myPS\2018\Jun\12\allSystems.csv"
$arrayFinal = New-Object System.Collections.ArrayList($null)
foreach ($thing in $file){
if ($thing.StartsWith("#")) {
continue
}
else {
$arraysplit = $thing -split ":"
if($arraysplit[1] -match "NT" -or $arraysplit[1] -match "Windows")
{
$arrayfinal.Add($arraysplit[1]) | Out-Null
}
}
}
Write-Host "Entries with 'NT' or 'Windows' $($arrayFinal.Count)"
I'm not sure if you want to keep 'Example', 'example2'... so I have skipped adding them to arrayfinal, assuming the goal is to count "NT" or "Windows" occurrences.

The goal of this script is to count the total number of Windows servers.
I'd suggest the easy way: using cmdlets built for this.
$csv = Get-Content -Path .\file.csv |
Where-Object { -not $_.StartsWith('#') } |
ConvertFrom-Csv
@($csv.servertype).Where({ $_.Equals('NT') }).Count
# Compatibility mode:
# ($csv.servertype | Where-Object { $_.Equals('NT') }).Count
Replace servertype and 'NT' with whatever that header/value is called.
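For the colon-separated lines shown in the question, the same idea might look like the sketch below. It assumes the file has no header row, so the column names Name and ServerType are invented here, and the sample lines stand in for the real file.

```powershell
# Hypothetical sample lines in the format described in the question.
$lines = @(
    '# comment line'
    'example:LNX'
    'example2:NT'
    'example3:NT'
)

# Assumption: no header row exists, so column names are supplied via -Header.
$csv = $lines |
    Where-Object { -not $_.StartsWith('#') } |
    ConvertFrom-Csv -Delimiter ':' -Header 'Name', 'ServerType'

# Wrap in @() so that .Count also works when zero or one row matches.
$windows = @($csv | Where-Object { $_.ServerType -eq 'NT' }).Count
$windows
```

With the sample lines this prints 2.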

Transpose rows to columns in PowerShell

I have a source file with the below contents:
0
ABC
1
181.12
2
05/07/16
4
Im4thData
5
hello
-1
0
XYZ
1
1333.21
2
02/02/16
3
Im3rdData
5
world
-1
...
The '-1' in above lists is the record separator which indicates the start of the next record. 0,1,2,3,4,5 etc are like column identifiers (or column names).
This is my code below.
$txt = Get-Content 'C:myfile.txt' | Out-String
$txt -split '(?m)^-1\r?\n' | ForEach-Object {
$arr = $_ -split '\r?\n'
$indexes = 1..$($arr.Count - 1) | Where-Object { ($_ % 2) -ne 0 }
$arr[$indexes] -join '|'
}
The above code creates output like below:
ABC|181.12|05/07/16|Im4thData|hello
XYZ|1333.21|02/02/16|Im3rdData|World
...
But I need output like below. When there are no columns in the source file, then their row data should have blank pipe line (||) like below in the output file. Please advise the change needed in the code.
ABC|181.12|05/07/16||Im4thData|hello ← There is no 3rd column in the source file. so blank pipe line (||).
XYZ|1333.21|02/02/16|Im3rdData||World ← There is no 4th column in the source file. so blank pipe line (||).
...

If you know the maximum number of columns beforehand you could do something like this:
$cols = 6
$txt = Get-Content 'C:myfile.txt' | Out-String
$txt -split '(?m)^-1\r?\n' | ForEach-Object {
# initialize array of required size
$row = ,$null * $cols
$arr = $_ -split '\r?\n'
for ($n = 0; $n -lt $arr.Count; $n += 2) {
$i = [int]$arr[$n]
$row[$i] = $arr[$n+1]
}
$row -join '|'
}
Otherwise you could do something like this:
$txt = Get-Content 'C:myfile.txt' | Out-String
$txt -split '(?m)^-1\r?\n' | ForEach-Object {
# create empty array
$row = @()
$arr = $_ -split '\r?\n'
$k = 0
for ($n = 0; $n -lt $arr.Count; $n += 2) {
$i = [int]$arr[$n]
# if index from record ($i) is greater than current index ($k) append
# required number of empty fields
for ($j = $k; $j -lt $i-1; $j++) { $row += $null }
$row += $arr[$n+1]
$k = $i
}
$row -join '|'
}

Needs quite a bit of processing. There might be a more efficient way to do this, but the below does work.
$c = Get-Content ".\file.txt"
$rdata = @{}
$data = @()
$i = 0
# Parse the file into an array of key-value pairs
while ($i -lt $c.count) {
if($c[$i].trim() -eq '-1') {
$data += ,$rdata
$rdata = @{}
$i++
continue
}
$field = $c[$i].trim()
$value = $c[++$i].trim()
$rdata[$field] = $value
$i++
}
# Check if there are any missing values between 0 and the highest value and set to empty string if so
foreach ($row in $data) {
$top = [int]$($row.GetEnumerator() | Sort-Object Name -descending | select -First 1 -ExpandProperty Name)
for($i = 0; $i -lt $top; $i++) {
if ($row["$i"] -eq $null) {
$row["$i"] = ""
}
}
}
# Sort each hash by field order and join with pipe
$data | ForEach-Object { ($_.GetEnumerator() | Sort-Object -property Name | Select-Object -ExpandProperty Value) -join '|' }
In the while loop, we are just iterating over each line of the file. The field number and its value are one line apart, so in each iteration we take both lines and add them to the hash.
If we encounter -1 then we know we have a record separator, so add the hash to an array, reset it, bump the counter to the next record and continue to the next iteration.
Once we've collected everything we need to check if there are any missing field values, so we grab the highest number from each hash, loop over it from 0 and fill any missing values with an empty string.
Once that is done you can then iterate the array, sort each hash by field number and join the values.
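The same steps can be sketched more compactly when the number of columns is known in advance. Everything here is an assumption for illustration: the function name ConvertTo-PipeRow is hypothetical, six columns numbered 0 to 5 are assumed, a value line is assumed never to be -1 itself, and the sample lines are copied from the question.

```powershell
function ConvertTo-PipeRow {
    param([string[]]$Lines, [int]$Columns = 6)

    $row = [string[]]::new($Columns)   # every slot starts as $null and joins as empty
    $n = 0
    while ($n -lt $Lines.Count) {
        if ($Lines[$n].Trim() -eq '-1') {
            # Record separator: emit the finished row and start a new one.
            $row -join '|'
            $row = [string[]]::new($Columns)
            $n++
            continue
        }
        # Lines come in pairs: column index first, then the value for that column.
        $row[[int]$Lines[$n]] = $Lines[$n + 1]
        $n += 2
    }
}

$sample = '0', 'ABC', '1', '181.12', '2', '05/07/16', '4', 'Im4thData', '5', 'hello', '-1',
          '0', 'XYZ', '1', '1333.21', '2', '02/02/16', '3', 'Im3rdData', '5', 'world', '-1'

$rows = ConvertTo-PipeRow -Lines $sample
$rows
```

For the sample this prints ABC|181.12|05/07/16||Im4thData|hello and XYZ|1333.21|02/02/16|Im3rdData||world. A record that is not terminated by -1 is not emitted in this sketch.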
