Delete duplicate lines with PowerShell

I have a text file:
1 2 4 5 6 7
1 3 5 6 7 8
1 2 3 4 5 6
1 2 4 5 6 7
Here the first and last lines are identical. I have a lot of files that contain duplicated lines like this, and I need to delete all the duplicates.

All these seem really complicated. It is as simple as:
gc $filename | sort | get-unique > $output
Using actual file names instead of variables:
gc test.txt| sort | get-unique > unique.txt

To get unique lines:
PS > Get-Content test.txt | Select-Object -Unique
1 2 4 5 6 7
1 3 5 6 7 8
1 2 3 4 5 6
To remove every line that has a duplicate (note this drops the original occurrence as well):
PS > Get-Content test.txt | group -noelement | `
where {$_.count -eq 1} | select -expand name
1 3 5 6 7 8
1 2 3 4 5 6

If order is not important:
Get-Content test.txt | Sort-Object -Unique | Set-Content test-1.txt
If order is important:
$set = @{}
Get-Content test.txt | %{
    if (!$set.Contains($_)) {
        $set.Add($_, $null)
        $_
    }
} | Set-Content test-2.txt

Try something like this:
$a = New-Object System.Collections.ArrayList # an ArrayList to collect the unique lines
gc .\mytextfile.txt | % { if (!$a.Contains($_)) { $a.Add($_) } } | out-null
$a # now contains no duplicate lines
To write the content of $a back to mytextfile.txt:
$a | out-file .\mytextfile.txt

The already posted options did not work for me for some reason, so here is another approach:
$file = "C:\temp\filename.txt"
(gc $file | Group-Object | %{ $_.Group | select -First 1 }) | Set-Content $file
The source file now contains only unique lines.

Related

Keep the last x versions of folders in every selected directory

I would like to keep only the last x directories in each selected directory.
I.e. directory structure:
d:\test\
    a
        1
        2
        3
        4
    b
        1
        2
        3
        4
    c
        1
        2
        3
        4
    d
        1
        2
        3
        4
CASE 1: If I give d:\test or d:\test\* with the last 2 versions kept, the result should be:
c
    1
    2
    3
    4
d
    1
    2
    3
    4
CASE 2: If I give d:\test\*\* with the last 2 versions kept, the result should be:
a
    3
    4
b
    3
    4
c
    3
    4
d
    3
    4
CASE 3: If I give d:\test\*\*\* with the last 2 versions kept then, similarly to the previous case, the parents need to stay and only their subfolders need to be removed.
Until now I've found this:
Get-ChildItem -Path D:\test -Directory | Sort-Object -Property CreationTime | Select-Object -SkipLast 2 | Remove-Item
This does work for case 1, but not for cases 2 and 3.
OK, it seems I've found a version that works with all cases, but I don't know if there is a more efficient way to do this. Here is my version:
$group_dirs = Get-ChildItem -Path $path -Directory -Force | Group-Object -Property Parent
foreach ($group_dir in $group_dirs) {
    $group_dir.Group | Sort-Object -Property CreationTime | Select-Object -SkipLast $leftCount | Remove-Item -Force
}
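A possible refinement (a sketch, untested against all three cases): Group-Object groups on the string form of the Parent property, and a DirectoryInfo's Parent renders as the bare folder name, so two same-named parents in different branches (e.g. a 3 under a and a 3 under b in case 3) could be merged into one group. Grouping on the parent's full path avoids that collision; $path and $leftCount are the same assumed inputs as above.
Get-ChildItem -Path $path -Directory -Force |
    Group-Object -Property { $_.Parent.FullName } | # group siblings by their parent's full path
    ForEach-Object {
        $_.Group |
            Sort-Object -Property CreationTime |
            Select-Object -SkipLast $leftCount |
            Remove-Item -Recurse -Force # -Recurse avoids the confirmation prompt for non-empty folders
    }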

How to sort numbers stored in a file (ascending and descending)

Is it possible to sort numbers in a file in ascending or descending order? How is it done?
Get-ChildItem | Sort-Object .\sorting.txt\ -Descending
Get-Content | Sort-Object .\sorting.txt\ -Descending
I tried all of these and even Measure-Object, but none of them gave me what I wanted: the numbers in the file sorted in ascending/descending order.
Consider the following
sorting.txt
9
11
45
12
3
101
Then run (the { $_ -as [int] } script block makes Sort-Object compare the lines numerically rather than as strings, so 101 does not sort between 11 and 9):
Get-Content .\sorting.txt | Sort-Object { $_ -as [int] } -Descending >> ./sorted.txt
sorted.txt
101
45
12
11
9
3
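For ascending order, drop -Descending (a minimal variant; sorted-asc.txt is just an example name, and a single > overwrites rather than appends):
Get-Content .\sorting.txt | Sort-Object { $_ -as [int] } > .\sorted-asc.txt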

Collecting Unique Items from large data set over multiple text files

I am using PowerShell to collect lists of names from multiple text files. Many of the names in these files are similar or repeated. I am trying to ensure that PowerShell returns a single text file with all of the unique items. Looking at the data, it seems the script is gathering 271 of 296 unique items. I'm guessing that some of the data is being flagged as a duplicate when it shouldn't be; any suggestions?
#Take content of each file (all names) and add unique values to text file
#for each unique value, create a row & check to see which txt files contain
function List {
    $nofiles = Read-Host "How many files are we pulling from?"
    $data = @()
    for ($i = 0; $i -lt $nofiles; $i++)
    {
        $data += Read-Host "Give me the file name for file # $($i+1)"
    }
    return $data
}
function Aggregate ($array) {
    Get-Content $array | Sort-Object -Unique | Out-File newaggregate.txt
}
#SCRIPT BODY
$data = List
Aggregate $data
I was expecting this code to catch everything, but it's missing some items that look very similar. A list of missing names and a similar match that was found:
CORPINZUTL16 MISSING FROM OUTFILE
CORPINZTRACE MISSING FROM OUTFILE
CORPINZADMIN Found In File
I have about 20 examples like this one. Apparently Sort-Object -Unique is not checking every character in a line. Can anyone recommend a better way of checking each line, or a way of forcing the comparison to use the full names?
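One thing worth ruling out first (an assumption; the posted data cannot confirm it): invisible trailing whitespace makes visually identical lines compare as different. A trimming variant of the Aggregate function (sketch):
function Aggregate ($array) {
    # Trim each line so 'NAME ' and 'NAME' collapse into a single entry
    Get-Content $array | ForEach-Object { $_.Trim() } |
        Sort-Object -Unique | Out-File newaggregate.txt
}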
Just for demonstration, this line creates 3 txt files with overlapping ranges of numbers:
for($i=1;$i -lt 4;$i++){set-content -path "$i.txt" -value ($i..$($i+7))}
1.txt | 2.txt | 3.txt | newaggregate.txt
    1 |       |       |  1
    2 |     2 |       |  2
    3 |     3 |     3 |  3
    4 |     4 |     4 |  4
    5 |     5 |     5 |  5
    6 |     6 |     6 |  6
    7 |     7 |     7 |  7
    8 |     8 |     8 |  8
      |     9 |     9 |  9
      |       |    10 | 10
Here Get-Content reads all three files at once, using the wildcard range [1-3]:
Get-Content [1-3].txt | Sort-Object {[int]$_} -Unique | Out-File newaggregate.txt
$All = Get-Content .\newaggregate.txt
foreach ($file in (Get-ChildItem [1-3].txt)){
    Compare-Object $All (Get-Content $file.FullName) |
        Select-Object @{n='File';e={$File}},
                      @{n="Missing";e={$_.InputObject}} -ExcludeProperty SideIndicator
}
File Missing
---- -------
Q:\Test\2019\05\07\1.txt 9
Q:\Test\2019\05\07\1.txt 10
Q:\Test\2019\05\07\2.txt 1
Q:\Test\2019\05\07\2.txt 10
Q:\Test\2019\05\07\3.txt 1
Q:\Test\2019\05\07\3.txt 2
There are two ways to achieve this. One is Select-Object -Unique, which works even when the data is not sorted and is fine for small data sets or lists.
When dealing with large files we can use the Get-Unique cmdlet, which works on sorted input; if the input data is not sorted it will give wrong results.
Get-ChildItem *.txt | Get-Content | measure -Line # 225949 lines in total
Get-ChildItem *.txt | Get-Content | sort | Get-Unique | measure -Line # 119650 unique lines
Here is my command for multiple files:
Get-ChildItem *.txt | Get-Content | sort | Get-Unique >> Unique.txt
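If the sort itself becomes the bottleneck, a HashSet can dedupe in a single pass while preserving the original line order (a sketch; note that HashSet comparison is case-sensitive by default, unlike Sort-Object):
# Add() returns $true only the first time a line is seen, so Where-Object
# passes each distinct line through exactly once
$seen = [System.Collections.Generic.HashSet[string]]::new()
Get-ChildItem *.txt | Get-Content | Where-Object { $seen.Add($_) } | Set-Content Unique.txt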

How to transpose one column (multiple rows) to multiple columns?

I have a file with the below contents:
0
ABC
1
181.12
2
05/07/16
3
1002
4
1211511108
6
1902
7
1902
10
hello
-1
0
ABC
1
1333.21
2
02/02/16
3
1294
4
1202514258
6
1294
7
1294
10
HAI
-1
...
I want to transpose the above file contents as shown below. The -1 in the lists above is the record separator, which indicates the start of the next record.
ABC,181.12,05/07/16,1002,1211511108,1902,1902,hello
ABC,1333.21,02/02/16,1294,1202514258,1294,1294,HAI
...
Please let me know how to achieve this.
Read the file as a single string:
$txt = Get-Content 'C:\path\to\your.txt' | Out-String
Split the content at -1 lines:
$txt -split '(?m)^-1\r?\n'
Split each block at line breaks:
... | ForEach-Object {
    $arr = $_ -split '\r?\n'
}
Select the values at odd indexes (skip the number lines) and join them by commas:
$indexes = 1..$($arr.Count - 1) | Where-Object { ($_ % 2) -ne 0 }
$arr[$indexes] -join ','
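Putting the steps together (a sketch; the file paths are placeholders):
$txt = Get-Content 'C:\path\to\your.txt' | Out-String
$txt -split '(?m)^-1\r?\n' | ForEach-Object {
    $arr = $_ -split '\r?\n'
    # keep the values at odd indexes and join them into one CSV line
    $indexes = 1..($arr.Count - 1) | Where-Object { ($_ % 2) -ne 0 }
    $arr[$indexes] -join ','
} | Where-Object { $_ } | Set-Content 'C:\path\to\output.txt'
The final Where-Object drops the empty string produced by anything after the last -1.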

Count line lengths in a file using PowerShell

If I have a long file with lots of lines of varying lengths, how can I count the occurrences of each line length?
Example:
this
is
a
sample
file
with
several
lines
of
varying
length
Output:
Length Occurrences
1 1
2 2
4 3
5 1
6 2
7 2
Have you got any ideas?
For high-volume work, use Get-Content with -ReadCount:
$ht = @{} # hashtable mapping line length -> count
Get-Content <file> -ReadCount 1000 | # emits the lines in 1000-line batches
    foreach {
        foreach ($line in $_)
        {$ht[$line.Length]++}
    }
$ht.GetEnumerator() | sort Name # the keys are the lengths, so this sorts by length
How about
get-content <file> | Group-Object -Property Length | sort { [int]$_.Name }
(Group-Object names its groups with strings, so the [int] cast keeps a length of 10 from sorting before 2.) Depending on how long your file is, you may want to do something more efficient.
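To label the columns like the desired output, calculated properties work (a sketch):
Get-Content <file> | Group-Object -Property Length |
    Sort-Object { [int]$_.Name } |
    Select-Object @{n='Length';e={[int]$_.Name}}, @{n='Occurrences';e={$_.Count}}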