Delete duplicate lines with PowerShell

I have a text file:
1 2 4 5 6 7
1 3 5 6 7 8
1 2 3 4 5 6
1 2 4 5 6 7
Here the first and last lines are identical. I have a lot of files that contain duplicated lines like this, and I need to delete all the duplicates.

All these seem really complicated. It is as simple as:
gc $filename | sort | get-unique > $output
Using actual file names instead of variables:
gc test.txt| sort | get-unique > unique.txt

To get unique lines:
PS > Get-Content test.txt | Select-Object -Unique
1 2 4 5 6 7
1 3 5 6 7 8
1 2 3 4 5 6
To remove every line that has a duplicate (note this drops the original occurrence as well):
PS > Get-Content test.txt | group -noelement | `
where {$_.count -eq 1} | select -expand name
1 3 5 6 7 8
1 2 3 4 5 6

If order is not important:
Get-Content test.txt | Sort-Object -Unique | Set-Content test-1.txt
If order is important:
$set = @{}
Get-Content test.txt | %{
    if (!$set.Contains($_)) {
        $set.Add($_, $null)
        $_
    }
} | Set-Content test-2.txt

Try something like this:
$a = New-Object System.Collections.ArrayList # an ArrayList to collect the unique lines
gc .\mytextfile.txt | % { if (!$a.Contains($_)) { $a.Add($_) } } | out-null
$a # now contains no duplicate lines
To write the content of $a back to mytextfile.txt:
$a | out-file .\mytextfile.txt

The already posted options did not work for me for some reason, so here is another approach:
$file = "C:\temp\filename.txt"
(gc $file | Group-Object | %{ $_.Group | select -First 1 }) | Set-Content $file
The source file now contains only unique lines.

Related

Keep the last x versions of folders in every selected directory

I would like to keep only the last x directories in each selected directory.
I.e. directory structure:
d:\test\
    a
        1
        2
        3
        4
    b
        1
        2
        3
        4
    c
        1
        2
        3
        4
    d
        1
        2
        3
        4
CASE 1: If I give d:\test or d:\test\* with the last 2 versions kept, the result should be:
c
    1
    2
    3
    4
d
    1
    2
    3
    4
CASE 2: If I give d:\test\*\* with the last 2 versions kept, the result should be:
a
    3
    4
b
    3
    4
c
    3
    4
d
    3
    4
CASE 3: If I give d:\test\*\*\* with the last 2 versions kept then, similarly to the previous case, the parents need to stay and only their subfolders need to be removed.
Until now I've found this:
Get-ChildItem -Path D:\test -Directory | Sort-Object -Property CreationTime | Select-Object -SkipLast 2 | Remove-Item
This does work for case 1, but not for cases 2 and 3.
OK, it seems I've found a version that works with all cases, but I don't know if there is a more efficient way to do this. Here is my version:
$group_dirs = Get-ChildItem -Path $path -Directory -Force | Group-Object -Property Parent
foreach ($group_dir in $group_dirs) {
    $group_dir.Group | Sort-Object -Property CreationTime | Select-Object -SkipLast $leftCount | Remove-Item -Force
}
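A possible refinement (a sketch, untested against all three cases): Group-Object groups on the string form of the Parent property, and a DirectoryInfo's Parent renders as the bare folder name, so two same-named parents in different branches (e.g. a 3 under a and a 3 under b in case 3) could be merged into one group. Grouping on the parent's full path avoids that collision; $path and $leftCount are the same assumed inputs as above.
Get-ChildItem -Path $path -Directory -Force |
    Group-Object -Property { $_.Parent.FullName } | # group siblings by their parent's full path
    ForEach-Object {
        $_.Group |
            Sort-Object -Property CreationTime |
            Select-Object -SkipLast $leftCount |
            Remove-Item -Recurse -Force # -Recurse avoids the confirmation prompt for non-empty folders
    }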

How to sort numbers stored in a file (ascending and descending)

Is it possible to sort numbers in a file in ascending or descending order? How is it done?
Get-ChildItem | Sort-Object .\sorting.txt\ -Descending
Get-Content | Sort-Object .\sorting.txt\ -Descending
I tried all of these and even Measure-Object, but none of them gave me what I wanted: the numbers in the file sorted in ascending/descending order.
Consider the following
sorting.txt
9
11
45
12
3
101
Then run (the { $_ -as [int] } script block makes Sort-Object compare the lines numerically rather than as strings, so 101 does not sort between 11 and 9):
Get-Content .\sorting.txt | Sort-Object { $_ -as [int] } -Descending >> ./sorted.txt
sorted.txt
101
45
12
11
9
3
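For ascending order, drop -Descending (a minimal variant; sorted-asc.txt is just an example name, and a single > overwrites rather than appends):
Get-Content .\sorting.txt | Sort-Object { $_ -as [int] } > .\sorted-asc.txt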

Collecting Unique Items from large data set over multiple text files

I am using PowerShell to collect lists of names from multiple text files. Many of the names in these files are similar or repeated. I am trying to ensure that PowerShell returns a single text file with all of the unique items. Looking at the data, it seems the script is gathering 271 of 296 unique items. I'm guessing that some of the data is being flagged as a duplicate when it shouldn't be; any suggestions?
#Take content of each file (all names) and add unique values to text file
#for each unique value, create a row & check to see which txt files contain
function List {
    $nofiles = Read-Host "How many files are we pulling from?"
    $data = @()
    for ($i = 0; $i -lt $nofiles; $i++)
    {
        $data += Read-Host "Give me the file name for file # $($i+1)"
    }
    return $data
}
function Aggregate ($array) {
    Get-Content $array | Sort-Object -Unique | Out-File newaggregate.txt
}
#SCRIPT BODY
$data = List
Aggregate $data
I was expecting this code to catch everything, but it's missing some items that look very similar. A list of missing names and a similar match that was found:
CORPINZUTL16 MISSING FROM OUTFILE
CORPINZTRACE MISSING FROM OUTFILE
CORPINZADMIN Found In File
I have about 20 examples like this one. Apparently Sort-Object -Unique is not checking every character in a line. Can anyone recommend a better way of checking each line, or a way of forcing the comparison to use the full names?
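One thing worth ruling out first (an assumption; the posted data cannot confirm it): invisible trailing whitespace makes visually identical lines compare as different. A trimming variant of the Aggregate function (sketch):
function Aggregate ($array) {
    # Trim each line so 'NAME ' and 'NAME' collapse into a single entry
    Get-Content $array | ForEach-Object { $_.Trim() } |
        Sort-Object -Unique | Out-File newaggregate.txt
}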
Just for demonstration, this line creates 3 txt files with overlapping ranges of numbers:
for($i=1;$i -lt 4;$i++){set-content -path "$i.txt" -value ($i..$($i+7))}
1.txt | 2.txt | 3.txt | newaggregate.txt
    1 |       |       |  1
    2 |     2 |       |  2
    3 |     3 |     3 |  3
    4 |     4 |     4 |  4
    5 |     5 |     5 |  5
    6 |     6 |     6 |  6
    7 |     7 |     7 |  7
    8 |     8 |     8 |  8
      |     9 |     9 |  9
      |       |    10 | 10
Here Get-Content reads all three files at once, using the wildcard range [1-3]:
Get-Content [1-3].txt | Sort-Object {[int]$_} -Unique | Out-File newaggregate.txt
$All = Get-Content .\newaggregate.txt
foreach ($file in (Get-ChildItem [1-3].txt)){
    Compare-Object $All (Get-Content $file.FullName) |
        Select-Object @{n='File';e={$File}},
                      @{n="Missing";e={$_.InputObject}} -ExcludeProperty SideIndicator
}
File Missing
---- -------
Q:\Test\2019\05\07\1.txt 9
Q:\Test\2019\05\07\1.txt 10
Q:\Test\2019\05\07\2.txt 1
Q:\Test\2019\05\07\2.txt 10
Q:\Test\2019\05\07\3.txt 1
Q:\Test\2019\05\07\3.txt 2
There are two ways to achieve this. One is Select-Object -Unique, which works even when the data is not sorted and is fine for small data sets or lists.
When dealing with large files we can use the Get-Unique cmdlet, which works on sorted input; if the input data is not sorted it will give wrong results.
Get-ChildItem *.txt | Get-Content | measure -Line # 225949 lines in total
Get-ChildItem *.txt | Get-Content | sort | Get-Unique | measure -Line # 119650 unique lines
Here is my command for multiple files:
Get-ChildItem *.txt | Get-Content | sort | Get-Unique >> Unique.txt
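If the sort itself becomes the bottleneck, a HashSet can dedupe in a single pass while preserving the original line order (a sketch; note that HashSet comparison is case-sensitive by default, unlike Sort-Object):
# Add() returns $true only the first time a line is seen, so Where-Object
# passes each distinct line through exactly once
$seen = [System.Collections.Generic.HashSet[string]]::new()
Get-ChildItem *.txt | Get-Content | Where-Object { $seen.Add($_) } | Set-Content Unique.txt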

How to transpose one column (multiple rows) to multiple columns?

I have a file with the below contents:
0
ABC
1
181.12
2
05/07/16
3
1002
4
1211511108
6
1902
7
1902
10
hello
-1
0
ABC
1
1333.21
2
02/02/16
3
1294
4
1202514258
6
1294
7
1294
10
HAI
-1
...
I want to transpose the above file contents as shown below. The -1 in the lists above is the record separator, which indicates the start of the next record.
ABC,181.12,05/07/16,1002,1211511108,1902,1902,hello
ABC,1333.21,02/02/16,1294,1202514258,1294,1294,HAI
...
Please let me know how to achieve this.
Read the file as a single string:
$txt = Get-Content 'C:\path\to\your.txt' | Out-String
Split the content at -1 lines:
$txt -split '(?m)^-1\r?\n'
Split each block at line breaks:
... | ForEach-Object {
    $arr = $_ -split '\r?\n'
}
Select the values at odd indexes (skip the number lines) and join them by commas:
$indexes = 1..$($arr.Count - 1) | Where-Object { ($_ % 2) -ne 0 }
$arr[$indexes] -join ','
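Putting the steps together (a sketch; the file paths are placeholders):
$txt = Get-Content 'C:\path\to\your.txt' | Out-String
$txt -split '(?m)^-1\r?\n' | ForEach-Object {
    $arr = $_ -split '\r?\n'
    # keep the values at odd indexes and join them into one CSV line
    $indexes = 1..($arr.Count - 1) | Where-Object { ($_ % 2) -ne 0 }
    $arr[$indexes] -join ','
} | Where-Object { $_ } | Set-Content 'C:\path\to\output.txt'
The final Where-Object drops the empty string produced by anything after the last -1.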

Count line lengths in a file using PowerShell

If I have a long file with lots of lines of varying lengths, how can I count the occurrences of each line length?
Example:
this
is
a
sample
file
with
several
lines
of
varying
length
Output:
Length Occurrences
1 1
2 2
4 3
5 1
6 2
7 2
Have you got any ideas?
For high-volume work, use Get-Content with -ReadCount:
$ht = @{} # hashtable mapping line length -> count
Get-Content <file> -ReadCount 1000 | # emits the lines in 1000-line batches
    foreach {
        foreach ($line in $_)
        {$ht[$line.Length]++}
    }
$ht.GetEnumerator() | sort Name # the keys are the lengths, so this sorts by length
How about
get-content <file> | Group-Object -Property Length | sort { [int]$_.Name }
(Group-Object names its groups with strings, so the [int] cast keeps a length of 10 from sorting before 2.) Depending on how long your file is, you may want to do something more efficient.
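To label the columns like the desired output, calculated properties work (a sketch):
Get-Content <file> | Group-Object -Property Length |
    Sort-Object { [int]$_.Name } |
    Select-Object @{n='Length';e={[int]$_.Name}}, @{n='Occurrences';e={$_.Count}}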