Replace white spaces in strings in a pipe in PowerShell - powershell

I have the following in PowerShell:
$objects = git -C "$localRepoPath" verify-pack -v (Get-ChildItem "$PackFiles\pack-*.idx")`
| Select-String -Pattern "blob"`
| Select -First 5
# Output each line
$objects | % {
$_.Line
}
When I run the above I get:
1ff0c423042b46cb1d617b81efb715defbe8054d blob 2518 701 5081449
dc9a583bf4927685d9ceb3f9381b6a232a609281 blob 2390 1203 5082150
876f3b37e51d0af2d090104e22352171eca12bff blob 11 40 5083353 1 dc9a583bf4927685d9ceb3f9381b6a232a609281
3faa5832a9e087a5d4cf2b493c52f97eda478242 blob 21 51 5083393 2 876f3b37e51d0af2d090104e22352171eca12bff
15a0d10cde33f1d3fe5c2acda9fc1eb52def6dc6 blob 62 92 5083444 3 3faa5832a9e087a5d4cf2b493c52f97eda478242
I would like to replace one or more of the white spaces in the above output with '___' in the same pipe. I have tried:
$objects = git -C "$localRepoPath" verify-pack -v (Get-ChildItem "$PackFiles\pack-*.idx")`
| Select-String -Pattern "blob"`
| % { $_.Line -replace '\s+', '___' }`
| Select -First 5
# Output each line
$objects | % {
$_.Line
}
But the output is empty. How do I accomplish that?
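A likely cause, sketched here with stand-in data rather than the real `git verify-pack -v` output: once `% { $_.Line -replace '\s+', '___' }` has run, the pipeline carries plain strings instead of Select-String match objects. A string has no `Line` property, so the later `$_.Line` evaluates to nothing. Emitting `$_` in the final loop should show the replaced text. The `$sample` array below is hypothetical and only stands in for the git output.

```powershell
# Hypothetical stand-in for the lines produced by `git verify-pack -v`
$sample = @(
    '1ff0c423042b46cb1d617b81efb715defbe8054d blob 2518 701 5081449',
    'dc9a583bf4927685d9ceb3f9381b6a232a609281 blob 2390 1203 5082150',
    'a line that does not match the pattern'
)

$objects = $sample |
    Select-String -Pattern 'blob' |
    ForEach-Object { $_.Line -replace '\s+', '___' } |   # outputs plain strings from here on
    Select-Object -First 5

# Output each line: the items are strings now, so use $_ rather than $_.Line
$objects | ForEach-Object {
    $_
}
```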

Related

Collecting Unique Items from large data set over multiple text files

I am using PowerShell to collect lists of names from multiple text files. Many of the names in these files are similar or repeated. I am trying to ensure that PowerShell returns a single text file with all of the unique items. Looking at the data, the script appears to gather only 271 of the 296 unique items. I'm guessing that some of the data is being flagged as duplicates when it shouldn't be. Any suggestions?
#Take content of each file (all names) and add unique values to text file
#for each unique value, create a row & check to see which txt files contain
function List {
    $nofiles = Read-Host "How many files are we pulling from?"
    $data = @()
    for ($i = 0; $i -lt $nofiles; $i++)
    {
        $data += Read-Host "Give me the file name for file # $($i+1)"
    }
    return $data
}
function Aggregate ($array) {
Get-Content $array | Sort-Object -unique | Out-File newaggregate.txt
}
#SCRIPT BODY
$data = List
aggregate ($data)
I was expecting this code to catch everything, but it's missing some items that look very similar. List of missing names and their similar match:
CORPINZUTL16 MISSING FROM OUTFILE
CORPINZTRACE MISSING FROM OUTFILE
CORPINZADMIN Found In File
I have about 20 examples like this one. Apparently Sort-Object -Unique is not checking every character in a line. Can anyone recommend a better way of checking each line, or of forcing the comparison to check full names?
Just for demonstration this line creates 3 txt files with numbers
for($i=1;$i -lt 4;$i++){set-content -path "$i.txt" -value ($i..$($i+7))}
1.txt | 2.txt | 3.txt | newaggregate.txt
1     |       |       | 1
2     | 2     |       | 2
3     | 3     | 3     | 3
4     | 4     | 4     | 4
5     | 5     | 5     | 5
6     | 6     | 6     | 6
7     | 7     | 7     | 7
8     | 8     | 8     | 8
      | 9     | 9     | 9
      |       | 10    | 10
Here using Get-Content with a range [1-3] of files
Get-Content [1-3].txt | Sort-Object {[int]$_} -Unique | Out-File newaggregate.txt
$All = Get-Content .\newaggregate.txt
foreach ($file in (Get-ChildItem [1-3].txt)){
    Compare-Object $All (Get-Content $file.FullName) |
        Select-Object @{n='File';e={$File}},
                      @{n="Missing";e={$_.InputObject}} -ExcludeProperty SideIndicator
}
File Missing
---- -------
Q:\Test\2019\05\07\1.txt 9
Q:\Test\2019\05\07\1.txt 10
Q:\Test\2019\05\07\2.txt 1
Q:\Test\2019\05\07\2.txt 10
Q:\Test\2019\05\07\3.txt 1
Q:\Test\2019\05\07\3.txt 2
There are two ways to achieve this. One is Select-Object -Unique, which works on unsorted data and is suitable for small data sets or lists.
For large files, use Get-Unique instead. It only compares adjacent items, so it expects sorted input; if the input is not sorted it gives wrong results.
Get-ChildItem *.txt | Get-Content | measure -Line #225949
Get-ChildItem *.txt | Get-Content | sort | Get-Unique | measure -Line #119650
Here is my command for multiple files:
Get-ChildItem *.txt | Get-Content | sort | Get-Unique >> Unique.txt
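To illustrate the point about sorted input, here is a minimal sketch with a hypothetical four-item list (the `$names` array is made up for the demonstration). Get-Unique only drops a line when it equals the line directly before it, so duplicates that are not adjacent survive unless the input is sorted first.

```powershell
# Hypothetical sample data with non-adjacent duplicates
$names = 'b', 'a', 'b', 'a'

# Unsorted input: no two neighbours are equal, so nothing is removed
$unsorted = $names | Get-Unique

# Sorted input: duplicates become adjacent and are collapsed
$sorted = $names | Sort-Object | Get-Unique

# Select-Object -Unique does not need sorted input and keeps first-seen order
$selected = $names | Select-Object -Unique

"Get-Unique (unsorted):    $($unsorted -join ' ')"
"Sort | Get-Unique:        $($sorted -join ' ')"
"Select-Object -Unique:    $($selected -join ' ')"
```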

Replace command using post.context to add new line

I'm trying to add a line break here but it's not working. I also tried using `r, but no luck.
My code:
$extract = @()
Select-String -Path $outfile -Pattern "text" -Context 0,12 |
    ForEach-Object {
        $extract += $_.Line | foreach {$_.Replace("text", "DD/MM")}
        $extract += $_.Context.PostContext | foreach {$_.Replace(',', "`n")}
    }
$extract | Out-File $outfile1
My Input:
DD/MM
27,28
14
21
1
15
7
12
2,15
25
What I'm trying to do is to add a line break for 27,28 and 2,15 so it reads each number in a separate line:
27
28
2
15
Any idea why the replace is not working here?
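One possible reading, stated as an assumption rather than a diagnosis: `Replace(',', "`n")` does put a line feed into the text, but each entry is still a single string, and a bare line feed is not shown as a line break by every viewer. A sketch that sidesteps the question is to split each entry on the comma, so that every number becomes its own array element and Out-File writes it on its own line. The `$postContext` array below is a hypothetical stand-in for `$_.Context.PostContext`.

```powershell
# Hypothetical stand-in for $_.Context.PostContext
$postContext = '27,28', '14', '2,15', '25'

$extract = @()
# -split returns one string per number, and += appends each of them as a separate element
$extract += $postContext | ForEach-Object { $_ -split ',' }

# Each element is emitted on its own line
$extract
```

In the original loop, the same idea would mean replacing `{$_.Replace(',', "`n")}` with `{$_ -split ','}`.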

ConvertFrom-CSV from stdin

In Powershell in a script I'd like to treat the whole stdin stream as a CSV file and process something for each line. How do I do that?
As an example:
PS > type a.csv
"a","b","c"
1,2,3
4,5,6
PS > type a.csv | ./takeTheFirst.ps1
1
4
I tried the following:
ConvertFrom-CSV $input | ForEach-Object {
$_.a
}
but there's no output. I am not sure about the "$input" variable.
You're almost there.
Pipe $input to ConvertFrom-Csv instead:
$input | ConvertFrom-CSV | ForEach-Object {
$_.a
}
PS> type .\a.csv | .\takeTheFirst.ps1
1
4
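For trying this out without a script file or stdin, the same pipeline shape can be sketched with in-memory lines. The three strings below simply mirror a.csv from the question; inside takeTheFirst.ps1 they would come from `$input` instead.

```powershell
# Same content as a.csv, one string per line (stand-in for $input)
$lines = '"a","b","c"', '1,2,3', '4,5,6'

# ConvertFrom-Csv reads the lines from the pipeline and uses the first one as the header
$firstColumn = $lines | ConvertFrom-Csv | ForEach-Object {
    $_.a
}

$firstColumn
```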

How do I get filename and line count per file using powershell

I have the following powershell script to count lines per file in a given directory:
dir -Include *.csv -Recurse | foreach{get-content $_ | measure-object -line}
This is giving me the following output:
Lines Words Characters Property
----- ----- ---------- --------
27
90
11
95
449
...
The counts-per-file is fine (I don't require words, characters, or property), but I don't know what filename the count is for.
The ideal output would be something like:
Filename Lines
-------- -----
Filename1.txt 27
Filename1.txt 90
Filename1.txt 11
Filename1.txt 95
Filename1.txt 449
...
How do I add the filename to the output?
Try this:
dir -Include *.csv -Recurse |
    % { $_ | select name, @{n="lines";e={
            get-content $_ |
                measure-object -line |
                select -expa lines }
        }
    } | ft -AutoSize
I can offer another solution:
Get-ChildItem $testPath | % {
    $_ | Select-Object -Property 'Name', @{
        label = 'Lines'; expression = {
            # @(...).Count counts lines even for a one-line file, where .Length would return the character count
            @($_ | Get-Content).Count
        }
    }
}
Run against .txt files, the output looks like this:
Name Lines
---- ----
1.txt 1
2.txt 2
3.txt 3
4.txt 4
5.txt 5
6.txt 6
7.txt 7
8.txt 8
9.txt 9
The reason I want output like this is that I am rewriting a UNIX shell command (from The Pragmatic Programmer: Your Journey to Mastery, page 145) in PowerShell.
The purpose of the command is to find the five files with the largest number of lines.
The code above is as far as I have got, and I think I'm close.
However, it is already far more complicated than the UNIX shell command!
I believe there should be a simpler way, and I'm trying to find it. The original command:
find . -type f | xargs wc -l | sort -n | tail -5
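A rough PowerShell counterpart to that pipeline might look like the sketch below. The function name `Get-LargestFile` and its parameters are made up for illustration, and it assumes PowerShell 3.0 or later for the `-File` switch. It counts lines per file, sorts ascending and keeps the last five, mirroring `sort -n | tail -5`.

```powershell
# Hypothetical helper: lists the files with the most lines under a folder
function Get-LargestFile {
    param(
        [string]$Path = '.',   # folder to search, like "find ."
        [int]$Top = 5          # how many files to keep, like "tail -5"
    )

    Get-ChildItem -LiteralPath $Path -File -Recurse |
        Select-Object FullName, @{
            Name       = 'Lines'
            # @(...) keeps the count correct for empty and one-line files
            Expression = { @(Get-Content -LiteralPath $_.FullName).Count }
        } |
        Sort-Object Lines |
        Select-Object -Last $Top
}
```

Calling `Get-LargestFile -Path . -Top 5` returns objects with `FullName` and `Lines` properties, so the result can still be formatted or exported afterwards.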
I have used the following script to count the lines in the files of all subdirectories of c:\temp\A. The output goes to the lines1.txt file. I have applied a filter so that only ".TXT" files are included.
Get-ChildItem c:\temp\A -recurse | where {$_.extension -eq ".txt"} | % {
    $_ | Select-Object -Property 'Name', @{
        label = 'Lines'; expression = {
            # @(...).Count counts lines even for a one-line file, where .Length would return the character count
            @($_ | Get-Content).Count
        }
    }
} | out-file C:\temp\lines1.txt

PowerShell: How to remove columns from delimited text input?

I have a text file with 5 columns of text delimited by whitespace. For example:
10 45 5 23 78
89 3 56 12 56
999 4 67 93 5
Using PowerShell, how do I remove the rightmost two columns? The resulting file should be:
10 45 5
89 3 56
999 4 67
I can extract the individual items using the -split operator. But, the items appear on different lines and I do not see how I can get them back as 3 items per line.
And to make the question more generic (and helpful to others): How to use PowerShell to remove the data at multiple columns in the range [0,n-1] given an input that has lines with delimited data of n columns each?
Read the file content, convert it to a csv and select just the first 3 columns:
Import-Csv .\file.txt -Header col1,col2,col3,col4,col5 -Delimiter ' ' | Select-Object col1,col2,col3
If you want just the values (without a header):
Import-Csv .\file.txt -Header col1,col2,col3,col4,col5 -Delimiter ' ' | Select-Object col1,col2,col3 | Format-Table -HideTableHeaders -AutoSize
To save back the results to the file:
(Import-Csv .\file.txt -Header col1,col2,col3,col4,col5 -Delimiter ' ') | Foreach-Object { "{0} {1} {2}" -f $_.col1,$_.col2,$_.col3} | Out-File .\file.txt
UPDATE:
Just another option:
(Get-Content .\file.txt) | Foreach-Object { $_.split()[0..2] -join ' ' } | Out-File .\file.txt
One way is:
gc input.txt | %{[string]::join(" ",$_.split()[0..2]) } | out-file output.txt
(to keep the first n columns instead, replace 2 with n-1)
Here is the generic solution:
param
(
# Input data file
[string]$Path = 'data.txt',
# Columns to be removed, any order, dupes are allowed
[int[]]$Remove = (4, 3, 4, 3)
)
# sort indexes descending and remove dupes
$Remove = $Remove | Sort-Object -Unique -Descending
# read input lines
Get-Content $Path | .{process{
# split on runs of whitespace and add to ArrayList which allows to remove items
$list = [Collections.ArrayList]($_ -split '\s+')
# remove data at the indexes (from tail to head due to descending order)
foreach($i in $Remove) {
$list.RemoveAt($i)
}
# join and output
$list -join ' '
}}
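An alternative sketch of the same idea keeps the fields whose index is not in the removal list, instead of deleting from an ArrayList. The function name `Remove-Column` and its parameters are hypothetical, and it assumes whitespace-delimited input as in the question.

```powershell
# Hypothetical helper: drops the columns at the given zero-based indexes
function Remove-Column {
    param(
        [string[]]$Line,    # input lines, whitespace-delimited
        [int[]]$Remove      # zero-based column indexes to drop, any order
    )

    foreach ($l in $Line) {
        # Trim first so leading whitespace does not create an empty first field
        $fields = $l.Trim() -split '\s+'

        # Keep every field whose index is not listed in $Remove
        $kept = for ($i = 0; $i -lt $fields.Count; $i++) {
            if ($Remove -notcontains $i) { $fields[$i] }
        }

        $kept -join ' '
    }
}

# Usage with the sample data from the question: drop the rightmost two of five columns
Remove-Column -Line '10 45 5 23 78', '89 3 56 12 56', '999 4 67 93 5' -Remove 3, 4
```

Because nothing is removed in place, the order of the indexes in `-Remove` does not matter and duplicates are harmless.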