How to transpose one column (multiple rows) to multiple columns? - powershell

I have a file with the below contents:
0
ABC
1
181.12
2
05/07/16
3
1002
4
1211511108
6
1902
7
1902
10
hello
-1
0
ABC
1
1333.21
2
02/02/16
3
1294
4
1202514258
6
1294
7
1294
10
HAI
-1
...
I want to transpose the above file contents like below. The '-1' in above lists is the record separator which indicates the start of the next record.
ABC,181.12,05/07/16,1002,1211511108,1902,1902,hello
ABC,1333.21,02/02/16,1294,1202514258,1294,1294,HAI
...
Please let me know how to achieve this.

Read the file as a single string:
$txt = Get-Content 'C:\path\to\your.txt' | Out-String
Split the content at -1 lines:
$txt -split '(?m)^-1\r?\n'
Split each block at line breaks:
... | ForEach-Object {
$arr = $_ -split '\r?\n'
}
Select the values at odd indexes (skip the number lines) and join them by commas:
$indexes = 1..$($arr.Count - 1) | Where-Object { ($_ % 2) -ne 0 }
$arr[$indexes] -join ','
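Putting the steps above together into one pipeline (a sketch; adjust the path to your file, and note the `Where-Object` filter is added here to skip the empty trailing block that the split can produce):

```powershell
# Read the whole file, split into records at '-1' lines,
# then keep the values at odd indexes of each record.
$txt = Get-Content 'C:\path\to\your.txt' | Out-String
$txt -split '(?m)^-1\r?\n' |
    Where-Object { $_.Trim() } |
    ForEach-Object {
        $arr = $_ -split '\r?\n'
        $indexes = 1..($arr.Count - 1) | Where-Object { ($_ % 2) -ne 0 }
        $arr[$indexes] -join ','
        # first record → ABC,181.12,05/07/16,1002,1211511108,1902,1902,hello
    }
```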

Related

Adding two broken rows using powershell

I have a file which has a header in the first row and data in the remaining rows. I want to check whether every row has the same number of fields as the header.
For example: if the header has 10 fields, then I want all remaining rows to have 10 fields each, so there will be no error while loading the data.
Suppose lines 5 and 6 have only 5 fields each; in that case I want to combine the two rows.
My expected output is (row 5 has the merged data):
There may be such broken data in many rows of the file, so I want to scan the whole file and merge the two rows whenever such a case is seen.
So, I tried using:
$splitway=' '
$firstLine = Get-Content -Path $filepath -TotalCount 1
$firstrowheader=$firstLine.split($splitway,[System.StringSplitOptions]::RemoveEmptyEntries)
$requireddataineachrow=$firstrowheader.Count
echo $requireddataineachrow
The above code gives me 10, since my header has 10 fields.
For ($i = 1; $i -lt $totalrows; $i++) {
$singleline=Get-Content $filepath| Select -Index $i
$singlelinesplit=$singleline.split($splitway,[System.StringSplitOptions]::RemoveEmptyEntries)
if($singlelinesplit.Count -lt $requireddataineachrow){
$curr=Get-Content $filepath| Select -Index $i
$next=Get-Content $filepath| Select -Index ($i+1)
Write-Host (-join($curr, " ", $next))
}
echo $singlelinesplit.Count
}
I tested using Write-Host (-join($curr, " ", $next)) to join two lines but it's not giving the correct output.
echo $singlelinesplit.Count is showing correct result:
My whole data is:
billing summary_id offer_id vendor_id import_v system_ha rand_dat mand_no sad_no cad_no
11 23 44 77 88 99 100 11 12 500
1111 2333 4444 6666 7777777 8888888888 8888888888888 9999999999 1111111111111 2000000000
33333 444444 As per new account ddddddd gggggggggggg wwwwwwwwwww bbbbbbbbbbb qqqqqqqqqq rrrrrrrrr 5555555
22 33 44 55 666<CR>
42 65 66 55 244
11 23 44 76 88 99 100 11 12 500
1111 2333 new document 664466 7777777 8888888888 8888888888888 9999999999 111111144111 200055000
My whole code if needed is:
cls
$filepath='D:\test.txt'
$splitway=' '
$totalrows=@(Get-Content $filepath).Length
write-host $totalrows.gettype()
$firstLine = Get-Content -Path $filepath -TotalCount 1
$firstrowheader=$firstLine.split($splitway,[System.StringSplitOptions]::RemoveEmptyEntries)
$requireddataineachrow=$firstrowheader.Count
For ($i = 1; $i -lt $totalrows; $i++) {
$singleline=Get-Content $filepath| Select -Index $i
$singlelinesplit=$singleline.split($splitway,[System.StringSplitOptions]::RemoveEmptyEntries)
if($singlelinesplit.Count -lt $requireddataineachrow){
$curr=Get-Content $filepath| Select -Index $i
$next=Get-Content $filepath| Select -Index ($i+1)
Write-Host (-join($curr, " ", $next))
}
echo $singlelinesplit.Count
}
Update: It seems that instances of the string <CR> are a verbatim part of your input file, in which case the following solution should suffice:
(Get-Content -Raw sample.txt) -replace '<CR>\s*', ' ' | Set-Content sample.txt
Here's a solution that makes the following assumptions:
<CR> is just a placeholder to help visualize an actual newline in the input file.
Only data rows with fewer columns than the header row require fixing (as Mathias points out, your data is ambiguous, because a column value such as As per new account technically comprises three values, due to its embedded spaces).
Such a data row can blindly be joined with the subsequent line (only) to form a complete data row.
# Create a sample file.
@'
billing summary_id offer_id vendor_id import_v system_ha rand_dat mand_no sad_no cad_no
11 23 44 77 88 99 100 11 12 500
1111 2333 4444 6666 7777777 8888888888 8888888888888 9999999999 1111111111111 2000000000
33333 444444 As per new account ddddddd gggggggggggg wwwwwwwwwww bbbbbbbbbbb qqqqqqqqqq rrrrrrrrr 5555555
22 33 44 55 666
42 65 66 55 244
11 23 44 76 88 99 100 11 12 500
1111 2333 new document 664466 7777777 8888888888 8888888888888 9999999999 111111144111 200055000
'@ > sample.txt
# Read the file into the header row and an array of data rows.
$headerRow, $dataRows = Get-Content sample.txt
# Determine the number of whitespace-separated columns.
$columnCount = (-split $headerRow).Count
# Process all data rows and save the results back to the input file:
# Whenever a data row with fewer columns is encountered,
# join it with the next row.
$headerRow | Set-Content sample.txt
$joinWithNext = $false
$dataRows |
ForEach-Object {
if ($joinWithNext) {
$partialRow + ' ' + $_
$joinWithNext = $false
}
elseif ((-split $_).Count -lt $columnCount) {
$partialRow = $_
$joinWithNext = $true
}
else {
$_
}
} | Add-Content sample.txt

PowerShell index[0] to the first instance of the string, index[1] to the second instance and so on till finished

For example, replace LINE2 1243 with LINE2 1 because it is on line 1 of test.txt.
# Find the line number:
$lines = sls "LINE2" test.txt | Select-Object -ExpandProperty LineNumber
test.txt:
abc LINE2 1243
lmn LINE2 1250
xyz LINE2 1255
Using:
gc test.txt | % { $_ -replace "LINE2.*", "LINE2 $lines" }
I get:
abc LINE2 1 2 3
lmn LINE2 1 2 3
xyz LINE2 1 2 3
How do I supply index[0], and only index[0], to the first instance of the string, index[1] to the second instance and so on till finished.
Doing it another way:
foreach ($line in $lines){
gc test.txt | % { $_ -replace "LINE2.*", "LINE2 $line" }
}
I get:
abc LINE2 1
lmn LINE2 1
xyz LINE2 1
abc LINE2 2
lmn LINE2 2
xyz LINE2 2
abc LINE2 3
lmn LINE2 3
xyz LINE2 3
How do I get index[0] to only the first instance of the string and so on.
You could use a for loop with an index to achieve this (If I got you right) ;-)
$lines = Select-String "LINE2" -Path C:\sample\test.txt | Select-Object -ExpandProperty LineNumber
$Content = Get-Content -Path C:\sample\test.txt
for ($index = 0; $index -lt $lines.count; $index++) {
$Content[$index] -replace "LINE2.*", "LINE2 $($lines[$index])"
}
Output:
abc LINE2 1
lmn LINE2 2
xyz LINE2 3
This is a somewhat different way to do things. [grin] What it does ...
reads in the file
i faked this with a here-string, but use Get-Content when doing this for real.
gets the matching lines
it uses the way that -match works against a collection to pull the lines that match the target.
splits on the spaces
selects the first and last items from that array (via the index range 0..-1)
adds a $Counter to the collection
joins the three items with a space delimiter
sends the resulting line to the $Results collection
shows that collection on screen
saves it to a text file
here's the code ...
# fake reading in a text file
# in real life, use Get-Content
$InStuff = @'
cba line1 1234
abc LINE2 1243
mnl line4 1244
lmn LINE2 1250
zyx line9 1251
xyz LINE2 1255
qwe line9 1266
'@ -split [environment]::NewLine
$Target = 'Line2'
$Counter = 1
$Results = foreach ($IS_Item in ($InStuff -match $Target))
{
$IS_Item.Split(' ')[0..-1] + $Counter -join ' '
$Counter ++
}
# on screen
$Results
# to a file
$Results |
Set-Content -LiteralPath "$env:TEMP\somebadhat.txt"
on screen ...
abc 1243 1
lmn 1250 2
xyz 1255 3
in the text file ...
abc 1243 1
lmn 1250 2
xyz 1255 3

Count line lengths in file using powershell

If I have a long file with lots of lines of varying lengths, how can I count the occurrences of each line length?
Example:
this
is
a
sample
file
with
several
lines
of
varying
length
Output:
Length Occurrences
1 1
2 2
4 3
5 1
6 2
7 2
Have you got any ideas?
For high-volume work, use Get-Content with -ReadCount
$ht = @{}
Get-Content <file> -ReadCount 1000 |
foreach {
foreach ($line in $_)
{$ht[$line.length]++}
}
$ht.GetEnumerator() | sort Name
How about
get-content <file> | Group-Object -Property Length | sort -Property Name
Depending on how long your file is, you may want to do something more efficient
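One caveat with the Group-Object one-liner: the group's Name property is a string, so sorting by Name puts length 10 before length 2. A sketch that sorts numerically instead (file.txt is a placeholder path):

```powershell
# Group lines by length and sort the groups by numeric length,
# not by the string representation of the length.
Get-Content file.txt |
    Group-Object -Property Length |
    Sort-Object { [int]$_.Name } |
    Select-Object @{ n = 'Length'; e = { [int]$_.Name } }, Count
```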

Delete duplicate string with PowerShell

I have got text file:
1 2 4 5 6 7
1 3 5 6 7 8
1 2 3 4 5 6
1 2 4 5 6 7
Here the first and last lines are similar. I have a lot of files that contain duplicated lines. I need to delete all duplicates.
All these seem really complicated. It is as simple as:
gc $filename | sort | get-unique > $output
Using actual file names instead of variables:
gc test.txt| sort | get-unique > unique.txt
To get unique lines:
PS > Get-Content test.txt | Select-Object -Unique
1 2 4 5 6 7
1 3 5 6 7 8
1 2 3 4 5 6
To remove the duplicate
PS > Get-Content test.txt | group -noelement | `
where {$_.count -eq 1} | select -expand name
1 3 5 6 7 8
1 2 3 4 5 6
If order is not important:
Get-Content test.txt | Sort-Object -Unique | Set-Content test-1.txt
If order is important:
$set = @{}
Get-Content test.txt | %{
if (!$set.Contains($_)) {
$set.Add($_, $null)
$_
}
} | Set-Content test-2.txt
Try something like this:
$a = [System.Collections.ArrayList]@() # declare an ArrayList
gc .\mytextfile.txt | % { if (!$a.Contains($_)) { $a.Add($_) } } | out-null
$a # now contains no duplicate lines
To set the content of $a to mytextfile.txt:
$a | out-file .\mytextfile.txt
$file = "C:\temp\filename.txt"
(gc $file | Group-Object | %{$_.group | select -First 1}) | Set-Content $file
The source file now contains only unique lines
The already posted options did not work for me for some reason

PowerShell: How to remove columns from delimited text input?

I have a text file with 5 columns of text delimited by whitespace. For example:
10 45 5 23 78
89 3 56 12 56
999 4 67 93 5
Using PowerShell, how do I remove the rightmost two columns? The resulting file should be:
10 45 5
89 3 56
999 4 67
I can extract the individual items using the -split operator. But, the items appear on different lines and I do not see how I can get them back as 3 items per line.
And to make the question more generic (and helpful to others): How to use PowerShell to remove the data at multiple columns in the range [0,n-1] given an input that has lines with delimited data of n columns each?
Read the file content, convert it to a csv and select just the first 3 columns:
Import-Csv .\file.txt -Header col1,col2,col3,col4,col5 -Delimiter ' ' | Select-Object col1,col2,col3
If you want just the values (without a header):
Import-Csv .\file.txt -Header col1,col2,col3,col4,col5 -Delimiter ' ' | Select-Object col1,col2,col3 | Format-Table -HideTableHeaders -AutoSize
To save back the results to the file:
(Import-Csv .\file.txt -Header col1,col2,col3,col4,col5 -Delimiter ' ') | Foreach-Object { "{0} {1} {2}" -f $_.col1,$_.col2,$_.col3} | Out-File .\file.txt
UPDATE:
Just another option:
(Get-Content .\file.txt) | Foreach-Object { $_.split()[0..2] -join ' ' } | Out-File .\file.txt
One way is:
gc input.txt | %{[string]::join(" ",$_.split()[0..2]) } | out-file output.txt
(replace 2 by n-1)
Here is the generic solution:
param
(
# Input data file
[string]$Path = 'data.txt',
# Columns to be removed, any order, dupes are allowed
[int[]]$Remove = (4, 3, 4, 3)
)
# sort indexes descending and remove dupes
$Remove = $Remove | Sort-Object -Unique -Descending
# read input lines
Get-Content $Path | .{process{
# split and add to ArrayList which allows to remove items
$list = [Collections.ArrayList]($_ -split '\s')
# remove data at the indexes (from tail to head due to descending order)
foreach($i in $Remove) {
$list.RemoveAt($i)
}
# join and output
$list -join ' '
}}
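Assuming the generic script above is saved as, say, Remove-Columns.ps1 (a hypothetical file name), it could be invoked like this to strip the last two of five columns:

```powershell
# Remove the columns at indexes 3 and 4 from data.txt
# and write the trimmed lines to a new file:
.\Remove-Columns.ps1 -Path data.txt -Remove 3, 4 > output.txt
```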