Convert 2 columns file into 1 column - powershell

I have a text file with 2 columns and any number of lines (100 or more):
data1 data2
data3 data4
I want to transform it to 1 column like this:
data1
data2
data3
data4
In Unix I could have done it with a for loop and awk, but I'm getting confused, being very new to PowerShell.

# Read lines, loop each line with the variable name $_
Get-Content c:\wherever\input.txt | ForEach-Object {
    -split $_   # unary split breaks on whitespace
    # pieces go down the pipeline
} | Set-Content c:\wherever\output.txt -Encoding UTF8   # save them to a file
or in the shell, for brevity:
-split(gc 1.txt)|sc 2.txt -En utf8
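A minimal sketch of the same unary -split technique on in-memory data, so it can be tried without any files. The $lines array is a hypothetical stand-in for the output of Get-Content:

```powershell
# Hypothetical sample input standing in for Get-Content c:\wherever\input.txt
$lines = 'data1 data2', 'data3 data4'

# Unary -split splits each line on whitespace; every resulting field
# is emitted separately, so each one ends up on its own output line
$result = $lines | ForEach-Object { -split $_ }

$result
```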

#Solution 1, read the data with a space delimiter, then split the chunks on the remaining line breaks and remove the empty entries
get-content "C:\temp\test\test1.txt" -delimiter " " | foreach {$_ -split '\s+'} | where {$_ -ne ""}
#Solution 2, import-csv with delimiter and print 2 columns C1 and C2
import-csv "C:\temp\test\test1.txt" -Delimiter " " -Header C1, C2 | foreach {$_.C1;$_.C2}
#Solution 3, variant of solution 2
get-content "C:\temp\test\test1.txt" | ConvertFrom-Csv -Delimiter " " -Header C1, C2 | %{$_.C1;$_.C2}
#Solution 4, variant of solution 3 but with ConvertFrom-String (properties P1 and P2 are generated automatically; not available in PowerShell 7)
get-content "C:\temp\test\test1.txt" | ConvertFrom-String -Delimiter " " | %{$_.P1;$_.P2}
#Solution 5, split every row (proposed by TessellatingHeckler)
get-content "C:\temp\test\test1.txt" | foreach {-split $_ }

$textfile = "path/sourcefile"   # path and name of the source text file
$newfile = 'newfile.txt'        # name of the new text file
New-Item -ItemType File -Name $newfile   # the new text file is created in the current location
$textfile = Get-Content $textfile        # the source text file is read and stored in a variable
foreach ($line in $textfile) {
    # split each line on spaces, excluding the empty strings that repeated spaces produce
    $datas = $line.Split(' ') | Where-Object { $_.Length -gt 0 }
    foreach ($data in $datas) {
        Add-Content $newfile $data   # columns are merged; this works with any number of space-separated columns
    }
}

Related

Choosing columns for merged files in PowerShell

I am using Get-ChildItem to recurse through directories (skipping some at the top level), open a series of CSV files, append the file name to the end of each line of data, and combine the data into one.
$mergedData = Get-ChildItem $path -Exclude yesterday,"OHCC Extract",output |
    Get-ChildItem -recurse -Filter *csv |
    Where-Object { $_.CreationTime -gt (Get-Date).AddDays(-1) } |
    % {
        $file = $_.Name
        $fn = $_.FullName
        ## capture the header line
        $FirstLine = Get-Content $fn -TotalCount 1
        ## add the column header for filename
        $header = $FirstLine + ",Filename"
        ## get the contents of the files without the first line
        Get-Content $fn | SELECT -Skip 1 | %{ "$_,$file" }
    }
Now each file has 5 columns: ID, First Name, Last Name, Phone, Address. The column names are surrounded by double quotes ("ID", "First Name").
The request is now to skip everything but the ID and the Last Name columns. So I tried (starting with just ID, will add First Name later):
Get-Content $fn | SELECT -Skip 1 -Property ID | %{ "$_,$file" }
I get @{ID=} in the resulting file.
Then I tried
Get-Content $fn | SELECT -Skip 1 | %{ $_.ID }
which yields blanks, and then
Import-Csv -Path $fn -Delimiter ',' | SELECT ID
which gives @{ID=73aec2fe-6cb3-492e-a157-25e355ed9691}
At this point I am just flailing because I obviously don't know how to handle objects in PS.
I have PowerShell 5.1.19041.1682 on Windows 10.
Thanks
I was asked for sample data, so here it is. There are 35 files across multiple subdirectories.
Input FileA
column1,ID,Column3
East,12,apple
west,5,pear

Input FileB
column1,ID,Column3
East,15,kiwi

Output
Column1,column3,Filename
East,kiwi,FileB
East,apple,FileA
west,pear,FileB
But I figured it out myself. Working code:
$path = "<directory where the files are located> "
$pathout = "<path to outputted file>"
$out = "$pathout\csv_merged_$(get-date -f MMddyyyy).csv"
$mergedData = Get-ChildItem $path -Exclude yesterday,output |
    Get-ChildItem -recurse -Filter *csv |
    Where-Object { $_.CreationTime -gt (Get-Date).AddDays(-1) } |
    % {
        $file = $_.Name
        $fn = $_.FullName
        write-host $fn , $_.CreationTime
        ## get the contents of the files, exclude columns and add columns
        $Data = Import-Csv -Path $fn -Delimiter ',' | SELECT *, @{Name = 'Filename'; Expression = {$file}} -ExcludeProperty ID
        # get the headers
        $header = $Data | ConvertTo-Csv -NoTypeInformation | Select-Object -First 1
        write-host $header
        ## convert the object and remove the column headers for each file
        $Data | ConvertTo-Csv -NoTypeInformation | Select-Object -Skip 1
        write-host
        write-host '-----------------------'
    }
# Prefix the header before the compiled data
$header, $mergedData | Set-Content -Encoding utf8 $out
The missing piece was ConvertTo-Csv, which expanded the objects into CSV text.
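The difference between the attempts above can be reproduced on inline data. A minimal sketch, assuming a hypothetical CSV with the question's column names (all values and the file name are made up): interpolating an object into a string yields the @{...} form seen in the question, whereas selecting the wanted properties, adding a calculated Filename property and piping through ConvertTo-Csv produces proper CSV lines.

```powershell
# Hypothetical CSV content with the question's column names (values are made up)
$csvText = @'
"ID","First Name","Last Name","Phone","Address"
"1","Ann","Lee","555-0100","1 Main St"
"2","Bob","Kim","555-0101","2 Oak Ave"
'@
$file = 'FileA.csv'   # hypothetical file name to append as a column

$rows = $csvText | ConvertFrom-Csv

# Interpolating an object into a string gives its hashtable-like form
$asString = "$($rows[0] | Select-Object ID)"

# Keep only ID and Last Name, add the file name as a calculated property,
# then let ConvertTo-Csv turn the objects back into CSV text without its header line
$csvLines = $rows |
    Select-Object ID, 'Last Name', @{ Name = 'Filename'; Expression = { $file } } |
    ConvertTo-Csv -NoTypeInformation |
    Select-Object -Skip 1

$asString
$csvLines
```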

PowerShell remove last column of pipe delimited text file

I have a folder of pipe delimited text files that I need to remove the last column on. I'm not seasoned in PS but I found enough through searches to help. I have two pieces of code. The first creates new text files in my destination path, keeps the pipe delimiter, but doesn't remove the last column. There are 11 columns. Here is that script:
$OutputFolder = "D:\DC_Costing\Vendor Domain\CostUpdate_Development_Load_To_IMS"
ForEach ($File in (Get-ChildItem "D:\DC_Costing\Vendor Domain\CostUpdate_Development_Stage_To_IMS\*.txt"))
{
    (Get-Content $File) | Foreach-Object { $_.split()[0..9] -join '|' } | Out-File $OutputFolder\$($File.Name)
}
Then this second code I tried creates the new text files on my destination path, it DOES get rid of the last column, but it loses the pipe delimiter. Ugh.
$OutputFolder = "D:\DC_Costing\Vendor Domain\CostUpdate_Development_Load_To_IMS"
ForEach ($File in (Get-ChildItem "D:\DC_Costing\Vendor Domain\CostUpdate_Development_Stage_To_IMS\*.txt"))
{
    Import-Csv $File -Header col1,col2,col3,col4,col5,col6,col7,col8,col9,col10,col11 -Delimiter '|' |
        Foreach-Object {"{0} {1} {2} {3} {4} {5} {6} {7} {8} {9}" -f $_.col1,$_.col2,$_.col3,$_.col4,$_.col5,$_.col6,$_.col7,$_.col8,$_.col9,$_.col10} | Out-File $destination\$($File.Name)
}
I have no clue what I'm doing wrong. I have no preference for how this gets done, but I need to keep the delimiter and have the last column removed. Any help would be greatly appreciated.
In your plain-text processing attempt with Get-Content, you simply need to split each line by | first (.Split('|')), before extracting the fields of interest with a range operation (..) and joining them back with |:
Get-Content $File |
    Foreach-Object { $_.Split('|')[0..9] -join '|' } |
    Out-File $OutputFolder\$($File.Name)
In your Import-Csv-based attempt, you can take advantage of the fact that it will only read as many columns as you supply column names for, via -Header:
# Pass only 10 column names to -Header
Import-Csv $File -Header (0..9).ForEach({ 'col' + $_ }) -Delimiter '|' |
    ConvertTo-Csv -Delimiter '|' -NoTypeInformation |  # convert back to CSV with delimiter '|' (no #TYPE line in Windows PowerShell)
    Select-Object -Skip 1 |                            # skip the header row
    Out-File $OutputFolder\$($File.Name)
Note that ConvertTo-Csv, just like Export-Csv, by default double-quotes each field in the resulting CSV data / file.
In Windows PowerShell, you cannot avoid this, but in PowerShell (Core) 7+ you can control this behavior with -UseQuotes Never, for instance.
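To see the -Header truncation without touching any files, here is a minimal sketch on a single hypothetical 11-field line, using ConvertFrom-Csv (the in-memory counterpart of Import-Csv, with the same -Header and -Delimiter parameters). -NoTypeInformation is passed explicitly because Windows PowerShell would otherwise emit a #TYPE line before the header; PowerShell 7+ omits it by default.

```powershell
# Hypothetical 11-column pipe-delimited line
$line = 'a|b|c|d|e|f|g|h|i|j|k'

# Only 10 header names are supplied, so the 11th field is discarded
$rows = $line | ConvertFrom-Csv -Header (0..9).ForEach({ 'col' + $_ }) -Delimiter '|'

$out = $rows |
    ConvertTo-Csv -Delimiter '|' -NoTypeInformation |  # back to '|'-delimited text
    Select-Object -Skip 1                               # drop the header row

$out
```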
You can give this a try; it should be more efficient than using Import-Csv. Note, however, that this always excludes the last column of your files, no matter how many columns they have, assuming they're pipe-delimited:
$OutputFolder = "D:\DC_Costing\Vendor Domain\CostUpdate_Development_Load_To_IMS"
foreach ($File in (Get-ChildItem "D:\DC_Costing\Vendor Domain\CostUpdate_Development_Stage_To_IMS\*.txt")) {
    [IO.File]::ReadAllLines($File.FullName) | & {
        process {
            -join ($_ -split '(?=\|)' | Select-Object -SkipLast 1)
        }
    } | Set-Content (Join-Path $OutputFolder -ChildPath $File.Name)
}
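The key step is the lookahead split. A minimal sketch on a single hypothetical line shows why no delimiter is lost: the zero-width pattern (?=\|) splits just before each |, so every piece after the first keeps its leading pipe.

```powershell
# Hypothetical pipe-delimited line whose last column should be dropped
$line = 'a|b|c|last'

# Split before each '|' without consuming it: 'a', '|b', '|c', '|last'
$pieces = $line -split '(?=\|)'

# Drop the last piece and glue the rest back together unchanged
$trimmed = -join ($pieces | Select-Object -SkipLast 1)

$trimmed
```

One consequence of this design: a line containing no | at all comes out empty, because its only piece is the one that gets skipped.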

Join columns from different txt files - powershell

I need to extract columns from one file and join them in another file.
I used this code to select the columns that I need:
$original_path = 'C:\Users\leticia.araujo\Downloads\Arquivo Buffer\Arquivo teste'
$files = Get-ChildItem $original_path
ForEach($file in $files) {
    $pathFile = $original_path + '\' + $file.Name
    $SegundaColuna = Get-Content -Path $pathFile | Foreach {"$(($_ -split ',')[3..3])"}
    $TerceiraColuna = Get-Content -Path $pathFile | Foreach {"$(($_ -split ':')[3..3])"}
    $QuartaColuna = Get-Content -Path $pathFile | Foreach {"$(($_ -split ',')[10..10])"}
}
When I try to put these in a text file using
Add-Content $pathFile $SegundaColuna,$TerceiraColuna,$QuartaColuna
it works, but in the file the columns are not next to each other; they are under each other.
Example:
I need them to be like this:
1 a
2 b
3 c
But they are like this:
1
2
3
a
b
c
Focusing on a single file inside your foreach loop:
Since the values to join come from the same lines of a given file, read that file line by line:
Get-Content -Path $pathFile |   # Read the file line by line.
    ForEach-Object {            # Process each line.
        ($_ -split ',')[3],
        ($_ -split ':')[3],
        ($_ -split ',')[10] -join ' '   # Output the column values joined with a space.
    } |
    Set-Content out.txt
If you need to merge columns across all your input files and create a single output file, replace the foreach loop with a single pipeline:
Get-ChildItem $original_path |
    Get-Content |
    ForEach-Object {
        ($_ -split ',')[3],
        ($_ -split ':')[3],
        ($_ -split ',')[10] -join ' '
    } |
    Set-Content out.txt
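If the columns have already been collected into separate arrays, as $SegundaColuna and friends in the question, Add-Content writes the elements of one array after another, which is why the values ended up under each other. A minimal sketch of pairing such arrays by index instead, using made-up values and assuming all arrays have the same length:

```powershell
# Made-up column arrays standing in for $SegundaColuna and $TerceiraColuna
$col1 = '1', '2', '3'
$col2 = 'a', 'b', 'c'

# Walk the arrays by index and join the values of each row with a space;
# the comma operator binds tighter than -join, so both values are joined
$rows = for ($i = 0; $i -lt $col1.Count; $i++) {
    $col1[$i], $col2[$i] -join ' '
}

$rows
```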

Powershell text processing: Join specific lines of a txt file

I have to process some text and am having some difficulties:
The text file .\text.txt is formatted like this:
name,
surname,
address,
name.
surname,
address,
etc.
What I want to achieve is to join the lines that end with "," like this:
name,surname,address
name,surname,address
etc
I was working on something like this:
$content= path to the text.txt
$result= path to the result file
Get-Content -Encoding UTF8 $content | ForEach-object {
    if ( $_ -match "," ) {
        ....join the selected lines....
    }
} | Set-Content -Encoding UTF8 $result
I also need to consider that a line ending with "," may be followed by an empty line, which should become a line break in $result.
You can do this by splitting the blocks of data on the empty lines first:
# read the content of the file as one single multiline string
$content = Get-Content -Path 'Path\To\The\file.txt' -Raw -Encoding UTF8
# split on two or more newlines and dispose of empty blocks
$content -split '(\r?\n){2,}' | Where-Object { $_ -match '\S' } | ForEach-Object {
    # trim the text block, split on newline and remove the trailing commas (or dots)
    # output these joined with a comma
    ($_.Trim() -split '\r?\n').TrimEnd(",.") -join ','
} | Set-Content -Path 'Path\To\The\NEW_file.txt' -Encoding UTF8
Output:
name,surname,address
name,surname,address
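The same pipeline can be tried on a hypothetical in-memory string that mirrors the question's sample. Note that because the split pattern contains a capture group, -split also emits the captured newline as an extra element; the Where-Object { $_ -match '\S' } filter is what removes it.

```powershell
# Hypothetical sample text: records separated by an empty line
$content = "name,`nsurname,`naddress,`n`nname.`nsurname,`naddress,"

$result = $content -split '(\r?\n){2,}' |
    Where-Object { $_ -match '\S' } |   # drops the captured newline and blank blocks
    ForEach-Object {
        # one record per block: trim trailing commas/dots, then join with commas
        ($_.Trim() -split '\r?\n').TrimEnd(',.') -join ','
    }

$result
```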
All your terms end with a comma, so you could use a regex:
$content= "C:\test.txt"
$result= "path to the result file"
$CR = "`r`n"
$lines = Get-Content -Encoding UTF8 $content -raw
$option = [System.Text.RegularExpressions.RegexOptions]::Singleline
$lines = [regex]::new(',(?:\r?\n){2,}', $option).Replace($lines, $CR + $CR)
$lines = [regex]::new(',\r?\n', $option).Replace($lines, ",")
$lines | Out-File -FilePath $result -Encoding utf8
result:
name,surname,address
name1,surname,address
name,surname,address
name,surname,address
The following code gives the required result:
$content = "Your file path"
$resultPath = "result file path"
$out = ''   # start with an empty result string
Get-Content $content | foreach {
    $data = $_
    if ($data -eq "address,") {
        # last field of a record: drop its comma and end the output line
        $NewData = $data -replace ',', ''
        $data = $NewData + "`r`n"
    }
    $out = $out + $data
}
$out | Out-File $resultPath

Powershell: Read Text file line by line and split on "|"

I am having trouble splitting a line into an array using the "|" in a text file and reassembling it in a certain order. There are multiple lines like the original line in the text file.
This is the original line:
80055555|Lastname|Firstname|AidYear|DCDOCS|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf
I need it to look this way:
80055555|DCDOCS|Lastname|Firstname|AidYear|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf
Here is the code I am working with:
$File = 'c:\Names\Complete\complete.txt'
$Arr = $File -split '|'
foreach ($line in Get-Content $File)
{
    $outputline = $Arr[0] + "|" + $Arr[4] + "|" + $Arr[1] + "|" + $Arr[2] + "|" +
        "##" + $Arr[5] |
        Out-File -filepath "C:\Names\Complete\index.txt" -Encoding "ascii" -append
}
You need to process each line of the file on its own and then split it.
$File = get-content "D:\test\1234.txt"
foreach ($line in $File) {
    $Arr = $line.Split('|')
    [array]$OutputFile += $Arr[0] + "|" + $Arr[4] + "|" + $Arr[1] + "|" + $Arr[2] + "|" + $Arr[3] + "|" + $Arr[5]
}
$OutputFile | out-file -filepath "D:\test\4321.txt" -Encoding "ascii" -append
Edit: Thanks to LotPings for this alternative suggestion, based on -join and on avoiding += to build the array (which is inefficient, because it recreates the array on every iteration):
$File = get-content "D:\test\1234.txt"
$OutputFile = foreach($line in $File){($line.split('|'))[0,4,1,2,3,5] -Join '|'}
$OutputFile | out-file -filepath "D:\test\4321.txt" -Encoding "ascii"
To offer a more PowerShell-idiomatic solution:
# Sample input line.
$line = '80055555|Lastname|Firstname|AidYear|DCDOCS|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf'
# Split by '|', rearrange, then re-join with '|'
($line -split '\|')[0,4,1,2,3,5] -join '|'
Note how PowerShell's indexing syntax (inside [...]) is flexible enough to accept an arbitrary array (list) of indices to extract.
Also note that -split's RHS operand is \|, i.e., an escaped | character: -split interprets its RHS as a regex, in which | has special meaning (alternation).
To put it all together:
$File = 'c:\Names\Complete\complete.txt'
Get-Content $File | ForEach-Object {
    ($_ -split '\|')[0,4,1,2,3,5] -join '|'
} | Out-File -LiteralPath C:\Names\Complete\index.txt -Encoding ascii
As for what you tried:
$Arr = $File -split '|'
Primarily, the problem is that the -split operation is applied to the input file path, not to the file's content.
Secondarily, as noted above, to split by a literal | char., \| must be passed to -split, because it expects a regex (regular expression).
Also, instead of using Out-File inside a loop with -Append, it is more efficient to use a single pipeline with ForEach-Object, as shown above.
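As an alternative to escaping the pipe by hand, -split also accepts the SimpleMatch option, which treats the separator literally, and [regex]::Escape() can produce the escaped pattern for you. A small sketch comparing the three forms on the question's sample line:

```powershell
$line = '80055555|Lastname|Firstname|AidYear|DCDOCS|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf'

# 1) Escaped regex: \| matches a literal pipe
$a = $line -split '\|'
# 2) Literal separator: max-substrings 0 (no limit) plus the SimpleMatch option
$b = $line -split '|', 0, 'SimpleMatch'
# 3) Let .NET escape the separator for the regex
$c = $line -split [regex]::Escape('|')

# All three yield the same six fields; reorder and re-join them
$reordered = $a[0,4,1,2,3,5] -join '|'
$reordered
```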
Since your input file is actually a headerless CSV file whose fields are separated by the pipe symbol |, why not use Import-Csv like this:
$fileIn = 'C:\Names\Complete\complete.txt'
$fileOut = 'C:\Names\Complete\index.txt'
(Import-Csv -Path $fileIn -Delimiter '|' -Header 'Item','LastName','FirstName','AidYear','Type','FileName' |
    ForEach-Object {
        "{0}|{1}|{2}|{3}|{4}|{5}" -f $_.Item, $_.Type, $_.LastName, $_.FirstName, $_.AidYear, $_.FileName
    }
) | Add-Content -Path $fileOut -Encoding Ascii
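ConvertFrom-Csv takes the same -Delimiter and -Header parameters as Import-Csv, so the approach can be checked on the sample line before pointing it at the file. A minimal sketch:

```powershell
# The question's sample line, parsed as headerless pipe-delimited CSV
$line = '80055555|Lastname|Firstname|AidYear|DCDOCS|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf'
$header = 'Item', 'LastName', 'FirstName', 'AidYear', 'Type', 'FileName'

# Name the fields via -Header, then emit them in the new order with -f
$reordered = $line | ConvertFrom-Csv -Delimiter '|' -Header $header | ForEach-Object {
    '{0}|{1}|{2}|{3}|{4}|{5}' -f $_.Item, $_.Type, $_.LastName, $_.FirstName, $_.AidYear, $_.FileName
}

$reordered
```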