Powershell: Read Text file line by line and split on "|"

I am having trouble splitting a line into an array using the "|" in a text file and reassembling it in a certain order. There are multiple lines like the original line in the text file.
This is the original line:
80055555|Lastname|Firstname|AidYear|DCDOCS|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf
I need it to look this way:
80055555|DCDOCS|Lastname|Firstname|AidYear|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf
Here is the code I am working with:
$File = 'c:\Names\Complete\complete.txt'
$Arr = $File -split '|'
foreach ($line in Get-Content $File)
{
$outputline = $Arr[0] + "|" + $Arr[4] + "|" + $Arr[1] + "|" + $Arr[2] + "|" +
"##" + $Arr[5] |
Out-File -filepath "C:\Names\Complete\index.txt" -Encoding "ascii" -append
}

You need to process every line of the file on its own and then split them.
$File = get-content "D:\test\1234.txt"
foreach ($line in $File){
$Arr = $line.Split('|')
[array]$OutputFile += $Arr[0] + "|" + $Arr[4] + "|" + $Arr[1] + "|" + $Arr[2] + "|" + "##" + $Arr[5]
}
$OutputFile | out-file -filepath "D:\test\4321.txt" -Encoding "ascii" -append
edit: Thx to LotPings for this alternate suggestion based on -join and the avoidance of += to build the array (which is inefficient, because it rebuilds the array on every iteration):
$File = get-content "D:\test\1234.txt"
$OutputFile = foreach($line in $File){($line.split('|'))[0,4,1,2,3,5] -Join '|'}
$OutputFile | out-file -filepath "D:\test\4321.txt" -Encoding "ascii"

To offer a more PowerShell-idiomatic solution:
# Sample input line.
$line = '80055555|Lastname|Firstname|AidYear|DCDOCS|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf'
# Split by '|', rearrange, then re-join with '|'
($line -split '\|')[0,4,1,2,3,5] -join '|'
Note how PowerShell's indexing syntax (inside [...]) is flexible enough to accept an arbitrary array (list) of indices to extract.
Also note how -split's RHS operand is \|, i.e., an escaped | char., given that | has special meaning there, because it is interpreted as a regex.
To put it all together:
$File = 'c:\Names\Complete\complete.txt'
Get-Content $File | ForEach-Object {
($_ -split '\|')[0,4,1,2,3,5] -join '|'
} | Out-File -LiteralPath C:\Names\Complete\index.txt -Encoding ascii
As for what you tried:
$Arr = $File -split '|'
Primarily, the problem is that the -split operation is applied to the input file path, not to the file's content.
Secondarily, as noted above, to split by a literal | char., \| must be passed to -split, because it expects a regex (regular expression).
Also, instead of using Out-File inside a loop with -Append, it is more efficient to use a single pipeline with ForEach-Object, as shown above.
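To make the regex point concrete, here is a minimal sketch of four ways to split on a literal | character. It uses the sample line from the question; the variable names are made up for illustration.

```powershell
$line = '80055555|Lastname|Firstname|AidYear|DCDOCS|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf'

# 1. Escape the pipe by hand so the regex engine treats it literally.
$viaEscape = $line -split '\|'

# 2. Let .NET do the escaping; handy when the delimiter is held in a variable.
$viaRegexEscape = $line -split [regex]::Escape('|')

# 3. Ask -split for literal matching (0 = no limit on the number of substrings).
$viaSimpleMatch = $line -split '|', 0, 'SimpleMatch'

# 4. Bypass regex entirely with the .NET String.Split method.
$viaMethod = $line.Split('|')

# Each variant yields the same six fields, so the reordering works with any of them.
$reordered = $viaSimpleMatch[0, 4, 1, 2, 3, 5] -join '|'
$reordered
```

Which variant to prefer is mostly a matter of taste; the SimpleMatch and [regex]::Escape forms are probably the safest when the delimiter comes from user input.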

Since your input file is actually a CSV file without headers and where the fields are separated by the pipe symbol |, why not use Import-Csv like this:
$fileIn = 'C:\Names\Complete\complete.txt'
$fileOut = 'C:\Names\Complete\index.txt'
(Import-Csv -Path $fileIn -Delimiter '|' -Header 'Item','LastName','FirstName','AidYear','Type','FileName' |
ForEach-Object {
"{0}|{1}|{2}|{3}|{4}|{5}" -f $_.Item, $_.Type, $_.LastName, $_.FirstName, $_.AidYear, $_.FileName
}
) | Add-Content -Path $fileOut -Encoding Ascii
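The same idea can be tried without touching the file system by feeding a string to ConvertFrom-Csv, the in-memory counterpart of Import-Csv. This is only a sketch: the $lines sample and the $header variable are assumptions for illustration.

```powershell
# Hypothetical in-memory sample standing in for the lines of complete.txt.
$lines = '80055555|Lastname|Firstname|AidYear|DCDOCS|D:\BDMS_UPLOAD\800123456_11-13-2018 14-35-53 PM_1.pdf'

$header = 'Item', 'LastName', 'FirstName', 'AidYear', 'Type', 'FileName'
$reordered = $lines | ConvertFrom-Csv -Delimiter '|' -Header $header | ForEach-Object {
    # Fields are addressed by name, so the output order is independent of the input order.
    '{0}|{1}|{2}|{3}|{4}|{5}' -f $_.Item, $_.Type, $_.LastName, $_.FirstName, $_.AidYear, $_.FileName
}
$reordered
```

Keep in mind that the CSV cmdlets give double quotes special meaning, so data that contains stray " characters may be parsed differently than with a plain split.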

Related

Powershell text procesing: Join specific lines of a txt file

I have to process some text and got some difficulties:
The text .\text.txt is formatted like that:
name,
surname,
address,
name.
surname,
address,
etc.
What I want to achieve is to join the lines that end with "," like this:
name,surname,address
name,surname,address
etc
I was working on something like this:
$content= path to the text.txt
$result= path to the result file
Get-Content -Encoding UTF8 $content | ForEach-object {
if ( $_ -match "," ) {
....join the selected lines....
}
} |Set-Content -Encoding UTF8 $result
I also need to handle the case where a line ending in "," is followed by an empty line, which should become a line break (CR) in $result.

You can do this by splitting the blocks of data on the empty newlines first:
# read the content of the file as one single multiline string
$content = Get-Content -Path 'Path\To\The\file.txt' -Raw -Encoding UTF8
# split on two or more newlines and dispose of empty blocks
$content -split '(\r?\n){2,}' | Where-Object { $_ -match '\S' } | ForEach-Object {
# trim the text block, split on newline and remove the trailing commas (or dots)
# output these joined with a comma
($_.Trim() -split '\r?\n' ).TrimEnd(",.") -join ','
} | Set-Content -Path 'Path\To\The\NEW_file.txt' -Encoding UTF8
Output:
name,surname,address
name,surname,address
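If the file is too large to read with -Raw, a line-by-line variant of the same idea might look like the sketch below. The $lines array is a hypothetical stand-in for what Get-Content would emit, and blank lines are assumed to separate the records.

```powershell
# Hypothetical sample data standing in for the lines Get-Content would emit.
$lines = 'name,', 'surname,', 'address,', '', 'name.', 'surname,', 'address,'

$buffer = [System.Collections.Generic.List[string]]::new()
$joined = foreach ($line in $lines + '') {   # the extra '' flushes the final block
    if ($line.Trim()) {
        # Non-blank line: strip the trailing comma or dot and remember the field.
        $buffer.Add($line.Trim().TrimEnd(',', '.'))
    }
    elseif ($buffer.Count) {
        # Blank line: emit the collected block as one record, then start over.
        $buffer -join ','
        $buffer.Clear()
    }
}
$joined
```

In a real script, $lines would be replaced by a Get-Content call and $joined piped to Set-Content.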

All your terms end with a , so you could use regex:
$content= "C:\test.txt"
$result= "path to the result file"
$CR = "`r`n"
$lines = Get-Content -Encoding UTF8 $content -raw
$option = [System.Text.RegularExpressions.RegexOptions]::Singleline
$lines = [regex]::new(',(?:\r?\n){2,}', $option).Replace($lines, $CR + $CR)
$lines = [regex]::new(',\r?\n', $option).Replace($lines, ",")
$lines | Out-File -FilePath $result -Encoding utf8
result:
name,surname,address
name1,surname,address
name,surname,address
name,surname,address

The code below gives the required result.
$content= "Your file path"
$resultPath = "result file path"
Get-Content $content | foreach {
$data = $_
if($data -eq "address,")
{
$NewData = $data -replace ',',''
$data = $NewData + "`r`n"
}
$out = $out + $data
}
$out | Out-File $resultPath

Powershell 5: ConvertTo-Csv a CSV with quotes in some but not all columns

I am updating a script which imports a large CSV file and then splits it into lots of separate CSV files based on the values in the first two columns
so POIMP_NL_20210306.csv which contains:
DOC_NUMBER|COMMENTS|ITEM|QTY|SUPPLIER
P-100-1234|JANE|5059585896978|2|"JOES SUPPLIES"
P-100-1234|JANE|5059585896985|2|"JOES SUPPLIES"
P-100-6666|TED|5059585896992|1|"ACTION TOYS"
must be split into POIMP_P-100-1234_JANE.csv containing
P-100-1234|JANE|5059585896978|2|"JOES SUPPLIES"
P-100-1234|JANE|5059585896985|2|"JOES SUPPLIES"
and POIMP_P-100-6666_TED.csv
P-100-6666|TED|5059585896992|1|"ACTION TOYS"
The problem I am trying to solve is preserving the quotes in just the SUPPLIER column
Since ConvertTo-Csv adds quotes to everything, I use a % { $_ -replace '"', ""} to remove these all before the out-file is created, but of course it removes these from the SUPPLIER column too
Here is my script which perfectly splits the big file into smaller files by DOC_NUMBER and COMMENTS but removes all quotes:
$basePath = "C:\"
$archivePath = "$basePath\archive\"
$todaysDate = $(get-date -Format yyyyMMdd)
$todaysFiles = @(
(Get-ChildItem -Path $basePath | Where-Object { $_.Name -match 'POIMP_' + $todaysDate })
)
cd $basePath
foreach ($file in $todaysFiles ) {
$fileName = $file.ToString()
Import-Csv $fileName -delimiter "|" | Group-Object -Property "DOC_NUMBER","COMMENTS" |
Foreach-Object {
$newName = $_.Name -replace ",","_" -replace " ",""; $path=$fileName.SubString(0,8) + $newName+".csv" ; $_.group |
ConvertTo-Csv -NoTypeInformation -delimiter "|" | % { $_ -replace '"', ""} | out-file $path -fo -en ascii
}
Rename-Item $fileName -NewName ([io.path]::GetFileNameWithoutExtension("$fileName") + "_Original.csv")
Move-Item (Get-ChildItem -Path $basePath | Where-Object { $_.Name -match '_Original' }) $archivePath -force
}
And here is another script which I found online and amended and which successfully leaves quotes in just the SUPPLIER column by first wrapping the values in a ¬¬ placeholder and then replacing these with quotes after all others have been removed
$ImportedCSV = Import-CSV "C:\POIMP_NL_20210306.csv" -delimiter "|"
$NewCSV = Foreach ($Entry in $ImportedCsv) {
$Entry.SUPPLIER = '¬¬' + $Entry.SUPPLIER + '¬¬'
$Entry
}
$NewCSV |
ConvertTo-Csv -NoTypeInformation -delimiter "|" | % { $_ -replace '"', ""} | % { $_ -replace '¬¬', '"'} | out-file "C:\updatedPO.csv" -fo -en ascii
I just can't merge these scripts to achieve the desired result as I can't seem to reference the correct object. I'd really appreciate your help! Thanks

Any good CSV reader should be able to handle quotes around csv fields, even when not really needed.
Having said that, it is your explicit wish to only have quotes around the field in the SUPPLIER column. (Note: in your example there is a trailing space after that column name.)
In this case, I think this would help.
Not only does it surround the SUPPLIER fields with quotes, but also saves the data as separate files using the values from column DOC_NUMBER and COMMENTS per group found in the csv
$path = 'D:\Test'
$fileIn = Join-Path -Path $path -ChildPath 'POIMP_NL_20210306.csv'
# import the csv file and group first two columns
Import-Csv -Path $fileIn -Delimiter '|' | Group-Object -Property "DOC_NUMBER","COMMENTS" | ForEach-Object {
$headerDone = $false
$data = foreach ($item in $_.Group) {
if (!$headerDone) {
$item.PsObject.Properties.Name -join '|'
$headerDone = $true
}
$item.SUPPLIER = '"{0}"' -f $item.SUPPLIER
$item.PsObject.Properties.Value -join '|'
}
# create a new filename like 'POIMP_P-100-1234_JANE.csv'
$fileOut = Join-Path -Path $path -ChildPath ('POIMP_{0}_{1}.csv' -f $_.Group[0].DOC_NUMBER, $_.Group[0].COMMENTS)
# save the data not using Export-Csv because that will add quotes around everything (in PowerShell 5)
$data | Set-Content -Path $fileOut -Force
}
Output
POIMP_P-100-1234_JANE.csv
DOC_NUMBER|COMMENTS|ITEM|QTY|SUPPLIER
P-100-1234|JANE|5059585896978|2|"JOES SUPPLIES"
P-100-1234|JANE|5059585896985|2|"JOES SUPPLIES"
POIMP_P-100-6666_TED.csv
DOC_NUMBER|COMMENTS|ITEM|QTY|SUPPLIER
P-100-6666|TED|5059585896992|1|"ACTION TOYS"

If you are on PowerShell 7 or later, you can use
$yourdata | ConvertTo-Csv -NoTypeInformation -QuoteFields "SUPPLIER" -Delimiter "|" |
Out-File ...
or you could use
$yourdata | Export-Csv -NoTypeInformation -QuoteFields "SUPPLIER" `
-Delimiter "|" -Path <path-to-output-file>.csv
You can also use -UseQuotes AsNeeded to let the converter add quoting where it thinks it makes sense, otherwise just specify the fields you want quoted.
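Combined with the grouping from the question, a PowerShell 7+ version could look roughly like this. It is a sketch only: the in-memory $rows sample is made up to mirror the question's file, and the result is collected in $output instead of being written to disk.

```powershell
# Hypothetical in-memory rows mirroring the sample file; requires PowerShell 7 or later.
$rows = @(
    [pscustomobject]@{ DOC_NUMBER = 'P-100-1234'; COMMENTS = 'JANE'; ITEM = '5059585896978'; QTY = '2'; SUPPLIER = 'JOES SUPPLIES' }
    [pscustomobject]@{ DOC_NUMBER = 'P-100-6666'; COMMENTS = 'TED';  ITEM = '5059585896992'; QTY = '1'; SUPPLIER = 'ACTION TOYS' }
)

$output = $rows | Group-Object -Property DOC_NUMBER, COMMENTS | ForEach-Object {
    $name = 'POIMP_{0}_{1}.csv' -f $_.Group[0].DOC_NUMBER, $_.Group[0].COMMENTS
    # Only SUPPLIER is quoted; every other column is written bare.
    $csv = $_.Group | ConvertTo-Csv -Delimiter '|' -QuoteFields 'SUPPLIER'
    "--- $name"
    $csv
}
$output
```

To write real files, replace the last two lines inside the loop with something like Set-Content -Path $name. The SUPPLIER header cell is likely to be quoted as well, so check the first line of the output on your version if the consuming system is picky about headers.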

How to determine a file is tab delimited in PowerShell?

I have a script that I am working on that reads in some text files and converts them to .csv and changes some values. I have two different file sources. One is a tab delimited .txt file and the other is a comma separated .txt file. Is there a way to determine which type of delimiter is being used to determine which export function is appropriate?
get-childitem $workingDir -filter *.txt -Recurse| ForEach-Object {
$targetfile = $_.Name
$targetFile = $_.FullName.Substring(0,$_.FullName.Length-4)
$targetFile = $targetfile += ".csv"
if( Get-Content -Delimiter = `t ){
Write-Host "The file is tab-delimited"
Get-Content -path $_.FullName
ForEach-Object {$_ -replace “`t”,”,” } |
Out-File -filepath $targetFile -Encoding utf8
}
else {
Write-Host "The file is comma-separated"
Get-Content -path $_.FullName |
Out-File -filepath $targetFile -Encoding utf8
}
}

Another approach would be to use Select-String to check for tab character and set delimiter.
if(Get-Content $csvfile -First 1 | Select-String -Pattern "`t")
{
$delim = "`t"
}
else
{
$delim = ','
}
Import-Csv $csvfile -Delimiter $delim

Assuming that the comma-separated files never contain tabs (which would then be data), the most efficient approach is to inspect only the first line of each file for the presence of tab characters, which is most easily done with (Get-Content -First 1 $_.FullName) -match "`t" - see Get-Content and -match, the regular-expression matching operator.
# Determine the arguments to pass to Set-Content - later, via splatting -
# for writing the output file.
$setContentArgs = @{
LiteralPath = $_.BaseName + '.csv'
Encoding = 'utf8'
}
# Check the 1st line for containing a tab.
# (This assumes that the comma-separated files contain no tabs as data.)
if ((Get-Content -First 1 $_.FullName) -match "`t") {
Write-Host "The file is tab-delimited."
# Read line by line, replace tabs with commas, and write with UTF-8 encoding.
Get-Content $_.FullName | ForEach-Object { $_ -replace "`t", ',' } |
Set-Content @setContentArgs
}
else {
Write-Host "The file is comma-separated."
# Just read lines as-is and write with UTF-8 encoding.
Get-Content $_.FullName |
Set-Content @setContentArgs
}
Note the use of the .BaseName property on the input [System.IO.FileInfo], which conveniently reports the file name without its extension, which allows you to simply append the new extension.
Since you're dealing with text (strings) only, Set-Content, which is slightly more efficient, is preferable to Out-File.
For the technique of passing arguments via a hashtable (@{ ... }), see about_Splatting.
If the files are smallish (easily fit into memory as a whole (possibly twice) each), you can significantly speed up processing by reading each file as a whole with -Raw and using -NoNewLine (PSv5+) to write that (possibly modified) string as-is, without appending a trailing newline, to the output file.
Since you're then reading the entire file anyway, you can get away with a single Get-Content call and apply -replace "`t", ',' blindly, given that for comma-separated files this will simply be a (fast) no op.
(Get-Content -Raw $_.FullName) -replace "`t", ',' |
Set-Content ($_.BaseName + '.csv') -Encoding Utf8 -NoNewLine
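For completeness, here is a self-contained sketch that puts the first-line check inside the Get-ChildItem loop. The scratch folder and the two sample files are assumptions created only for the demonstration. Note that $_.BaseName + '.csv' is a relative path, so it resolves against the current directory; the sketch writes next to the source file instead.

```powershell
# Hypothetical scratch folder and sample files, created only for this demonstration.
$dir = Join-Path ([IO.Path]::GetTempPath()) ([IO.Path]::GetRandomFileName())
$null = New-Item -ItemType Directory -Path $dir
Set-Content -LiteralPath (Join-Path $dir 'tabs.txt')   -Value "id`tname", "1`tJane"
Set-Content -LiteralPath (Join-Path $dir 'commas.txt') -Value 'id,name', '2,Ted'

Get-ChildItem -LiteralPath $dir -Filter *.txt | ForEach-Object {
    # Write the .csv next to the source file rather than into the current directory.
    $target = Join-Path $_.DirectoryName ($_.BaseName + '.csv')
    # Only the first line is inspected to decide which kind of file this is.
    $isTab  = (Get-Content -LiteralPath $_.FullName -First 1) -match "`t"
    Write-Host ('{0}: {1}' -f $_.Name, $(if ($isTab) { 'tab-delimited' } else { 'comma-separated' }))
    # Replacing tabs is harmless for comma-separated files, so one code path is enough.
    (Get-Content -LiteralPath $_.FullName) -replace "`t", ',' |
        Set-Content -LiteralPath $target -Encoding utf8
}

$converted = Get-Content -LiteralPath (Join-Path $dir 'tabs.csv')
$untouched = Get-Content -LiteralPath (Join-Path $dir 'commas.csv')
Remove-Item -LiteralPath $dir -Recurse
```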

I will use Import-Csv for this:
If(Import-Csv "File path to test if Tab-delimited file" -Delimiter "`t" -Ea SilentlyContinue){
"File is tab-delimited"
}
If(Import-Csv "File path to test if Comma-CSV file" -Ea SilentlyContinue){
"File is a comma-separated CSV"
}

powershell replace $_ if criteria meets

I read a xml file and want to replace three strings.
My Code:
foreach ($file in Get-ChildItem $files){
(Get-Content $file) | Foreach-Object {
$_ -replace 'TABLE_NAME="\$ZoneProdukt\$', ('TABLE_NAME="' + $ZONPRODLANG)`
-replace 'OPTION="Copy" ', '' `
-replace ('<JOB ','<JOB TIMETO="TEST" ') | ? {$_ -notlike "*TIMETO=`""}
} | Set-Content ($destination_folder + $file.name)
}
The last replace provides only half of the result I expect.
If there are lines containing "JOB" and "TIMETO" they will not be displayed (because of Where-Object)
How to keep lines if the mentioned "TIMETO"-Attribute already exists?
examples:
source line in file (without "TIMETO"):
<JOB JOBISN="30" USER="testuser">
correct replace:
<JOB TIMETO="TEST" JOBISN="30" USER="testuser">
....
....
source line in file (with "TIMETO"):
<JOB JOBISN="30" USER="testuser" TIMETO="0400">
replace -> this line will not be displayed !!
..
thanks in advance! br danijel

You could use an if-statement in your ForEach-Object:
foreach ($file in Get-ChildItem $files){
(Get-Content $file) | Foreach-Object {
if($_ -like "*TIMETO=`"*"){
$_ -replace 'TABLE_NAME="\$ZoneProdukt\$', ('TABLE_NAME="' + $ZONPRODLANG) `
-replace 'OPTION="Copy" ', ''
}else{
$_ -replace 'TABLE_NAME="\$ZoneProdukt\$', ('TABLE_NAME="' + $ZONPRODLANG) `
-replace 'OPTION="Copy" ', '' `
-replace ('<JOB ','<JOB TIMETO="TEST" ')
}
} | Set-Content ($destination_folder + $file.name)
}
Manipulating XML with regex is generally bad practice. You should read the file with Get-Content and cast the result to [xml], which lets you manipulate the document as an object. Check out this MSDN demo.
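A minimal sketch of the [xml] approach, assuming JOB elements shaped like the ones in the question. The sample document and its JOBS root element are hypothetical, and the TABLE_NAME replacement is left out for brevity.

```powershell
# Hypothetical sample document; a real script would use: [xml]$doc = Get-Content -Raw $file
[xml]$doc = @'
<JOBS>
  <JOB JOBISN="30" USER="testuser" OPTION="Copy" />
  <JOB JOBISN="31" USER="testuser" TIMETO="0400" />
</JOBS>
'@

foreach ($job in $doc.SelectNodes('//JOB')) {
    # Add TIMETO only where it is missing; existing values are left alone.
    if (-not $job.HasAttribute('TIMETO')) {
        $job.SetAttribute('TIMETO', 'TEST')
    }
    # Drop OPTION="Copy" if present.
    if ($job.GetAttribute('OPTION') -eq 'Copy') {
        $job.RemoveAttribute('OPTION')
    }
}
$doc.OuterXml
```

To persist the result, $doc.Save() can be called with a full path. Unlike the regex version, this does not depend on attribute order or on how the lines are broken.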

Replacing bunch of string characters in a text file using Powershell

Working on a PowerShell code which will replace a set of characters from a text file in a folder (Contain lot of Text files). Is there a way where it can do it for all the files in the folder?
The issue is it creates a new file when I run the code (New_NOV_1995.txt) but it doesn't change any characters in the new file.
$lookupTable = @{
'¿' = '|'
'Ù' = '|'
'À' = '|'
'Ú' = '|'
'³' = '|'
'Ä' = '-'
}
$original_file = 'C:\FilePath\NOV_1995.txt'
$destination_file = 'C:\FilePath\NOV_1995_NEW.txt'
Get-Content -Path $original_file | ForEach-Object {
$line = $_
$lookupTable.GetEnumerator() | ForEach-Object {
if ($line -match $_.Key)
{
$line = $line -replace $_.Key, $_.Value
}
}
$line
} | Set-Content -Path $destination_file

While something like this would work, performance might be a problem. My only testing was on a tiny file containing the $lookupTable.
$lookupTable = @{
'¿' = '|'
'Ù' = '|'
'À' = '|'
'Ú' = '|'
'³' = '|'
'Ä' = '-'
}
$original_file = 'C:\FilePath\NOV_1995.txt'
$destination_file = 'C:\FilePath\NOV_1995_NEW.txt'
$originalContent = Get-Content -Path $original_file
$lookupTable.GetEnumerator() | % {
$originalContent = $originalContent -replace $_.Key,$_.Value
}
$originalContent | Out-File -FilePath $destination_file

Your code as you have it there is actually working for me. There may still be an encoding issue with your files. Does your file look right when you just read it into the console with Get-Content $path? If the file does not look right, you might need to play with the -Encoding switches of the Set-Content and Get-Content cmdlets.
Improving on your current logic.
I changed your $lookuptable to a pair of psobjects. Since you are making the same replacement for the most part I combined them into a single regex.
The next part I hummed and hawed about but since, after my proposed change, you are only doing two replacements I figure you could just chain the two into a single replacement line. Otherwise you could have a foreach-object in there but I think this is simpler and faster.
This way we don't need to test for a match. -replace is doing the testing for us.
$toPipe = [pscustomobject]@{
Pattern = '¿|Ù|À|Ú|³'
Replacement = "|"
}
$toHypen = [pscustomobject]@{
Pattern = 'Ä'
Replacement = "-"
}
$path = "c:\temp\test\test"
Get-ChildItem -Path $path | ForEach-Object{
(Get-Content $_.FullName) -replace $toPipe.Pattern,$toPipe.Replacement -replace $toHypen.Pattern,$toHypen.Replacement |
Set-Content $_.FullName
}
Note that this will change the original files. Testing is encouraged.
Set-Content and Get-Content are not the best when it comes to performance, so you might need to consider using [IO.File]::ReadAllLines($file) and its partner static method [IO.File]::WriteAllLines($file, $lines).
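A rough sketch of the .NET variant, using a temporary file with made-up box-drawing content so it can be run in isolation. The five pipe characters are folded into a single character class here, which is an alternative to the alternation pattern above.

```powershell
# Hypothetical scratch file with sample content, created only for this demonstration.
$file = [IO.Path]::GetTempFileName()
[IO.File]::WriteAllLines($file, [string[]]('ÚÄÄ¿', '³ab³', 'ÀÄÄÙ'))

# Read all lines, apply both replacements to every line, and write the result back.
$lines = [IO.File]::ReadAllLines($file) -replace '[¿ÙÀÚ³]', '|' -replace 'Ä', '-'
[IO.File]::WriteAllLines($file, [string[]]$lines)

$result = [IO.File]::ReadAllLines($file)
Remove-Item -LiteralPath $file
$result
```

Two caveats: the .NET methods resolve relative paths against the process working directory rather than the current PowerShell location, so full paths are safer, and both methods accept an optional encoding argument if the files are not UTF-8.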
