Import length-delimited file with PowerShell and export as csv file


I have a source file which is in .txt format. It looks like a semi-colon separated file:
100;200;ThisisastringcolumnA;4;
101;400;Thisisastringc;lumnA;5;
102;600;ThisisastringcolumnB;6;
104;600;Thisisa;;ringcolumnB;6;
However, the columns are actually determined by length, so it is a length-delimited (fixed-width) file.
The first column, for example, runs from the first character to the third (100), followed by a semi-colon.
The second column starts at the 5th position and runs through the 7th position (both inclusive). A string column can contain a semi-colon.
Now I want to import this length-delimited txt file with PowerShell and export it as a csv file that really is semi-colon separated. The result should look like
100;200;ThisisastringcolumnA;4;
101;400;"Thisisastringc;lumnA";5;
102;600;ThisisastringcolumnB;6;
104;600;"Thisisa;;ringcolumnB";6;
But I simply have no idea how to do it. I googled it, but I did not find many useful code examples for importing length-delimited txt files with PowerShell.
Unfortunately, I cannot use Python. I am not sure whether this task is possible at all using PowerShell, because when exporting, PowerShell also needs to recognize that there are string values containing the separator, so it has to pay attention to the quoting: "Thisisa;;ringcolumnB". It would also be OK for me if the whole column is quoted, so that every entry in a string column gets quotes added.

You can use regex to describe a string in which the 3rd "column" contains a ; and then inject the quotation marks with the -replace operator:
$lines = Get-Content path\to\file.txt
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;'
The expression (.{20}(?<=;.{0,19})) is going to match the 20-char 3rd column value only if it contains at least one semi-colon - so lines with no semicolon in that column will be left alone:
# let's try it out with your test data
$lines = @'
100;200;ThisisastringcolumnA;4;
101;400;Thisisastringc;lumnA;5;
102;600;ThisisastringcolumnB;6;
104;600;Thisisa;;ringcolumnB;6;
'@ -split '\r?\n'
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;'
Which yields the following four strings:
100;200;ThisisastringcolumnA;4;
101;400;"Thisisastringc;lumnA";5;
102;600;ThisisastringcolumnB;6;
104;600;"Thisisa;;ringcolumnB";6;
To write the output back to file, use Set-Content:
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;' |Set-Content path\to\fixed_output.scsv
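As an alternative to the regex, the columns can be cut out by position, which is closer to how a length-delimited file is defined. The following is only a sketch: the `$layout` table (0-based start offsets and lengths) is an assumption inferred from the four sample rows, and it takes the option mentioned in the question of quoting every value in the string column.

```powershell
# Sample rows from the question; in practice these would come from Get-Content.
$lines = @(
    '100;200;ThisisastringcolumnA;4;'
    '101;400;Thisisastringc;lumnA;5;'
    '102;600;ThisisastringcolumnB;6;'
    '104;600;Thisisa;;ringcolumnB;6;'
)

# Assumed column layout (0-based start, length), inferred from the sample data.
$layout = @(
    @{ Start = 0;  Length = 3;  Quote = $false }
    @{ Start = 4;  Length = 3;  Quote = $false }
    @{ Start = 8;  Length = 20; Quote = $true  }   # the string column
    @{ Start = 29; Length = 1;  Quote = $false }
)

$result = foreach ($line in $lines) {
    $fields = foreach ($col in $layout) {
        $value = $line.Substring($col.Start, $col.Length)
        if ($col.Quote) {
            # Wrap in quotes and double any embedded quote characters.
            '"' + $value.Replace('"', '""') + '"'
        } else {
            $value
        }
    }
    # Keep the trailing separator that the source file uses.
    ($fields -join ';') + ';'
}

$result
# 100;200;"ThisisastringcolumnA";4;
# 101;400;"Thisisastringc;lumnA";5;
# 102;600;"ThisisastringcolumnB";6;
# 104;600;"Thisisa;;ringcolumnB";6;
```

Piping `$result` to Set-Content would write the file in the same way as the regex version. Rows shorter than the assumed layout would make Substring throw, so real data may need a length check first.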

Related

String variable position being overwritten in write-host

If I run the below code, $SRN can be written as output or added to another variable, but trying to include either another variable or regular text causes it to be overwritten from the beginning of the line. I'm assuming it's something to do with how I'm assigning $autocode and $SRN initially but can't tell what it's trying to do.
# Load the property set to allow us to get to the email body.
$item.load($psPropertySet) # Load the data.
$bod = ($item.Body.Text -creplace '(?m)^\s*\r?\n','') -split "\n" # Get the body text, remove blank lines, split on line breaks to create an array (otherwise it is a single string).
$autocode = $bod[4].split('-')[2] # Get line 4 (should be Title), split on dash, look for 3rd element, this should contain our automation code.
$SRN = $bod[1] -replace 'ID: ','' # Get line 2 (should be ID), find and replace the preceding text.
# Skip processing if autocode does not match our list of handled ones.
if ($autocode -cin $autocodes)
{
write-host "$SRN $autocode"
write-host "$autocode $SRN"
write-host "$SRN test"
$var = "$SRN $autocode"
$var
}
The code results in the output below. As you can see, it is fine when $SRN isn't at the start of the line. I am also unsure where the extra spaces come from:
KRNE8385
KRNE SR1788385
test8385
KRNE8385
I would expect to see this:
SR1788385 KRNE
KRNE SR1788385
SR1788385 test
SR1788385 KRNE

LotPings pointed me down the right path, both variables still had either "0D" or "\r" in them. My regex replace was only getting rid of them on blank lines, and I split the array on "\n" only. Changing line 3 in the original code to the below appears to have resolved the issue. First time seeing Format-Hex, but it appears to be excellent for troubleshooting such issues.
$bod = ($item.Body.Text -creplace '(?m)^\s*\r?\n','') -split "\r\n"
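The effect can be reproduced without the mail item. The sketch below uses a made-up `$text` value with CRLF line endings (an assumption standing in for `$item.Body.Text`) and compares splitting on "\n" with splitting on '\r?\n', which tolerates both CRLF and LF endings.

```powershell
# Hypothetical body text with Windows (CRLF) line endings.
$text = "ID: SR1788385`r`nTitle: A-B-KRNE`r`n"

# Splitting on "\n" alone leaves a trailing carriage return on each line.
$bad = ($text -split "\n")[0] -replace 'ID: ', ''

# Splitting on '\r?\n' removes the carriage return together with the line feed.
$good = ($text -split '\r?\n')[0] -replace 'ID: ', ''

$bad.Length      # 10: nine visible characters plus the hidden CR
$good.Length     # 9
[int]$bad[-1]    # 13, the character code of the carriage return
```

When `$bad` is written to the console, the carriage return moves the cursor back to the start of the line, so whatever follows it overwrites the beginning of the text.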

Powershell - easy way to convert an array of ASCII values to characters

I have a restriction of not being able to encode my Powershell script file in any of the following formats
Unicode
Unicode big endian
UTF-8
I need to create some files with some non-english characters in their names.
I have found a way to achieve this.
$op = [char]24555,[char]36895,[char]30340,[char]26837,[char]33394,[char]29392,[char]29432,[char]36339,[char]36807,[char]20102,[char]25042,[char]29399
"Write some necessary information to file" | Out-File "$op"
The output here is a file named "快 速 的 棕 色 狐 狸 跳 过 了 懒 狗" with "Write some necessary information to file" as its content
There are two problems with this approach:
1. The script looks rather awkward and gets more ungainly as the value of $op grows. Is there a simpler way of just storing the ASCII values and then converting them to characters on the fly? I would like to avoid having to cast all those numbers to [char] individually in the array.
2. The name should be 快速的棕色狐狸跳过了懒狗 without the empty spaces in between.
Is there an easy way to achieve this?

For the first one, you can cast the entire list to a [char[]]:
$op = [char[]]@(24555,36895,30340,26837,33394,29392,29432,36339,36807,20102,25042,29399)
To avoid the white space between characters, either change the output field separator prior to creating the string:
$OFS = ''
"$op"
or use the -join operator:
$op -join ''
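Both steps can be combined into one expression. A small sketch, using only the first three code points from the question and a hypothetical `$name` variable:

```powershell
# First three code points from the question's list.
$codes = 24555, 36895, 30340

# Cast the whole array to [char[]], then join without a separator.
$name = -join ([char[]]$codes)

$name.Length   # 3, so no spaces were inserted between the characters
```

Using -join leaves `$OFS` untouched, so other string expansions in the script are not affected.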

Replace first two characters of each line of a file via PowerShell

I have a file that needs to have the first two characters of each line replaced. It seems easy, but those same first two characters "|0" show up elsewhere in the file, so I've ended up with the replacement string "$bp" all over the place. Is there any way to replace only the first instance of "|0" on each line? Here is the sample data:
0|Corrupt Record|0|0|0|0|0|0|0|0|0

Your question is unclear (|0 vs 0|).
You can use this snippet to replace the 2 first characters of each line if they are 0|:
$oldContent = Get-Content "my/file"
$newContent = $OldContent | ForEach-Object { $_ -replace "^0\|","newstring" }
# simpler
#$newContent = $OldContent -replace "^0\|","newstring"
$newContent | Set-Content "my/file"
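To see that the ^ anchor limits the replacement to the start of each line, here is a small check on in-memory lines instead of a file. The second sample line and the replacement text 'X|' are made up for illustration.

```powershell
# Two sample lines; the first is taken from the question.
$lines = @(
    '0|Corrupt Record|0|0|0|0|0|0|0|0|0'
    '0|Another Record|0|0'
)

# -replace works on every element of the array; ^ pins the match to the line start.
$fixed = $lines -replace '^0\|', 'X|'

$fixed
# X|Corrupt Record|0|0|0|0|0|0|0|0|0
# X|Another Record|0|0
```

The later "0|" sequences are left alone because they do not sit at the beginning of the line.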

I'm sure there are other ways to do this, but here is how my approach would be.
To replace just the first occurrence of "0|" and have the remaining stay you can replace it like so.
$CorruptString = "0|Corrupt Record|0|0|0|0|0|0|0|0|0"
[regex]$ToReplace = "0\|"
$ToReplace.replace($CorruptString, "", 1)
This will Output:
Corrupt Record|0|0|0|0|0|0|0|0|0
Just a simple regex to replace the corrupt string and replace it with either nothing or whatever you wanted to replace it with. Naturally the 1 is so it only does it one time.
I believe that is what you were looking for. If not try to explain more.
EDIT: because there was some confusion with the post. To replace the first two characters in a string you can just do substring to remove the first two.
"0|Corrupt Record|0|0|0|0|0|0|0|0|0".Substring(2)

powershell - replace line in .txt file

I am using PowerShell and I need replace a line in a .txt file.
The .txt file always has different number at the end of the line.
For example:
...............................txt (first)....................................
appversion= 10.10.1
............................txt (a second time)................................
appversion= 10.10.2
...............................txt (third)...................................
appversion= 10.10.5
I need to replace appversion + number behind it (the number is always different). I have set the required value in variable.
How do I do this?

Part of the issue you are running into, as I see from your comments, is that you are trying to replace text in a file and save it back to the same file while you are still reading it.
I will try to show a similar solution while addressing this. Again we are going to use the -replace operator's ability to work on an array.
$NewVersion = "Awesome"
$filecontent = Get-Content C:\temp\file.txt
$filecontent -replace '(^appversion=.*\.).*',"`${1}$NewVersion" | Set-Content C:\temp\file.txt
This regex will match lines starting with "appversion=" and everything up until the last period. The group reference is written as ${1} so that a $NewVersion beginning with a digit is not read as part of the group number. Since we are storing the text in memory we can write it back to the same file. Change $NewVersion to a number ... unless that is your versioning structure.
I am not sure which part of the version number, if any, you are trying to preserve. If you intend to change the whole number, you can just change .*\. to a space. That way you replace everything after the equals sign.
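A sketch of the whole-number variant, run against in-memory lines rather than a file. The sample lines and the value of `$NewVersion` are assumptions; `\s*` is used instead of a literal space so that lines with or without a space after the equals sign both match.

```powershell
$NewVersion = '10.10.7'

# Stand-in for Get-Content C:\temp\file.txt
$filecontent = @(
    'name= demo'
    'appversion= 10.10.5'
    'other= 1'
)

# Keep "appversion=" and any following whitespace, replace the rest of the line.
# ${1} keeps the group number separate from the digits of the new version.
$updated = $filecontent -replace '^(appversion=\s*).*', "`${1}$NewVersion"

$updated
# name= demo
# appversion= 10.10.7
# other= 1
```

Lines that do not start with "appversion=" pass through unchanged, so the result can be piped to Set-Content as in the answer above.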

Yes, you can with regex.
Let's call the variables holding the text and the version number $myString and $verNumber:
$myString = "appversion= 10.10.1";
$verNumber = 7;
You can use the -replace operator to capture the version parts and replace only the last sub-version number this way (the dots are escaped so they match literal periods only):
$mystring -replace 'appversion= (\d+)\.(\d+)\.(\d+)', "appversion= `$1.`$2.$verNumber";

Postgresql: CSV export with escaped linebreaks

I exported some data from a postgresql database using (all) the instruction(s) posted here: Save PL/pgSQL output from PostgreSQL to a CSV file
But some exported fields contain newlines (line breaks), so I got a CSV file like:
header1;header2;header3
foobar;some value;other value
just another value;f*** value;value with
newline
nextvalue;nextvalue2;nextvalue3
How can I escape (or ignore) these newline characters?

Line breaks are supported in CSV if the fields that contain them are enclosed in double quotes.
So if you had this in the middle of the file:
just another value;f*** value;"value with
newline"
it will be taken as 1 line of data spread on 2 lines with 3 fields and just work.
On the other hand, without the double quotes, it's an invalid CSV file (when it advertises 3 fields).
Although there's no formal specification for the CSV format, you may look at RFC 4180 for the rules that generally apply.