Read from text file one character at a time - powershell

I'm trying to convert characters in a text file based on what type they are:
Letters > L
Numbers > #
Is there a way to iterate through a file on a per-character basis? The only way I can get it to work currently is nested loops iterating through individual lines within the file. If there's a simpler way, that cuts out a lot of code I'll have to wade through.

1. You can use Get-Content -Encoding Byte and convert each byte value back to a character (in PowerShell 6 and later, -AsByteStream replaces -Encoding Byte):
Get-Content foo.txt -Encoding Byte | foreach { [char]$_ }
2. You can use Get-Content -Raw and cast the result to [char[]]. Not recommended for large files.
Both options above will give you all characters, including line breaks. Option 1 will not work with Unicode, because each byte is converted to a character on its own, which breaks any multi-byte encoding; option 2 will.
Then there is the variant you mention already: Iterate twice, once by lines, once by character:
Get-Content foo.txt | foreach { [char[]] $_ | foreach { ... } }
If you don't need line breaks as characters I'd prefer this version since it should have reasonable runtime and memory requirements (e.g. it won't try to fit the whole file into memory).
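As a rough sketch of how the per-character loop could be applied to the conversion in the question (letters to L, digits to #), assuming line breaks do not need to be treated as characters. The function name Convert-CharClass is made up for illustration and is not part of the answers here:

```powershell
# Hypothetical helper: maps letters to 'L' and digits to '#', leaving
# every other character unchanged.
function Convert-CharClass {
    param([string]$Line)
    $mapped = foreach ($c in $Line.ToCharArray()) {
        if     ([char]::IsLetter($c)) { 'L' }
        elseif ([char]::IsDigit($c))  { '#' }
        else                          { $c }
    }
    # Join the per-character results back into one string.
    $mapped -join ''
}

$converted = Convert-CharClass 'ab1 2-c'
$converted
```

For a file, the same function could be called once per line, e.g. Get-Content foo.txt | foreach { Convert-CharClass $_ }.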

get-content myfile.txt | foreach { $_.ToCharArray() }
This flattens the contents of your file into a long array of characters.

If you are processing very large files, the fastest (programmatic) method I have found is to use .NET StreamReader and StreamWriter. Utilizing these objects will allow you to read line-at-a-time into a string, perform manipulation, and then write to a new file line-at-a-time. At the end, delete your original and rename the new file accordingly.
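A minimal sketch of that read/transform/write loop, assuming PowerShell 5 or later (for the ::new() syntax). The paths and sample content are placeholders, and the letter/digit replacement is only an example transformation:

```powershell
# Placeholder input file created in the temp directory for the demo.
$inPath  = Join-Path ([System.IO.Path]::GetTempPath()) 'stream-demo-in.txt'
$outPath = "$inPath.new"
Set-Content -Path $inPath -Value 'abc 123', 'x9'

$reader = [System.IO.StreamReader]::new($inPath)
$writer = [System.IO.StreamWriter]::new($outPath)
try {
    # ReadLine() returns $null at end of file.
    while ($null -ne ($line = $reader.ReadLine())) {
        $writer.WriteLine(($line -replace '\p{L}', 'L' -replace '\d', '#'))
    }
}
finally {
    # Close both files even if the loop throws.
    $writer.Dispose()
    $reader.Dispose()
}

# Delete the original and rename the new file into its place.
Remove-Item -Path $inPath
Rename-Item -Path $outPath -NewName (Split-Path $inPath -Leaf)
$streamResult = Get-Content -Path $inPath
```

Only one line is held in memory at a time, which is why this scales to large files.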
If you don't need to programmatically solve this and can utilize regular expressions, I recommend UltraEdit. I don't know what wizardry they utilize, but it is MUCH faster at reading files than what I've managed to do in PowerShell.

Related

replacing text in Powershell every alternate match

I have looked at this question, and it's close to what I need to do, but the text I need to replace is inconsistent.
I need to replace "`r`n with ", but only the first of the 2 adjacent lines
example: (the full file is 50k lines and up to 500 chars wide)
ID,Name,LinkedRecords
54429,Abe,
54247,Jonathan,"
63460|63461"
54249,Teresa,
54418,Cody,
58046,Joseph,
58243,David,
,Barry,"
74330"
C8876,Simon,
X_10934,David,
should become
ID,Name,LinkedRecords
54429,Abe,
54247,Jonathan,"63460|63461"
54249,Teresa,
54418,Cody,
58046,Joseph,
58243,David,
,Barry,"74330"
C8876,Simon,
X_10934,David,
I can see this will probably be useful, but I'm having a hard time getting the command to work as desired
If the `r`n characters are literal, then you can do the following:
[System.IO.File]::ReadAllText('c:\path\file.txt') -replace '(?<=,")`r`n\r?\n' |
Set-Content c:\path\file.txt
If `r`n are actual carriage return and line feed chars, then you can do the following:
[System.IO.File]::ReadAllText('c:\path\file.txt') -replace '(?<=,")\r\n' |
Set-Content c:\path\file.txt
Note if memory becomes an issue, a different approach may be needed.
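One possible line-by-line alternative, sketched under the assumption that the line breaks are real line breaks and that a split record always ends its first line with ,". It uses a small in-memory sample rather than the real file:

```powershell
# Sample lines standing in for the file content.
$lines = 'ID,Name,LinkedRecords', '54247,Jonathan,"', '63460|63461"', '54249,Teresa,'

$pending = ''
$result = foreach ($line in $lines) {
    $pending += $line
    # A line ending in ," is held back and glued to the line that follows it.
    if (-not $line.EndsWith(',"')) {
        $pending
        $pending = ''
    }
}
# Emit anything still held back if the input ended on a ," line.
if ($pending) { $result = @($result) + $pending }
$result
```

With a real file, the loop body could sit in a Get-Content | ForEach-Object pipeline that writes to a second file, so the whole content never has to be in memory at once.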

Powershell - easy way to convert an array of ASCII values to characters

I have a restriction of not being able to encode my Powershell script file in any of the following formats
Unicode
Unicode big endian
UTF-8
I need to create some files with some non-english characters in their names.
I have found a way to achieve this.
$op = [char]24555,[char]36895,[char]30340,[char]26837,[char]33394,[char]29392,[char]29432,[char]36339,[char]36807,[char]20102,[char]25042,[char]29399
"Write some necessary information to file" | Out-File "$op"
The output here is a file named "快 速 的 棕 色 狐 狸 跳 过 了 懒 狗" with "Write some necessary information to file" as its content
There are two problems with this approach
1. The script looks awkward, and it gets more ungainly as the value of $op grows. Is there a simpler way of just storing the ASCII values and then converting them to characters on the fly? I would like to avoid having to cast all those numbers to [char] individually in the array.
2. The name should be 快速的棕色狐狸跳过了懒狗 without the empty spaces in between.
Is there an easy way to achieve this?
For the first one, you can cast the entire list to a [char[]]:
$op = [char[]]@(24555,36895,30340,26837,33394,29392,29432,36339,36807,20102,25042,29399)
To avoid the white space between characters, either change the output field separator prior to creating the string:
$OFS = ''
"$op"
or use the -join operator:
$op -join ''
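Putting the two steps together, a short sketch with only the first three code points from the question (the variable names are illustrative):

```powershell
# Code points for the first three characters of the file name.
$codes = 24555, 36895, 30340

# Cast the whole array once, then join without a separator.
$name = [char[]]$codes -join ''
$name.Length
```

The joined string has one character per code point and no spaces, so it can be passed straight to Out-File as the file name.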

Replace first two characters of each line of a file via PowerShell

I have a file that needs to have the first two characters of each line replaced. It seems easy, but those same first two characters "|0" show up elsewhere in the file, so I've ended up with the replacement string "$bp" all over the place. Is there any way to replace only the first instance of "|0" on each line? Here is the sample data:
0|Corrupt Record|0|0|0|0|0|0|0|0|0
Your question is unclear (|0 vs 0|).
You can use this snippet to replace the first two characters of each line if they are 0|:
$oldContent = Get-Content "my/file"
$newContent = $OldContent | ForEach-Object { $_ -replace "^0\|","newstring" }
# simpler
#$newContent = $OldContent -replace "^0\|","newstring"
$newContent | Set-Content "my/file"
There are probably other ways to do this, but here is my approach.
To replace only the first occurrence of "0|" and leave the rest in place, you can do the following:
$CorruptString = "0|Corrupt Record|0|0|0|0|0|0|0|0|0"
[regex]$ToReplace = "0\|"
$ToReplace.replace($CorruptString, "", 1)
This will Output:
Corrupt Record|0|0|0|0|0|0|0|0|0
The regex matches the unwanted string, and Replace swaps it for nothing or for whatever replacement you supply. The 1 limits the replacement to a single occurrence.
I believe that is what you were looking for. If not try to explain more.
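To illustrate how the count-limited replace behaves across several lines, and how it differs from an anchored pattern, here is a small sketch. The sample lines and the replacement text X| are made up:

```powershell
$lines = '0|Corrupt Record|0|0|0', 'A|0|0'
$first = [regex]'0\|'

# Count-limited: replaces the first "0|" wherever it occurs in the line.
$byCount = $lines | ForEach-Object { $first.Replace($_, 'X|', 1) }

# Anchored: replaces "0|" only when it is at the very start of the line.
$anchored = $lines -replace '^0\|', 'X|'

$byCount
$anchored
```

The second sample line shows the difference: the count-limited version changes it to A|X|0, while the anchored version leaves it untouched.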
EDIT: Because there was some confusion about the post: to replace the first two characters of a string, you can use Substring to drop them and then prepend whatever you need.
"0|Corrupt Record|0|0|0|0|0|0|0|0|0".Substring(2)

powershell - replace line in .txt file

I am using PowerShell and I need replace a line in a .txt file.
The .txt file always has different number at the end of the line.
For example:
...............................txt (first)....................................
appversion= 10.10.1
............................txt (a second time)................................
appversion= 10.10.2
...............................txt (third)...................................
appversion= 10.10.5
I need to replace appversion + number behind it (the number is always different). I have set the required value in variable.
How do I do this?
Part of the issue you are having, judging from your comments, is that you are trying to replace text in a file and save it back to the same file while you are still reading it.
I will show a similar solution that addresses this. Again we are going to use the fact that -replace works on arrays.
$NewVersion = "Awesome"
$filecontent = Get-Content C:\temp\file.txt
$filecontent -replace '(^appversion=.*\.).*',"`${1}$NewVersion" | Set-Content C:\temp\file.txt
This regex will match lines starting with "appversion=" and everything up until the last period. The group reference is written as ${1} so that a numeric $NewVersion is not read as part of the group number (for example, $1 followed by 7 would be parsed as $17). Since we are storing the text in memory we can write it back to the same file. Change $NewVersion to a number ... unless that is your versioning structure.
I am not sure which part of the number, if any, you are trying to preserve. If you intend to change the whole number, you can just change .*\. to a space. That way everything after the equals sign is replaced.
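A sketch of the whole-number variant on in-memory sample lines. The sample content and the value of $newVersion are assumptions, and \s* is used instead of a single space to tolerate different spacing:

```powershell
$lines = 'name= demo', 'appversion= 10.10.1'
$newVersion = '10.10.7'   # hypothetical target version

# Keep "appversion=" plus any following whitespace, replace the rest of the line.
# ${1} keeps the group number separate from the digits that follow it.
$updated = $lines -replace '^(appversion=\s*).*$', "`${1}$newVersion"
$updated
```

Lines that do not start with appversion= pass through unchanged, so the result can be piped to Set-Content as in the example above.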
Yes, you can with regex.
Let's call the variables holding the text and the version number $myString and $verNumber:
$myString = "appversion= 10.10.1";
$verNumber = 7;
You can use the -replace operator to capture the version parts and replace only the last sub-version number this way (note the escaped dots, so that they match literal periods only):
$mystring -replace 'appversion= (\d+)\.(\d+)\.(\d+)', "appversion= `$1.`$2.$verNumber";

Powershell break a file up on character count

I have a binary file that I need to process, but it contains no line breaks in it.
The data is arranged, within the file, into 104 character blocks and then divided into its various fields by character count alone (no delimiting characters).
I'd like to firstly process the file, so that there is a line break (`n) every 104 characters, but after much web searching and a lot of disappointment, I've found nothing useful yet. (Unless I ditch PowerShell and use awk.)
Is there a Split option that understands character counts?
Not only would it allow me to create the file with nice easy to read lines of 104 chars, but it would also allow me to then split these lines into their component fields.
Can anyone help please, without *nix options?
Cheers :)
$s = get-content YourFileName | Out-String
$a = $s.ToCharArray()
$a[0..103] # will return an array of first 104 chars
You can turn a slice back into a string with -join. (Casting the slice to [string] separates the elements with spaces, and stripping those with .Replace(" ","") would also remove any genuine spaces in the data.)
$ns = $a[0..103] -join ''
Using the V4 Where method with Split option:
$text = 'abcdefghi'
While ($text)
{
$x,$text = ([char[]]$text).where({$_},'Split',3)
$x -join ''
}
abc
def
ghi
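For comparison, a plain Substring loop does the same chunking without the V4 Where method, and the same call then cuts a record into fields by character count. This is only a sketch: the function name Split-FixedWidth and the field widths are invented, and the width would be 104 for the file in the question:

```powershell
# Hypothetical helper: cuts a string into pieces of at most $Width characters.
function Split-FixedWidth {
    param([string]$Text, [int]$Width)
    for ($i = 0; $i -lt $Text.Length; $i += $Width) {
        # The last piece may be shorter than $Width.
        $Text.Substring($i, [Math]::Min($Width, $Text.Length - $i))
    }
}

$records = Split-FixedWidth -Text 'abcdefghij' -Width 3

# Splitting one record into fields by position (made-up layout: 1 char, then 2 chars).
$record = $records[0]
$fieldA = $record.Substring(0, 1)
$fieldB = $record.Substring(1, 2)
$records
```

Joining the records with "`n" would give the one-record-per-line file asked for in the question.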