I have a restriction: I cannot save my PowerShell script file in any of the following encodings:
Unicode
Unicode big endian
UTF-8
I need to create some files with some non-english characters in their names.
I have found a way to achieve this.
$op = [char]24555,[char]36895,[char]30340,[char]26837,[char]33394,[char]29392,[char]29432,[char]36339,[char]36807,[char]20102,[char]25042,[char]29399
"Write some necessary information to file" | Out-File "$op"
The output here is a file named "快 速 的 棕 色 狐 狸 跳 过 了 懒 狗" with "Write some necessary information to file" as its content
There are two problems with this approach:
My script looks rather awkward, and it will get more unwieldy as the value of $op grows. Is there a simpler way to store just the numeric (Unicode) character codes and convert them to characters on the fly? I would like to avoid casting every number in the array to [char] individually.
The name should be 快速的棕色狐狸跳过了懒狗, without the spaces in between.
Is there an easy way to achieve this?
For the first one, you can cast the entire list to a [char[]]:
$op = [char[]]@(24555,36895,30340,26837,33394,29392,29432,36339,36807,20102,25042,29399)
To avoid the white space between characters, either change the output field separator prior to creating the string:
$OFS = ''
"$op"
or use the -join operator:
$op -join ''
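Putting both fixes together, here is a minimal sketch that keeps the script itself pure ASCII. It assumes every value fits in a single UTF-16 code unit (true for these characters); a code point above 0xFFFF would need [char]::ConvertFromUtf32, which returns a two-character string. Writing to the temp directory is only for illustration.

```powershell
# Numeric character codes of the desired file name.
# The script contains only ASCII, so it can be saved in any encoding.
$codePoints = 24555, 36895, 30340, 26837, 33394, 29392, 29432, 36339, 36807, 20102, 25042, 29399

# Cast the whole array at once, then join the characters without a separator.
$name = -join ([char[]] $codePoints)

# Equivalent: build the string directly from the char array.
$sameName = [string]::new([char[]] $codePoints)

# Create the file; -LiteralPath prevents wildcard interpretation of the name.
$path = Join-Path ([System.IO.Path]::GetTempPath()) "$name.txt"
"Write some necessary information to file" | Out-File -LiteralPath $path
```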
I have a source file which is in .txt format. It looks like a semi-colon separated file:
100;200;ThisisastringcolumnA;4;
101;400;Thisisastringc;lumnA;5;
102;600;ThisisastringcolumnB;6;
104;600;Thisisa;;ringcolumnB;6;
However, the columns are determined by length, so it is actually a fixed-width (length-delimited) file.
The first column, for example, runs from the first to the third character (100), followed by a semicolon.
The second column starts at the 5th position and ends at the 7th (both inclusive). A string column can contain a semicolon.
Now I want to import this length-delimited .txt file with PowerShell and export it as a CSV file that is genuinely semicolon-separated. The result should look like this:
100;200;ThisisastringcolumnA;4;
101;400;"Thisisastringc;lumnA";5;
102;600;ThisisastringcolumnB;6;
104;600;"Thisisa;;ringcolumnB";6;
But I have no idea how to do it. I googled it, but I did not find many useful code examples for importing length-delimited .txt files with PowerShell.
Unfortunately, I cannot use Python. I am not sure whether this task is possible in PowerShell at all, because when exporting, PowerShell also needs to recognize string values that contain the separator and quote them: "Thisisa;;ringcolumnB". It would also be fine for me if the whole column were quoted, i.e. every entry in a string column got quotes added.
You can use regex to describe a string in which the 3rd "column" contains a ; and then inject the quotation marks with the -replace operator:
$lines = Get-Content path\to\file.txt
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;'
The expression (.{20}(?<=;.{0,19})) is going to match the 20-char 3rd column value only if it contains at least one semi-colon - so lines with no semicolon in that column will be left alone:
# let's try it out with your test data
$lines = @'
100;200;ThisisastringcolumnA;4;
101;400;Thisisastringc;lumnA;5;
102;600;ThisisastringcolumnB;6;
104;600;Thisisa;;ringcolumnB;6;
'@ -split '\r?\n'
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;'
Which yields the following four strings:
100;200;ThisisastringcolumnA;4;
101;400;"Thisisastringc;lumnA";5;
102;600;ThisisastringcolumnB;6;
104;600;"Thisisa;;ringcolumnB";6;
To write the output back to file, use Set-Content:
@($lines) -replace '(.{3});(.{3});(.{20}(?<=;.{0,19}));(.);', '$1;$2;"$3";$4;' | Set-Content path\to\fixed_output.scsv
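If the column widths are known, another option is to skip the regex entirely, cut each line by position, and let ConvertTo-Csv handle the quoting. This is a swapped-in technique, not the one above; the column names and offsets are assumptions read off the sample data (3, 3, 20 and 1 characters, each followed by one ;).

```powershell
# Hypothetical column layout inferred from the sample: zero-based start
# offset and width of each field (each field is followed by one ';').
$columns = @(
    @{ Name = 'Id';    Start = 0;  Length = 3 }
    @{ Name = 'Value'; Start = 4;  Length = 3 }
    @{ Name = 'Text';  Start = 8;  Length = 20 }
    @{ Name = 'Digit'; Start = 29; Length = 1 }
)

$lines = @(
    '100;200;ThisisastringcolumnA;4;'
    '101;400;Thisisastringc;lumnA;5;'
    '102;600;ThisisastringcolumnB;6;'
    '104;600;Thisisa;;ringcolumnB;6;'
)

# Cut each line into fields purely by position; ';' inside a field is just data.
$records = foreach ($line in $lines) {
    $fields = [ordered]@{}
    foreach ($col in $columns) {
        $fields[$col.Name] = $line.Substring($col.Start, $col.Length)
    }
    [pscustomobject]$fields
}

# ConvertTo-Csv quotes every field by default, so embedded ';' stay intact.
$csv = $records | ConvertTo-Csv -Delimiter ';' -NoTypeInformation
$csv
# To write a file instead: $records | Export-Csv out.csv -Delimiter ';' -NoTypeInformation
```

Unlike the regex, this quotes every field and emits a header row without trailing semicolons, which the question says is acceptable. In PowerShell 7+, adding -UseQuotes AsNeeded quotes only the fields that contain the delimiter, a double quote, or a line break.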
I'm trying to replicate the functionality of the following Python snippet in PowerShell:
import codecs

allowed_mac_separators = [':', '-', '.']
for sep in allowed_mac_separators:
    if sep in mac_address:
        test = codecs.decode(mac_address.replace(sep, ''), 'hex')
        b64_mac_address = codecs.encode(test, 'base64')
        address = codecs.decode(b64_mac_address, 'utf-8').rstrip()
It takes a MAC address, removes the separators, decodes the remaining hex digits into bytes, and then Base64-encodes those bytes. (I did not write the Python function and have no control over it or how it works.)
For example, the MAC address AA:BB:CC:DD:E2:00 would be converted to AABBCCDDE200, then to b'\xaa\xbb\xcc\xdd\xe2\x00', and finally to b'qrvM3eIA'. I tried doing something like:
$bytes = 'AABBCCDDE200' | Format-Hex
[System.BitConverter]::ToString($bytes);
but that produces MethodException: Cannot find an overload for "ToString" and the argument count: "1". and I'm not really sure what it's looking for. All the examples I've found utilizing that call only have one argument. This works:
[System.Convert]::ToBase64String([System.Text.Encoding]::UTF8.GetBytes('AABBCCDDE200'))
but obviously doesn't convert it to hex first and thus yields the incorrect result. Any help is appreciated.
# Remove everything except word characters from the string.
# In effect, this removes any punctuation ('-', ':', '.')
$sanitizedHexStr = 'AA:BB:CC:DD:E2:00' -replace '\W'
# Convert all hex-digit pairs in the string to an array of bytes.
$bytes = [byte[]] -split ($sanitizedHexStr -replace '..', '0x$& ')
# Get the Base64 encoding of the byte array.
[System.Convert]::ToBase64String($bytes)
For an explanation of the technique used to create the $bytes array, as well as a simpler PowerShell (Core) 7.1+ / .NET 5+ alternative (in short: [System.Convert]::FromHexString('AABBCCDDE200')), see this answer.
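For Windows PowerShell 5.1, where FromHexString is unavailable, a more explicit equivalent of the Python loop parses each hex-digit pair with [System.Convert]::ToByte(string, 16). A minimal sketch; the function name ConvertTo-MacBase64 is hypothetical:

```powershell
function ConvertTo-MacBase64 {
    param([Parameter(Mandatory)] [string] $MacAddress)

    # Remove the separators the Python code allows (':', '-', '.').
    $hex = $MacAddress -replace '[:.-]'
    if ($hex -notmatch '^(?:[0-9A-Fa-f]{2})+$') {
        throw "Not an even-length hex string: '$hex'"
    }

    # Interpret each pair of hex digits as one byte (base-16 parse).
    $bytes = [byte[]]::new([int] ($hex.Length / 2))
    for ($i = 0; $i -lt $bytes.Length; $i++) {
        $bytes[$i] = [System.Convert]::ToByte($hex.Substring(2 * $i, 2), 16)
    }

    [System.Convert]::ToBase64String($bytes)
}

ConvertTo-MacBase64 'AA:BB:CC:DD:E2:00'   # qrvM3eIA
```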
As for what you tried:
Format-Hex does not return an array of bytes (directly); its primary purpose is to visualize the input data in hex format for the human observer.
In general, Format-* cmdlets output objects whose sole purpose is to provide formatting instructions to PowerShell's output-formatting system - see this answer. In short: only ever use Format-* cmdlets to format data for display, never for subsequent programmatic processing.
That said, in the particular case of Format-Hex, the output objects, which are of type [Microsoft.PowerShell.Commands.ByteCollection], do contain useful data: their .Bytes property holds the bytes of the transcoded characters of the input strings, as Cpt.Whale points out.
However, $bytes = ($sanitizedHexStr | Format-Hex).Bytes would not work in your case, because you'd effectively get byte values reflecting the ASCII code points of characters such as A (see below) - whereas what you need is the interpretation of these characters as hex digits.
But even in general I suggest not relying on Format-Hex for to-byte-array conversions:
On a philosophical note, as stated, the purpose of Format-* cmdlets is to produce for-display output, not data, and it's worth observing this distinction, this exception notwithstanding - the type of the output object could be considered an implementation detail.
Format-Hex converts strings to bytes based on first applying a fixed character transcoding (e.g., you couldn't get the byte representation of a .NET string as-is, based on UTF-16 code units), and that fixed transcoding differs between Windows PowerShell and PowerShell (Core):
In Windows PowerShell, the .NET string is transcoded to ASCII(!), resulting in the loss of non-ASCII-range characters: they are transcoded to literal ? characters.
In PowerShell (Core), that problem is avoided by transcoding to UTF-8.
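The difference is easy to see with the System.Text.Encoding classes directly. This sketch does not call Format-Hex itself; it only shows what the two transcodings produce, plus the as-is UTF-16LE bytes of the .NET string:

```powershell
# Build 'café' without a non-ASCII literal: U+00E9 is outside the ASCII range.
$text = 'caf' + [char] 0x00E9

# ASCII: the non-ASCII character is replaced with '?' (0x3F).
$ascii = [System.Text.Encoding]::ASCII.GetBytes($text)

# UTF-8: U+00E9 becomes the two-byte sequence 0xC3 0xA9.
$utf8 = [System.Text.Encoding]::UTF8.GetBytes($text)

# UTF-16LE ("Unicode" in .NET): the string's code units as-is.
$utf16 = [System.Text.Encoding]::Unicode.GetBytes($text)

[System.BitConverter]::ToString($ascii)   # 63-61-66-3F
[System.BitConverter]::ToString($utf8)    # 63-61-66-C3-A9
[System.BitConverter]::ToString($utf16)   # 63-00-61-00-66-00-E9-00
```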
The System.BitConverter.ToString call failed because $bytes in your code wasn't itself a byte array ([byte[]]); only its .Bytes property value was (and that didn't contain the values of interest).
That said, you're not looking to convert bytes back to a string; you're looking to convert the bytes directly to a Base64 encoding, as shown above.
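To illustrate both points, this sketch contrasts the bytes of the hex characters with the byte values those hex digits denote. It uses [System.Text.Encoding]::ASCII as a stand-in for what Format-Hex's .Bytes would hold for this ASCII-only input, and shows that BitConverter.ToString is happy with a single argument once that argument really is a [byte[]]:

```powershell
$hexStr = 'AABB'

# Bytes of the characters: 'A' and 'B' are ASCII 0x41 and 0x42.
$charBytes = [System.Text.Encoding]::ASCII.GetBytes($hexStr)

# Bytes of the values the hex digits denote, using the same technique as
# the answer: 'AABB' -> '0xAA 0xBB ' -> split on whitespace -> cast to [byte].
$valueBytes = [byte[]] -split ($hexStr -replace '..', '0x$& ')

# BitConverter.ToString works once it actually receives a [byte[]].
[System.BitConverter]::ToString($charBytes)    # 41-41-42-42
[System.BitConverter]::ToString($valueBytes)   # AA-BB
```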
I'm trying to convert characters in a text file based on what type they are:
Letters > L
Numbers > #
Is there a way to iterate through a file on a per-character basis? The only way I can get it to work currently is nested loops iterating through individual lines within the file. If there's a simpler way, that cuts out a lot of code I'll have to wade through.
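You can use Get-Content -Encoding Byte (Windows PowerShell; in PowerShell 6+ the equivalent is -AsByteStream) and convert each byte value back to a character:
Get-Content foo.txt -Encoding Byte | foreach { [char]$_ }
You can use Get-Content -Raw and cast the result to [char[]]. Not recommended for large files, since the whole file is read into memory as a single string.
Both options above will give you all characters, including line breaks. Option 1 will not work with multi-byte encodings such as UTF-8 or UTF-16, because it turns each byte into a separate character; option 2 decodes the text first, so it will.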
Then there is the variant you mention already: Iterate twice, once by lines, once by character:
Get-Content foo.txt | foreach { [char[]] $_ | foreach { ... } }
If you don't need line breaks as characters I'd prefer this version since it should have reasonable runtime and memory requirements (e.g. it won't try to fit the whole file into memory).
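As a sketch of the actual conversion from the question, the per-line approach can be combined with [char]::IsLetter and [char]::IsDigit. These test Unicode categories, so accented and non-Latin letters also become L. The function name Convert-CharClass and the output file name are hypothetical:

```powershell
function Convert-CharClass {
    param([string] $Text)

    $sb = [System.Text.StringBuilder]::new($Text.Length)
    foreach ($ch in $Text.ToCharArray()) {
        if ([char]::IsLetter($ch))    { [void] $sb.Append('L') }   # letters -> L
        elseif ([char]::IsDigit($ch)) { [void] $sb.Append('#') }   # digits  -> #
        else                          { [void] $sb.Append($ch) }   # keep everything else
    }
    $sb.ToString()
}

# Usage: process the file line by line, so it is never fully loaded into memory.
# Get-Content foo.txt | ForEach-Object { Convert-CharClass $_ } | Set-Content foo_converted.txt
Convert-CharClass 'ab1 C2!'   # LL# L#!
```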
get-content myfile.txt | foreach { $_.ToCharArray() }
This flattens the contents of your file into one long array of characters (without the line breaks, which Get-Content strips).
If you are processing very large files, the fastest (programmatic) method I have found is to use .NET StreamReader and StreamWriter. Utilizing these objects will allow you to read line-at-a-time into a string, perform manipulation, and then write to a new file line-at-a-time. At the end, delete your original and rename the new file accordingly.
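A minimal sketch of the StreamReader/StreamWriter approach just described, assuming the per-line manipulation is passed in as a script block. The function name Invoke-LineTransform and the .tmp suffix are hypothetical. StreamReader auto-detects a BOM and otherwise assumes UTF-8, and StreamWriter writes UTF-8 without a BOM by default, so the output encoding may differ from the original file's:

```powershell
function Invoke-LineTransform {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [scriptblock] $Transform
    )

    # .NET resolves relative paths against the process working directory,
    # which can differ from PowerShell's current location; use a full path.
    $fullPath = (Resolve-Path -LiteralPath $Path).ProviderPath
    $tempPath = "$fullPath.tmp"

    $reader = [System.IO.StreamReader]::new($fullPath)
    try {
        $writer = [System.IO.StreamWriter]::new($tempPath)
        try {
            # ReadLine() returns $null at end of file (empty lines return '').
            while ($null -ne ($line = $reader.ReadLine())) {
                $writer.WriteLine([string] (& $Transform $line))
            }
        }
        finally { $writer.Dispose() }
    }
    finally { $reader.Dispose() }

    # Delete the original and give the new file its name.
    Remove-Item -LiteralPath $fullPath
    Move-Item -LiteralPath $tempPath -Destination $fullPath
}

# Usage: Invoke-LineTransform -Path .\big.txt -Transform { param($l) $l.ToUpper() }
```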
If you don't need to programmatically solve this and can utilize regular expressions, I recommend UltraEdit. I don't know what wizardry they utilize, but it is MUCH faster at reading files than what I've managed to do in PowerShell.
I am using PowerShell and I need to replace a line in a .txt file.
The line in the .txt file always has a different number at the end.
For example:
File (first time):
appversion= 10.10.1
File (second time):
appversion= 10.10.2
File (third time):
appversion= 10.10.5
I need to replace appversion and the number after it (the number is always different). I have stored the required value in a variable.
How do I do this?
Part of the issue you are getting, which I see from your comments, is that you are trying to replace text in a file and save it back to the same file while you are still reading it.
I will show a similar solution that addresses this. Again, we are going to use the ability of -replace to work as an array operator.
$NewVersion = "Awesome"
$filecontent = Get-Content C:\temp\file.txt
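$filecontent -replace '(^appversion=.*\.).*',"`${1}$NewVersion" | Set-Content C:\temp\file.txt
This regex matches lines starting with "appversion=" and captures everything up to the last period. Since we are storing the text in memory, we can write it back to the same file. The group reference is written as ${1} rather than $1 so that a numeric $NewVersion (e.g. 7) doesn't merge with it into $17, which .NET would treat as a nonexistent group and output literally. Change $NewVersion to a number ... unless that is your versioning structure.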
I'm not sure which part of the number, if any, you are trying to preserve. If you intend to change the whole number, just change .*\. in the pattern to a space, i.e. '(^appversion= ).*'. That way you replace everything after the equals sign and the space.
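A sketch that wraps the whole-number variant in a function and writes the result back to the same file. The function name Set-AppVersion is hypothetical, and the pattern assumes the key is spelled exactly appversion= at the start of the line (with \s* allowing any spaces after the equals sign):

```powershell
function Set-AppVersion {
    param(
        [Parameter(Mandatory)] [string] $Path,
        [Parameter(Mandatory)] [string] $Version
    )

    # Read the whole file into memory first, so writing back to it is safe.
    $content = Get-Content -LiteralPath $Path

    # ${1} keeps 'appversion=' plus any following spaces; the rest of the line
    # is replaced. '${1}' (not '$1') keeps a version such as '10.10.7' from
    # being read as part of the group number ('$110...').
    $content -replace '^(appversion=\s*).*$', ('${1}' + $Version) |
        Set-Content -LiteralPath $Path
}

# Usage: Set-AppVersion -Path C:\temp\file.txt -Version '10.10.7'
```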
Yes, you can with regex.
Let $myString and $verNumber be the variables holding the text and the version number:
$myString = "appversion= 10.10.1";
$verNumber = 7;
You can use the -replace operator to capture the version parts and replace only the last number, like this (the dots are escaped so they match only literal periods):
$myString -replace 'appversion= (\d+)\.(\d+)\.(\d+)', "appversion= `$1.`$2.$verNumber";
I have a binary file that I need to process, but it contains no line breaks in it.
The data is arranged within the file into 104-character blocks, which are divided into their various fields by character count alone (no delimiting characters).
I'd like to firstly process the file, so that there is a line break (`n) every 104 characters, but after much web searching and a lot of disappointment, I've found nothing useful yet. (Unless I ditch PowerShell and use awk.)
Is there a Split option that understands character counts?
Not only would it allow me to create the file with nice easy to read lines of 104 chars, but it would also allow me to then split these lines into their component fields.
Can anyone help please, without *nix options?
Cheers :)
$s = get-content YourFileName | Out-String
$a = $s.ToCharArray()
$a[0..103] # will return an array of first 104 chars
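To get a string back, join the characters with the -join operator. (A [string] cast would insert $OFS, a space by default, between the elements, and stripping those spaces afterwards with .Replace(" ","") would also remove any genuine spaces in the data.)
$ns = $a[0..103] -join ''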
Using the .Where() method (PowerShell v4+) with its 'Split' mode:
$text = 'abcdefghi'
While ($text)
{
$x,$text = ([char[]]$text).where({$_},'Split',3)
$x -join ''
}
abc
def
ghi
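An alternative is a plain String.Substring loop, which also handles a final partial chunk and avoids re-converting the remaining characters on every iteration. A sketch; the function name Split-FixedLength and the file names are hypothetical, and it assumes the 104-character blocks are text in a single-byte encoding, since genuinely binary bytes may not survive Get-Content's text decoding:

```powershell
function Split-FixedLength {
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string] $Text,
        [Parameter(Mandatory)] [ValidateRange(1, 2147483647)] [int] $Length
    )

    for ($i = 0; $i -lt $Text.Length; $i += $Length) {
        # The last chunk may be shorter than $Length.
        $Text.Substring($i, [Math]::Min($Length, $Text.Length - $i))
    }
}

# Usage sketch (hypothetical file names):
# $records = Split-FixedLength -Text (Get-Content data.bin -Raw) -Length 104
# $records | Set-Content data_lines.txt
# Fields within a record are cut the same way, e.g. $records[0].Substring(0, 10)
Split-FixedLength -Text 'abcdefghi' -Length 3
```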