Write string to Unicode File - unicode

I am using the NSIS Unicode version and I am trying to append a string to an existing Unicode (UTF-16LE) file.
My problem: after I write the string to the file and then open the file, the string I wrote is just gibberish. I have a feeling that it's trying to write an ANSI string to a UTF-16LE file.
How can I write a string to a Unicode file?
Function ${prefix}AppendFile
  # Note: will automatically create the file if it doesn't exist
  # $0 = fName
  # $1 = strToWrite
  Pop $1
  Pop $0
  ClearErrors
  FileOpen $3 $0 a
  FileSeek $3 0 END
  FileWrite $3 "$\r$\n" # write a new line
  FileWrite $3 "$1"
  FileWrite $3 "$\r$\n" # write an extra line
  FileClose $3 # close the file
  IfErrors 0 +2
  MessageBox MB_OK "Append Error: $1 $\r$\n$\r$\n$0"
FunctionEnd

If you're dealing with a UTF-16LE file, you need to use FileWriteUTF16LE, which writes Unicode text, rather than FileWrite, which writes ANSI text.
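Applied to the function above, that means swapping each FileWrite call for FileWriteUTF16LE; a minimal sketch, otherwise unchanged from the question's code:
Function ${prefix}AppendFile
  # $0 = fName, $1 = strToWrite
  Pop $1
  Pop $0
  ClearErrors
  FileOpen $3 $0 a
  FileSeek $3 0 END
  FileWriteUTF16LE $3 "$\r$\n" # new line, written as UTF-16LE
  FileWriteUTF16LE $3 "$1"     # the string itself
  FileWriteUTF16LE $3 "$\r$\n" # trailing line break
  FileClose $3
  IfErrors 0 +2
  MessageBox MB_OK "Append Error: $1 $\r$\n$\r$\n$0"
FunctionEnd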


Decode binary octet string in a file with perl

I have a file where some of the lines contain a number that has been encoded as text -> binary -> octets, and I need to decode that to end up with the number.
All the lines where this encoded string appears begin with STRVID:
For example I have in one of the lines:
STRVID: SarI3gXp
If I do this echo "SarI3gXp" | perl -lpe '$_=unpack"B*"' I get the number in binary
0101001101100001011100100100100100110011011001110101100001110000
Now, to decode from binary to octets, I do this (assign the previous command's output to a variable and then convert binary to octets):
variable=$(echo "SarI3gXp" | perl -lpe '$_=unpack"B*"') ; printf '%x\n' "$((2#$variable))"
The result is the number, but not in the correct order:
5361724933675870
To get the number in the correct order, I have to take each pair of digits and output the second digit before the first, which finally gives the number I'm looking for. Something like this:
variable=$(echo "SarI3gXp" | perl -lpe '$_=unpack"B*"') ; printf '%x\n' "$((2#$variable))" | gawk 'BEGIN {FS = ""} {print $2 $1 $4 $3 $6 $5 $8 $7 $10 $9 $12 $11 $14 $13 $16 $15}'
And finally I have the number I'm looking for:
3516279433768507
I don't have any clue how to do this automatically for every line that begins with STRVID: in my file. In the end, what I need is the whole file, but with the decoded value on every line that begins with STRVID:.
When I find this:
STRVID: SarI3gXp
I want my file to contain this instead:
STRVID: 3516279433768507
Can someone help with this?
First of all, all you need for the conversion is
unpack "h*", "SarI3gXp"
The lowercase "h" template extracts a hex string with the low nibble of each byte first, which performs exactly the per-pair digit swap you were doing by hand with gawk.
A perl one-liner using -p will execute the provided program for each line, and s///e allows us to modify a string with code as the replacement expression.
perl -pe's/^STRVID:\s*\K\S+/ unpack "h*", $& /e'
See Specifying file to process to Perl one-liner.
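For instance, to rewrite a file in place (the -i.bak backup suffix and the file name here are placeholders):
perl -i.bak -pe's/^STRVID:\s*\K\S+/ unpack "h*", $& /e' yourfile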
Please see whether the following sample demo script matches your problem. You do not need a double conversion when it can be done in one go.
Note: please read the pack documentation; unpack utilizes the same TEMPLATE.
use strict;
use warnings;
use feature 'say';

while ( <DATA> ) {
    chomp;
    /^STRVID: (.+)/
        ? say 'STRVID: ' . unpack( "h*", $1 )
        : say;
}
__DATA__
It would be nice if you provide proper input data sample
STRVID: SarI3gXp
Perhaps the result of this script complies with your requirements.
Output
It would be nice if you provide proper input data sample
STRVID: 3516279433768507
Perhaps the result of this script complies with your requirements.
To work with a real input data file, replace
while ( <DATA> ) {
with
while ( <> ) {
and pass the filename as an argument to the script:
./script.pl input_file.dat
You can cross-flip the numbers entirely via regex (and without back-references either):
variable=$(echo "SarI3gXp" | perl -lpe '$_=unpack"B*"') ;
printf '%x\n' "$((2#$variable))" |
mawk -F'^$' 'gsub("..", "_&=&_") + gsub(\
"(^|[0-9]_)(_[0-9]|$)", _)+gsub("=",_)^_'
3516279433768507
The idea is to make a duplicate copy of each pair on the other side of an equals sign, like this:
_53=53__61=61__72=72__49=49__33=33__67=67__58=58__70=70_
then scrub out the leftovers, since the digits you now want are the ones anchoring the two sides of each equals sign ("=").

How to encode string in NSIS to UTF-16LE format?

Hi, I'm trying to replicate this Python code in an NSIS installer.
m = hashlib.md5("C:\PROGRAM FILES\My Program".encode('utf-16LE'))
It basically encodes the string, then applies an MD5 hash to it. I have found the MD5 hash plug-in for NSIS. However, I still can't figure out how to convert the string in $0 to the UTF-16LE format.
Thank you
If you are building a Unicode installer you can use the Crypto plug-in and feed it the string directly:
Unicode True
...
Section
  Crypto::HashUTF16LE MD5 "The quick brown fox jumps over the lazy dog"
  Pop $0
  DetailPrint $0 ; B0986AE6EE1EEFEE8A4A399090126837
SectionEnd
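To match the Python snippet from the question, feed it the question's path instead; a minimal sketch (note the plug-in prints the digest as uppercase hex, while Python's hexdigest() returns lowercase):
Crypto::HashUTF16LE MD5 "C:\PROGRAM FILES\My Program"
Pop $0 ; MD5 of the string's UTF-16LE bytes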
ANSI installers have to write the content to a file and hash the file:
Section
  InitPluginsDir
  StrCpy $1 "The quick brown fox jumps over the lazy dog"
  StrLen $3 $1
  IntOp $3 $3 * 2 ; UTF-16 is 2 bytes per code-unit
  FileOpen $2 "$PluginsDir\Temp.txt" w
  System::Call 'KERNEL32::WriteFile(pr2,wr1,ir3,*i,p0)' ; This converts the string for us
  FileClose $2
  Crypto::HashFile MD5 "$PluginsDir\Temp.txt"
  Pop $0
  DetailPrint $0
SectionEnd

hash using sha1sum using awk

I have a "pipe-separated" file that has about 20 columns. I want to just hash the first column which is a number like account number using sha1sum and return the rest of the columns as is.
Whats the best way I can do this using awk or sed?
Accountid|Time|Category|.....
8238438|20140101021301|sub1|...
3432323|20140101041903|sub2|...
9342342|20140101050303|sub1|...
Above is an example of the text file showing just 3 columns. Only the first column has the hash function applied to it. The result should look like:
Accountid|Time|Category|.....
104a1f34b26ae47a67273fe06456be1fe97f75ba|20140101021301|sub1|...
c84270c403adcd8aba9484807a9f1c2164d7f57b|20140101041903|sub2|...
4fa518d8b005e4f9a085d48a4b5f2c558c8402eb|20140101050303|sub1|...
What the Best Way™ is is up for debate. One way to do it with awk is
awk -F'|' 'BEGIN { OFS=FS } NR == 1 { print } NR != 1 { gsub(/'\''/, "'\'\\\\\'\''", $1); command = ("echo '\''" $1 "'\'' | sha1sum -b | cut -d\\ -f 1"); command | getline hash; close(command); $1 = hash; print }' filename
That is
BEGIN {
    OFS = FS # set output field separator to field separator; we will use
             # it because we meddle with the fields.
}
NR == 1 { # first line: just print headers.
    print
}
NR != 1 { # from there on do the hash/replace
    # this constructs a shell command (and runs it) that echoes the field
    # (singly-quoted to prevent surprises) through sha1sum -b, cuts out the hash
    # and gets it back into awk with getline (into the variable hash)
    # the gsub bit is to prevent the shell from barfing if there's an apostrophe
    # in one of the fields.
    gsub(/'/, "'\\''", $1);
    command = ("echo '" $1 "' | sha1sum -b | cut -d\\ -f 1")
    command | getline hash
    close(command)

    # then replace the field and print the result.
    $1 = hash
    print
}
You will notice the differences between the shell command at the top and the awk code at the bottom; that is all due to shell expansion. Because I put the awk code in single quotes in the shell command (double quotes are not up for debate in that context, what with $1 and all), and because the code contains single quotes, making it work inline leads to a nightmare of backslashes. Because of this, my advice is to put the awk code into a file, say foo.awk, and run
awk -F'|' -f foo.awk filename
instead.
Here's an awk executable script that does what you want:
#!/usr/bin/awk -f
BEGIN { FS=OFS="|" }
FNR != 1 { $1 = encodeData( $1 ) }
47
function encodeData( fld ) {
    cmd = sprintf( "echo %s | sha1sum", fld )
    cmd | getline output
    close( cmd )
    split( output, arr, " " )
    return arr[1]
}
Here's the flow break down:
Set the input and output field separators to |
When the row isn't the first (header) row, re-assign $1 to an encoded value
Print the entire row when 47 is true (always)
Here's the encodeData function break down:
Create a cmd to feed data to sha1sum
Feed it to getline
Close the cmd
On my system, there's extra info after the hash in the sha1sum output, so I discard it by splitting the output
Return the first field of the sha1sum output.
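For reference, the raw sha1sum output being split looks like this (the trailing - is sha1sum's marker for stdin; the hash shown is the one from the sample data):
$ echo 8238438 | sha1sum
104a1f34b26ae47a67273fe06456be1fe97f75ba  -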
With your data, I get the following:
Accountid|Time|Category|.....
104a1f34b26ae47a67273fe06456be1fe97f75ba|20140101021301|sub1|...
c84270c403adcd8aba9484807a9f1c2164d7f57b|20140101041903|sub2|...
4fa518d8b005e4f9a085d48a4b5f2c558c8402eb|20140101050303|sub1|...
Run it by calling awk -f awk.script data (or ./awk.script data if you make it executable).
EDIT by EdMorton:
sorry for the edit, but your script above is the right approach; it just needs some tweaks to make it more robust, and this is much easier than trying to describe them in a comment:
$ cat tst.awk
BEGIN { FS=OFS="|" }
NR==1 { for (i=1; i<=NF; i++) f[$i] = i; next }
{ $(f["Accountid"]) = encodeData($(f["Accountid"])); print }
function encodeData( fld,   cmd, output ) {
    cmd = "echo \047" fld "\047 | sha1sum"
    if ( (cmd | getline output) > 0 ) {
        sub(/ .*/,"",output)
    }
    else {
        print "failed to hash " fld | "cat>&2"
        output = fld
    }
    close( cmd )
    return output
}
$ awk -f tst.awk file
104a1f34b26ae47a67273fe06456be1fe97f75ba|20140101021301|sub1|...
c84270c403adcd8aba9484807a9f1c2164d7f57b|20140101041903|sub2|...
4fa518d8b005e4f9a085d48a4b5f2c558c8402eb|20140101050303|sub1|...
The f[] array decouples your script from hard-coding the number of the field that needs to be hashed. The additional arguments in the function declaration make cmd and output local to the function, so they are always null/zero on each invocation. The if around getline means you won't return the previous (stale) value if the command fails (see http://awk.info/?tip/getline). The rest is maybe more style/preference, with a bit of a performance improvement.

Swap two columns - awk, sed, python, perl

I've got data in a large file (280 columns wide, 7 million lines long!) and I need to swap the first two columns. I think I could do this with some kind of awk for loop, printing $2, $1 and then a range to the end of the line, but I don't know how to do the range part, and I can't write out $2, $1, $3...$280! Most of the column-swap answers I've seen here are specific to small files with a manageable number of columns, so I need something that doesn't depend on specifying every column number.
The file is tab delimited:
Affy-id chr 0 pos NA06984 NA06985 NA06986 NA06989
You can do this by swapping values of the first two fields:
awk ' { t = $1; $1 = $2; $2 = t; print; } ' input_file
I tried perreal's answer with Cygwin on a Windows system with a tab-separated file. It didn't work, because the standard separator is space.
If you encounter the same problem, try this instead:
awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file
The incoming separator is defined by -F $'\t' and the separator for output by OFS=$'\t'.
awk -F $'\t' ' { t = $1; $1 = $2; $2 = t; print; } ' OFS=$'\t' input_file > output_file
Try this, more relevant to your question:
awk '{printf("%s\t%s\n", $2, $1)}' inputfile
This might work for you (GNU sed):
sed -i 's/^\([^\t]*\t\)\([^\t]*\t\)/\2\1/' file
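A quick sanity check of the swap (the three-column input here is made up; GNU sed understands \t in the pattern):
$ printf 'a\tb\tc\n' | sed 's/^\([^\t]*\t\)\([^\t]*\t\)/\2\1/'
b	a	c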
Have you tried using the cut command? E.g.
cat myhugefile | cut -c10-20,1-9,21- > myrearrangedhugefile
This is also easy in Perl:
perl -pe 's/^(\S+)\t(\S+)/$2\t$1/;' file > outputfile
You could do this in Perl:
perl -F\\t -nlae 'print join("\t", @F[1,0,2..$#F])' inputfile
The -F specifies the delimiter. In most shells you need to precede a backslash with another to escape it. In recent versions of Perl, -F automatically implies -n and -a, so they can be dropped.
For your problem you wouldn't need to use -l, because the last column appears last in the output. But in a different situation, if the last column needs to appear between other columns, the newline character must be removed. The -l switch takes care of this.
The "\t" in join can be changed to anything else to produce a different delimiter in the output.
2..$#F specifies a range from 2 until the last column. As you might have guessed, inside the square brackets, you can put any single column or range of columns in the desired order.
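As an illustration (a hypothetical reordering, not from the question), this moves the last column to the front; note that -l matters here, because the last column no longer ends the line:
perl -F\\t -nlae 'print join("\t", @F[$#F,0..$#F-1])' inputfile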
No need to call anything else but your shell (note that this rejoins the fields with single spaces, so tabs are not preserved):
bash> while read col1 col2 rest; do
        echo $col2 $col1 $rest
      done <input_file
Test:
bash> echo "first second a b c d e f g" |
      while read col1 col2 rest; do
        echo $col2 $col1 $rest
      done
second first a b c d e f g
Maybe even with "inlined" Python, as in a Python script within a shell script, but only if you want to do some more scripting with Bash beforehand or afterwards... Otherwise it is unnecessarily complex.
Content of script file process.sh:
#!/bin/bash
# inline Python script
read -r -d '' PYSCR << EOSCR
from __future__ import print_function
import codecs
import sys
encoding = "utf-8"
fn_in = sys.argv[1]
fn_out = sys.argv[2]
# print("Input:", fn_in)
# print("Output:", fn_out)
with codecs.open(fn_in, "r", encoding) as fp_in, \
        codecs.open(fn_out, "w", encoding) as fp_out:
    for line in fp_in:
        # split into two columns and rest
        col1, col2, rest = line.split("\t", 2)
        # swap columns in output
        fp_out.write("{}\t{}\t{}".format(col2, col1, rest))
EOSCR
# ---------------------
# do setup work?
# e. g. list files for processing
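# note: $inputfile and $outputfile are not assigned anywhere in this
# listing; set them here, e.g. (file names invented for illustration):
inputfile="data.tsv"
outputfile="data_swapped.tsv"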
# call python script with params
python3 -c "$PYSCR" "$inputfile" "$outputfile"
# do some more processing
# e. g. rename outputfile to inputfile, ...
If you only need to swap the columns for a single file, then you can also just create a standalone Python script and statically define the filenames. Or just use one of the answers above.
awk swapping sans temp variable:
echo '777777744444444464449: 317 647 14423 262927714037 : 0x2A29D5A1BAA7A95541' |
mawk '1; ($1 = $2 substr(_, ($2 = $1)^_))^_' FS=':' OFS=':'
777777744444444464449: 317 647 14423 262927714037 : 0x2A29D5A1BAA7A95541
317 647 14423 262927714037 :777777744444444464449: 0x2A29D5A1BAA7A95541

editing text files with perl

I'm trying to edit a text file that looks like this:
TYPE=Ethernet
HWADDR=00:....
IPV6INIT=no
MTU=1500
IPADDR=192.168.2.247
...
(It's actually the /etc/sysconfig/network-scripts/ifcfg- file on Red Hat Linux.)
Instead of reading and rewriting the file each time I want to modify it, I figured I could use grep, sed, awk or the native text-parsing functionality provided in Perl.
For instance, if I wanted to change the IPADDR field of the file, is there a way I can just retrieve and modify the line directly? Maybe something like
grep 'IPADDR=' <filename>
but with some additional arguments to modify that line? I'm a little new to UNIX-based text-processing languages, so bear with me...
Thanks!
Here's a Perl one-liner to replace the IPADDR value with the IP address 127.0.0.1. It's short enough that you should be able to see what you need to modify to alter other fields*:
perl -p -i.orig -e 's/^IPADDR=.*$/IPADDR=127.0.0.1/' filename
It will rename "filename" to "filename.orig", and write out the new version of the file into "filename".
Perl command-line options are explained at perldoc perlrun (thanks for the reminder toolic!), and the syntax of perl regular expressions is at perldoc perlre.
*The regular expression ^IPADDR=.*$, split into components, means:
^        # bind to the beginning of the line
IPADDR=  # plain text: match "IPADDR="
.*       # followed by any number of any character (`.` means "any one character"; `*` means "any number of them")
$        # bind to the end of the line
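For example, the same one-liner with the field name and replacement value swapped in changes the MTU line instead (9000 is just an illustrative value):
perl -p -i.orig -e 's/^MTU=.*$/MTU=9000/' filename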
Since you are on Red Hat, you can try using the shell:
#!/bin/bash
file="file"
read -p "Enter field to change: " field
read -p "Enter new value: " newvalue
shopt -s nocasematch
while IFS="=" read -r f v
do
    case "$f" in
        $field)
            v=$newvalue;;
    esac
    echo "$f=$v"
done <$file > temp
mv temp file
UPDATE:
file="file"
read -p "Enter field to change: " field
read -p "Enter new value: " newvalue
shopt -s nocasematch
EOL=false
IFS="="
until $EOL
do
    read -r f v || EOL=true
    case "$f" in
        $field)
            v=$newvalue;;
    esac
    echo "$f=$v"
done <$file #> temp
#mv temp file
Or, using just awk:
awk 'BEGIN{
    printf "Enter field to change: "
    getline field < "-"
    printf "Enter new value: "
    getline newvalue < "-"
    IGNORECASE=1
    OFS=FS="="
}
field == $1{
    $2=newvalue
}
{
    print $0 > "temp"
}
END{
    cmd="mv temp "FILENAME
    system(cmd)
}' file
Or with Perl:
printf "Enter field: ";
chomp($field=<STDIN>);
printf "Enter new value: ";
chomp($newvalue=<STDIN>);
while (<>){
my ( $f , $v ) = split /=/;
if ( $field =~ /^$f/i){
$v=$newvalue;
}
print join("=",$f,$v);
}
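Assuming the script is saved as, say, changefield.pl (a name made up here), run it with the file as an argument and redirect the result, e.g.:
perl changefield.pl ifcfg-eth0 > ifcfg-eth0.new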
That would be the 'ed' command-line editor: like sed, but it will put the file back where it came from (i.e. edit it in place).