Can Notepad read UTF-32?

These bytes represent the word "hi" in UTF-32LE (with a BOM):
FF FE 00 00 68 00 00 00 69 00 00 00
However, Notepad does not display "hi" correctly.

Notepad does not support UTF-32, only ANSI, UTF-8, and UTF-16. It interprets the first 2 bytes as a UTF-16LE BOM, not the first 4 bytes as a UTF-32LE BOM, so the file's bytes are read as the UTF-16 code units
FF FE | 00 00 | 68 00 | 00 00 | 69 00 | 00 00   (BOM, U+0000, "h", U+0000, "i", U+0000)
instead of the UTF-32 code units
FF FE 00 00 | 68 00 00 00 | 69 00 00 00   (BOM, "h", "i")
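The difference is easy to reproduce with Python's codecs (a small illustration added here, not part of the original answer):

data = bytes.fromhex("FFFE0000 68000000 69000000")

# Decoded as UTF-32, the 4-byte BOM is consumed and the text is just "hi".
print(repr(data.decode("utf-32")))   # 'hi'

# Decoded as UTF-16 (what Notepad effectively does), only the first 2 bytes
# form the BOM; the rest becomes NUL, 'h', NUL, 'i', NUL.
print(repr(data.decode("utf-16")))   # '\x00h\x00i\x00'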

How to filter not simple binary data column in PySpark?

A sample of my df is:
+------------------------------------------------------------------------------------------------------------------+
|id | binary_col |
+------------------------------------------------------------------------------------------------------------------+
| 1 | [08 01 10 0D 00 0E CC 93 01 00 00 00 01 00 00 00 00 00 00 00 80 FF BF 40 00 00 00 00 00 00 F0 3F BE 2B 00 00]|
| 2 | [08 01 10 0D 00 0E CC 93 01 00 00 00 01 00 00 00 00 00 00 00 F0 FF BF 40 00 00 00 00 00 00 F0 3F 57 66 00 00]|
| 3 | [08 01 10 0D 00 0E CC 93 01 00 00 00 01 00 00 00 00 00 00 00 C0 FF BF 40 00 00 00 00 00 00 F0 3F D5 69 00 00]|
| 4 | [08 01 10 0D 00 0E CC 93 01 00 00 00 01 00 00 00 00 00 00 00 80 FF BF 40 00 00 00 00 00 00 F0 3F 5A 60 00 00]|
+------------------------------------------------------------------------------------------------------------------+
with this schema (df.printSchema()):
|-- id: int (nullable = true)
|-- binary_col: binary (nullable = true)
I want to keep only the rows whose binary_col equals [08 01 10 0D 00 0E CC 93 01 00 00 00 01 00 00 00 00 00 00 00 80 FF BF 40 00 00 00 00 00 00 F0 3F BE 2B 00 00] (filtering on id = 1 doesn't work, because there are other ids in the df with the same value).
I've tried casting the binary to bigint so I could filter on it later, as in Spark: cast bytearray to bigint,
by doing df.withColumn('casted_bin', F.conv(F.hex(F.col("binary_col")), 16, 10).cast("bigint")).show(truncate=False), but it didn't work.
How can I filter on any kind of binary data type?
Note: I asked a similar question before (How to filter Pyspark column with binary data type?), but that involved very simple binary data and the answer generated the binary from a numeric value; here I don't know how to generate the numeric value.
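One possible approach (my own sketch, not an answer from the thread): compare the hex rendering of the binary column against the hex string of the target value, so the bytes never have to be converted to a number. The tiny DataFrame below only stands in for the real df from the question.

from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.getOrCreate()

# The 36-byte value from the question, written as a hex string.
target_hex = (
    "08 01 10 0D 00 0E CC 93 01 00 00 00 01 00 00 00 00 00 00 00 "
    "80 FF BF 40 00 00 00 00 00 00 F0 3F BE 2B 00 00"
).replace(" ", "")

# Small stand-in for the real df (the actual DataFrame comes from the question).
df = spark.createDataFrame(
    [(1, bytearray.fromhex(target_hex)), (2, bytearray.fromhex("0801"))],
    "id int, binary_col binary",
)

# hex() renders the binary column as an uppercase hex string, so the whole
# value can be compared directly, without converting it to a number first.
filtered = df.filter(F.hex("binary_col") == target_hex.upper())
filtered.show(truncate=False)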

How to import a key into a MUSCLE card?

I am trying to import a key into the card, but it returns the response 6F 00 (UNKNOWN ERROR). The procedure I followed to import a key is:
Load the (MUSCLE) applet
Initialize the applet
Verify the PIN
Create the object with id (FF FF FF FE):
-> B0 5A 00 00 0E FF FF FF FE 00 00 00 44 00 00 00 00 00 00 00
<- 90 00
Write into the object:
-> B0 54 00 00 8D FF FF FF FE 00 00 00 00 84 00 01 00 80 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
<- 90 00
Import the key:
-> B0 32 04 00 07 00 00 FF FF 00 00 00 00
<- 6F 00
Please provide a solution for the above problem.
If you are still looking for the solution: 7 bytes seems to be a bit high for importing a key... ;)
The ACL in the data block is only six bytes, so this might cause your error. The following "optional parameters" are AFAIK completely unused.
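A rough sketch (my own, not from the thread) of how the ImportKey APDU could be retried with Lc = 06 and only the six ACL bytes, using the pyscard library; the reader index and the ACL bytes (copied from the question) are assumptions you may need to adjust:

from smartcard.System import readers
from smartcard.util import toHexString

# Connect to the first available reader (assumes the MUSCLE applet has
# already been selected and initialized and the PIN verified, as above).
connection = readers()[0].createConnection()
connection.connect()

# ImportKey: CLA B0, INS 32, P1 = key number 04, P2 = 00.
# Per the answer above, the data block should hold only the 6-byte ACL.
acl = [0x00, 0x00, 0xFF, 0xFF, 0x00, 0x00]
apdu = [0xB0, 0x32, 0x04, 0x00, len(acl)] + acl

data, sw1, sw2 = connection.transmit(apdu)
print("SW:", toHexString([sw1, sw2]))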

Shell magic wanted: format output of hexdump in a pipe

I'm debugging the output of a program that transmits data via TCP.
For debugging purposes I've replaced the receiving program with netcat and hexdump:
netcat -l -p 1234 | hexdump -C
That outputs all data as a nice hexdump, almost like I want it. However, the data is transmitted in fixed-size blocks whose lengths are not multiples of 16, leading to shifted lines in the output that make spotting differences difficult:
00000000 50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |P...............|
00000010 00 50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |.P..............|
00000020 00 00 50 00 00 00 00 00 00 00 00 00 00 00 00 00 |..P.............|
How do I reformat the output so that after 17 bytes a new line is started?
It should look something like this:
50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |P...............|
00 |. |
50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |P...............|
00 |. |
50 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |P...............|
00 |. |
Using hexdump's -n parameter does not work, since hexdump exits after reaching the given number of bytes (unless there is a way to keep the netcat program running and seamlessly pipe the next bytes to a new instance of hexdump).
It would also be great if I could use watch -d on the output to highlight changes between lines.
For hexdump output without the characters part:
hexdump -e '16/1 "%0.2x " "\n" 1/1 "%0.2x " "\n"'
The format string prints 16 one-byte values followed by a newline, then one more byte and a newline, so each 17-byte block spans exactly two output lines.
I use this:
use strict;
use warnings;
use bytes;

my $N = $ARGV[0];    # bytes per chunk, passed on the command line
$/ = \$N;            # read fixed-size records of $N bytes

while (<STDIN>) {
    my @bytes = unpack("C*", $_);     # numeric byte values of this chunk
    my $clean = $_;
    $clean =~ s/[[:^print:]]/./g;     # show non-printable bytes as '.'
    print join(' ', map { sprintf("%02x", $_) } @bytes),
          " |", $clean, "|\n";
}
Run it as perl scriptname.pl N where N is the number of bytes in each chunk you want.
You can also use xxd -p to produce a plain hexdump.

Extracting "plaintext" header from HEX file using Perl

I have a file that appears to contain plaintext headers, which I would like to extract and convert to plaintext.
Using hexedit, this is what I'm seeing in the file:
3a40 - 31 65 33 38 00 00 00 00 00 00 00 00 00 00 00 00 - 1e38............
3a50 - 00 00 00 00 00 00 00 00 00 00 0a 00 74 00 65 00 - ............t.e.
3a60 - 78 00 74 00 2f 00 61 00 73 00 63 00 69 00 69 00 - x.t./.a.s.c.i.i.
3a70 - 00 00 18 00 61 00 66 00 66 00 79 00 6d 00 65 00 - ....a.f.f.y.m.e.
3a80 - 74 00 72 00 69 00 78 00 2d 00 61 00 72 00 72 00 - t.r.i.x.-.a.r.r.
3a90 - 61 00 79 00 2d 00 62 00 61 00 72 00 63 00 6f 00 - a.y.-.b.a.r.c.o.
3aa0 - 64 00 65 00 00 00 64 00 40 00 35 00 32 00 30 00 - d.e...d.@.5.2.0.
3ab0 - 38 00 32 00 36 00 30 00 30 00 39 00 31 00 30 00 - 8.2.6.0.0.9.1.0.
3ac0 - 37 00 30 00 36 00 31 00 31 00 31 00 38 00 31 00 - 7.0.6.1.1.1.8.1.
3ad0 - 31 00 34 00 31 00 32 00 31 00 33 00 34 00 35 00 - 1.4.1.2.1.3.4.5.
3ae0 - 35 00 30 00 39 00 38 00 39 00 00 00 00 00 00 00 - 5.0.9.8.9.......
3af0 - 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 - ................
3b00 - 00 00 00 00 00 00 00 00 00 00 00 00 00 00 0a 00 - ................
and this is the output I'd like to get:
text/ascii affymetrix-array-barcode d@52082600910706111811412134550989
Try with the iconv command. Something like this should work:
tail -c +6 input.txt | iconv -f UTF16 -t ASCII >output.txt
Then split on the null bytes.
Granted, I'm no wiz, but this does the job if all your files look very similar to the one you just posted:
use strict;
open FILE, '<', 'file.dat' or die "Cannot open file.dat: $!";
binmode FILE;
my ($chunk, $buf, $n);
seek FILE, 28, 0;                                  # skip the fixed-size prefix
while (($n = read FILE, $chunk, 16)) { $buf .= $chunk; }
my @s = split(/\0\0/, $buf, 4);                    # split on double-NUL separators
print "$s[0] $s[1] $s[2]\n";
close(FILE);
A Perl solution might be interesting, but wouldn't the Unix strings command give you the plaintext portion of the file?

How to convert/manipulate BINARY file to ASCII file?

I'm looking for a way to extract the TEXT characters from a BINARY file of 4-byte values into an array or a TEXT file.
Let's say my input file is:
00000000 2e 00 00 00 01 00 00 00 02 00 00 00 03 00 00 00 |................|
00000010 04 00 00 00 05 00 00 00 06 00 00 00 07 00 00 00 |................|
00000020 08 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
00000030 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 |................|
*
00000070 00 00 00 00 00 00 00 00 |........|
00000078
And my desired output is:
46,1,2,3,4,5,6,7,8,9,0,0...
The output can be a TEXT file or an array.
I notice that the pack/unpack functions may help here, but I couldn't figure out how to use them properly.
An example would be nice.
Use unpack:
local $/;
@_ = unpack("V*", <>);
gets you an array. So as an inefficient (don't try on huge files) example:
perl -e 'local$/;print join(",",map{sprintf("%d",$_)}unpack("V*",<>))' thebinaryfile
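For comparison only (not from the thread), the same idea in Python: struct's "<I" format is the 32-bit little-endian counterpart of Perl's "V":

import struct
import sys

with open(sys.argv[1], "rb") as fh:
    data = fh.read()

# Interpret the file as consecutive 32-bit little-endian unsigned integers
# (any trailing bytes that don't fill a whole word are ignored).
count = len(data) // 4
values = struct.unpack("<%dI" % count, data[:count * 4])
print(",".join(str(v) for v in values))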
The answer depends on what you consider an ASCII character. Anything below 128 is technically an ASCII character, but I am assuming you mean characters you normally find in a text file. In that case, try this:
#!/usr/bin/perl
use strict;
use warnings;
use bytes;

$/ = \1024;     # read 1k at a time
while (<>) {
    for my $char (split //) {
        my $ord = ord $char;
        # keep printable ASCII plus CR, LF, and TAB
        if ($ord > 31 and $ord < 127 or $char =~ /[\r\n\t]/) {
            print "$ord,";
        }
    }
}
Or simply use od, which prints the file as 4-byte decimal integers (-v keeps repeated lines instead of collapsing them into a *):
od -t d4 -v <filename>