ISO-8859-1 encoding MATLAB - matlab

I have a problem with ISO encoding in MATLAB.
I have a log file in which all possible byte values (0..255) are stored in binary format.
When I open this file in MATLAB and read one line, MATLAB shows me the correct representation in ISO-8859-1. So far, so good.
For example, the value 155 (0x9B) shows the character "›". (Small character values work as expected.) MATLAB displays this correctly, but when I want to process it as an integer value with double('›'), the return value is 8250, which is not an ASCII value.
What do I need to change about the encoding of the file?
Edit: the log file was written with Python, in case that matters.

I found the problem: I had not set the encoding in the fopen call. Working solution:
% create the test file
ascii=char([191 210 191 212 191 228 192 215 192 144 198 175 155 236 254 201 10]); % the problem value is 155 (element 13)
logID=fopen('testdatei.log','w','n','ISO-8859-1');
fwrite(logID,ascii);
fclose(logID);
% wrong file handling: no encoding given, so fopen falls back to the default encoding
logID=fopen('testdatei.log');
line=fgetl(logID);
decode=double(line);
disp('wrong encoding')
decode(13)
fclose(logID);
% right file handling: encoding set explicitly
logID=fopen('testdatei.log','r','n','ISO-8859-1');
line=fgetl(logID);
decode=double(line);
disp('right encoding')
decode(13)
fclose(logID);
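A likely explanation for the 8250, assuming the default encoding on that machine was windows-1252 (the question does not say): in that code page the byte 0x9B maps to the character › (U+203A, decimal 8250), whereas in ISO-8859-1 the same byte maps to code point 155. The sketch below decodes that single byte both ways with native2unicode, then shows that reading the file with fread as raw uint8 sidesteps character encodings altogether. The scratch file name is made up for the illustration.

```matlab
% Decode the byte 0x9B under two encodings (windows-1252 is an assumed default).
b = uint8(155);
asCp1252 = double(native2unicode(b, 'windows-1252'));
asLatin1 = double(native2unicode(b, 'ISO-8859-1'));
fprintf('windows-1252: %d, ISO-8859-1: %d\n', asCp1252, asLatin1);

% Alternative for binary logs: read raw bytes, no text decoding involved.
fname = [tempname '.log'];                 % hypothetical scratch file
fid = fopen(fname, 'w');
fwrite(fid, uint8([198 175 155 236 10]), 'uint8');
fclose(fid);

fid = fopen(fname, 'r');
raw = fread(fid, Inf, 'uint8=>double').';  % row vector of byte values
fclose(fid);
delete(fname);
disp(raw)
```

With the raw bytes in hand, the third element stays 155 regardless of the platform's default encoding.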

Related

UTF-8 in decimal

Is representing UTF-8 encoding in decimal even possible? I think only values up to 255 would be correct, am I right?
As far as I know, we can only represent UTF-8 in hex or binary form.
I think it is possible. Let's look at an example:
The Unicode code point for ∫ is U+222B.
Its UTF-8 encoding is E2 88 AB, in hexadecimal representation. In octal, this would be 342 210 253. In decimal, it would be 226 136 171. That is, if you represent each byte separately.
If you look at the same 3 bytes as a single number, you have E288AB in hexadecimal; 70504253 in octal; and 14846123 in decimal.
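The numbers above can be reproduced with a short MATLAB sketch. It uses unicode2native to obtain the UTF-8 bytes of U+222B and prints them in decimal, hexadecimal and octal; polyval is used here only as a compact way to read the three bytes as one base-256 number. The variable names are illustrative.

```matlab
% UTF-8 bytes of U+222B (the integral sign), shown in several bases.
ch = char(hex2dec('222B'));
bytes = double(unicode2native(ch, 'UTF-8'));   % one value per byte

disp(bytes)                  % decimal, byte by byte
disp(dec2hex(bytes, 2))      % hexadecimal, one byte per row
disp(dec2base(bytes, 8))     % octal, one byte per row

% The same three bytes read as one big-endian number.
asOne = polyval(bytes, 256);
fprintf('%d decimal, %X hex, %o octal\n', asOne, asOne, asOne);
```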

How can I read a text file containing numbers in MATLAB?

I have to read several numbers from each line of a text file. How can I put them into an array (one per line) if I don't know how many numbers there are to read?
I thought about reading each number and appending it to an array until I find the newline character. But I have a lot of files, so doing this takes a lot of time.
I have to build plots from these arrays for each file. Is there any other way?
12 43 54 667 1 2 3 1 545 434 6 476
14 32 45 344 54 54 10 32 43 5 6 66
Thanks
You can open each file and read it line by line, then use textscan(tline,'%d') to convert each line into an array.
Example for one file:
fid = fopen('file.txt');
tline = fgetl(fid);
while ischar(tline)
    C = textscan(tline,'%d');   % C{1} holds the numbers of the current line
    celldisp(C);
    tline = fgetl(fid);         % fgetl returns -1 at end of file, which ends the loop
end
fclose(fid);
You would have to run the code for each file, and do something with the array C.
You can read the additional details on the function textscan.
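If the lines can hold different counts of numbers, a cell array with one row vector per line is one way to keep them. The following is a minimal sketch of the line-by-line approach, using sscanf instead of textscan (a swap made here for brevity, not something the answer prescribes); the scratch file and its contents are invented for the example.

```matlab
% Write a small scratch file with lines of different lengths (made-up data).
fname = [tempname '.txt'];
fid = fopen(fname, 'w');
fprintf(fid, '12 43 54 667\n14 32 45\n');
fclose(fid);

% Read it back line by line; each line becomes one row vector in a cell.
rows = {};
fid = fopen(fname, 'r');
tline = fgetl(fid);
while ischar(tline)
    rows{end+1} = sscanf(tline, '%f').';   %#ok<SAGROW> grows once per line
    tline = fgetl(fid);
end
fclose(fid);
delete(fname);

celldisp(rows)
```

Each cell can then be passed to plot on its own, for example plot(rows{1}).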
The way to read ASCII-delimited, numerical data in MATLAB is to use dlmread, as already suggested by @BillBokeey in a comment. This is as simple as
C = dlmread('file.txt');

Alternative to dec2hex in MATLAB?

I am calling dec2hex up to 100 times per point in MATLAB, and this slows my code down: one point takes a minute or more, and I have to do the same for 5000 points, so the whole run takes hours. How can I do the decimal-to-hexadecimal conversion more efficiently? Is there an alternative that can be used instead of dec2hex?
As example:
%%Data[1..256]: can be any data from
for i=1:1:256
Table=dec2hex(Data);
%%Some permutation applied on Data
end;
Here I am using dec2hex more than 100 times for one point, and I have to do that for 5000 points.
Data =
Columns 1 through 16
105 232 98 250 234 216 98 199 172 226 250 215 188 11 52 174
Columns 17 through 32
111 181 71 254 133 171 94 91 194 136 249 168 177 202 109 187
Columns 33 through 48
232 249 191 60 230 67 183 122 164 163 91 24 145 124 200 142
This is the kind of data my code will use.
Function calls are (still) expensive in MATLAB. This is one of the reasons why vectorization and pseudo-vectorization are strongly recommended: processing an entire array of N values in one function call is far better than calling the processing function once per element, because it saves the overhead of the N-1 extra calls.
So, what can you do? Here are some non-mutually-exclusive choices:
Profile your code first. Just because something looks like the main culprit for slow execution does not mean it is. Type profview in your command window, choose the script that you want to run, and see where the hotspots of your code are. Optimize those hotspots rather than your initial guesses.
Try faster functions. sprintf is usually fast and flexible:
Table = sprintf('%04X\n', Data);
(and — if you dive into the function code with edit dec2hex — you'll see that in some cases dec2hex actually calls sprintf).
Reduce the number of function calls. Suppose you have to build the table for the 100 datasets of different lengths, that are stored in a cell array:
DataSet = cell(1,100);
for k = 1:100
DataSet{k} = fix(1000*rand(k,1));
end;
The idea is to assemble all the numbers in a single array that you convert at once:
Table = dec2hex(vertcat(DataSet{:}));
Mind you, this is done at the expense of using supplemental memory for assembling the partial inputs in a single one — it's not always convenient to do that.
All the variants above. Okay, this point is not actually a point. :-)
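As a rough sketch of the "faster functions" and "fewer calls" ideas, assuming the values are integers in 0..255 as in the sample data: a single sprintf call can format the whole array, and a 256-row table built once can replace repeated conversions with plain indexing. The names hexChars, lut and viaLut are illustrative only.

```matlab
Data = [105 232 98 250 234 216 98 199];   % first values from the question

% One sprintf call for the whole array, reshaped to one row per value.
hexChars = reshape(sprintf('%02X', Data), 2, []).';

% Lookup table: convert 0..255 once, then index into it inside the loop.
lut = dec2hex(0:255, 2);
viaLut = lut(Data + 1, :);

disp(hexChars)
disp(isequal(hexChars, viaLut, dec2hex(Data, 2)))
```

Whether either variant is actually faster in a given program is something the profiler should confirm.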

Matlab reading hex values from text file with non hex values interspersed?

I have a text file that looks something like what's pasted below: several hex values, followed by "XX" lines, followed by more hex values. The pattern repeats ~1M times. I'm looking for a good way to read out just the hex values, ignoring the "XX" values. textscan seems interesting, but doesn't support hex. fscanf is great, but it chokes as soon as it hits the first "XX" in the file. I wrote a clunky script that reads everything as a string, omits the "XX"s and uses hex2dec, but this is painfully slow (obviously). Any suggestions?
7F
55
8A
9B
6E
XX
XX
XX
XX
FF
DE
BE
EF
XX
XX
XX
04
88
.
.
.
This solution reads 1 million 2-character lines in less than a second on my laptop:
fid = fopen('test.txt');
A = textscan(fid,'%2c','CommentStyle','XX');
fclose(fid);
A = hex2dec(A{:});
Note the 'CommentStyle' option that skips those lines that start with XX.
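For comparison, here is a small in-memory sketch that does not rely on textscan: it splits the text into lines, drops the XX markers, and hands the rest to hex2dec in one call. The sample text is shortened from the question; for a real file one would presumably obtain txt with fileread. It is not claimed to be faster than the textscan approach.

```matlab
% Shortened sample of the file contents from the question.
txt = sprintf('7F\n55\n8A\nXX\nXX\nFF\nDE\n');

% Split into lines and trim stray whitespace (e.g. carriage returns).
lines = strtrim(strsplit(strtrim(txt), sprintf('\n')));

% Keep everything that is not an XX marker and convert in a single call.
isHex = ~strcmpi(lines, 'XX');
values = hex2dec(lines(isHex));

disp(values.')
```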

What is the real purpose of Base64 encoding?

Why do we have Base64 encoding? I am a beginner and I really don't understand why you would obfuscate the bytes into something else (unless it is encryption). In one of the books I read that Base64 encoding is useful when binary transmission is not possible, e.g. when we post a form, it is encoded. But why do we convert bytes into letters? Couldn't we just convert the bytes into string format with a space in between? For example, 00000001 00000004? Or simply 0000000100000004 without any space, because a byte always consists of 8 bits?
Base64 is a way to encode binary data into an ASCII character set known to pretty much every computer system, in order to transmit the data without loss or modification of the contents itself.
For example, mail systems cannot deal with binary data because they expect ASCII (textual) data. So if you transfer an image or another binary file as-is, it will get corrupted because of the way the mail system handles the data.
Note: Base64 encoding is NOT a way of encrypting, nor a way of compacting data. In fact, a Base64-encoded piece of data is 1.333… times bigger than the original. It is only a way to be sure that no data is lost or modified during the transfer.
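The 4/3 growth can be checked directly. This sketch assumes a MATLAB release that ships matlab.net.base64encode and matlab.net.base64decode (R2016b or later, as far as I know) and encodes all 256 byte values once.

```matlab
raw = uint8(0:255);                          % every possible byte value
encoded = matlab.net.base64encode(raw);
decoded = matlab.net.base64decode(encoded);

% Compare the sizes and confirm that the round trip is lossless.
ratio = strlength(encoded) / numel(raw);
fprintf('%d bytes -> %d characters (ratio %.3f)\n', ...
        numel(raw), strlength(encoded), ratio);
disp(isequal(decoded(:), raw(:)))
```

The ratio should come out close to 4/3; padding in the last group can push it slightly above that.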
Base64 is a mechanism to enable representing and transferring binary data over mediums that allow only printable characters. It is the most popular form of the “Base Encoding”, the others known in use being Base16 and Base32.
The need for Base64 arose from the need to attach binary content to emails, like images, videos or arbitrary binary content. Since SMTP [RFC 5321] only allowed 7-bit US-ASCII characters within the messages, there was a need to represent these binary octet streams using the seven bit ASCII characters...
Hope this answers the Question
Base64 is a more or less compact way of transmitting (encoding, in fact, but with the goal of transmitting) any kind of binary data.
See http://en.wikipedia.org/wiki/Base64
"The general rule is to choose a set of 64 characters that is both part of a subset common to most encodings, and also printable."
That's a very general purpose and the common need is not to waste more space than needed.
Historically, it's based on the fact that there is a common subset of (almost) all encodings used to store chars into bytes and that a lot of the 2^8 possible bytes risk loss or transformations during simple data transfer (for example a copy-paste-emailsend-emailreceive-copy-paste sequence).
(please redirect upvote to Brian's comment, I just make it more complete and hopefully more clear).
For data transmission, data can be textual or non-text (binary) like an image, video, file etc.
As we know, during transmission only a stream of data (textual/printable characters) can be sent or received, hence we need a way to encode non-text data like an image, video or file.
A binary representation of such non-text data (image, video, file) is easily obtainable.
This binary representation is then encoded in a textual format in which each character is one of sixty-four possible characters (A-Z, a-z, 0-9, + and /).
Table 1: The Base 64 Alphabet
Value Encoding Value Encoding Value Encoding Value Encoding
0 A 17 R 34 i 51 z
1 B 18 S 35 j 52 0
2 C 19 T 36 k 53 1
3 D 20 U 37 l 54 2
4 E 21 V 38 m 55 3
5 F 22 W 39 n 56 4
6 G 23 X 40 o 57 5
7 H 24 Y 41 p 58 6
8 I 25 Z 42 q 59 7
9 J 26 a 43 r 60 8
10 K 27 b 44 s 61 9
11 L 28 c 45 t 62 +
12 M 29 d 46 u 63 /
13 N 30 e 47 v
14 O 31 f 48 w (pad) =
15 P 32 g 49 x
16 Q 33 h 50 y
This set of sixty-four characters is called Base64, and encoding given data into this set of sixty-four allowed characters is called Base64 encoding.
Let us take a few examples of ASCII strings encoded to Base64.
1 ==> MQ==
12 ==> MTI=
123 ==> MTIz
1234 ==> MTIzNA==
12345 ==> MTIzNDU=
123456 ==> MTIzNDU2
Here a few points are to be noted:
Base64 output comes in groups of 4 characters. Each Base64 character carries 6 bits, so 3 input bytes (24 bits) map to exactly 4 Base64 characters. If the input length is not a multiple of 3, the last group is filled up to 4 characters with =.
= is not part of the Base64 character set. It is used only for padding.
Hence, one can see that Base64 encoding is not encryption but just a way to transform any given data into a stream of printable characters which can be transmitted over a network.
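The way 3 bytes turn into 4 characters can be traced by hand. The sketch below encodes the string 123 without any Base64 library call: it writes the three bytes as 24 bits, cuts them into four 6-bit values, and looks each value up in the alphabet from Table 1. It handles only inputs whose length is a multiple of 3, so no padding logic is included.

```matlab
% Base64 alphabet in table order: value 0 is 'A', value 63 is '/'.
alphabet = ['A':'Z' 'a':'z' '0':'9' '+/'];

bytes = double('123');                 % ASCII codes of the three characters
bits = reshape(dec2bin(bytes, 8).', 1, []);   % 24 bits, first byte first

% Cut the bit string into 6-bit groups, one group per row.
groups = reshape(bits, 6, []).';
idx = bin2dec(groups);                 % four values in 0..63

encoded = alphabet(idx.' + 1);         % +1 because MATLAB indexing starts at 1
disp(encoded)
```

For the input 123 this reproduces the MTIz entry from the examples above.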