Char (non ascii) in Matlab - matlab

I have three characters (bigger than 127) and I need to write it in a binary file.
For some reason, MATLAB and PHP/Python tends to write a different characters.
For Python, I have:
s = chr(143)+chr(136);
f = open('pythonOut.txt', 'wb');
f.write(s);
f.close();
For MATLAB, I have:
s = strcat(char(143),char(136));
fid = fopen('matlabOut.txt');
fwrite(fid, s, 'char');
fclose(fid);
When I compare these two files, they're different. (using diff and/or cmp command).
More over, when I do
disp(char(hex2dec('88'))) //MATLAB prints
print chr(0x88) //PYTHON prints ˆ
Both outputs are different. I want to make my MATLAB code same as Python. What's wrong with MATLAB code?

You're trying to display extended ASCII characters, i.e symbols with an ASCII number greater than 128. MATLAB does not use extended ASCII internally, it uses 16-bit Unicode instead.
If you want to write the same values as in the Python script, use native2unicode to obtain the characters you want. For example, native2unicode(136) returns ^.

The fact that the two files are different seems obvious; chr(134) is obviously different from char(136) :)
In Matlab, only the first 127 characters correspond to (non-extended) ASCII; anything after that is Unicode16.
In Python, the first 255 characters correspond to (extended) ASCII (use unichr() for Unicode).
However, unicode 0x88 is the same as the extended ASCII 0x88 (as are most of the others). The reason Matlab does not display it correctly, is due to the fact that the Matlab command window does not treat Unicode very well by default, while Python (running in a terminal or so I presume) usually does a better job.
Try changing the font in Matlab's command window, or starting Matlab in a terminal and print the 0x88 character; it should be the same.
In any case, the output of the characters to file should not result in any difference; it is just a display issue.

Related

Transform a matrix of integers (0 to 30) to a matrix of emojis

I am working on transforming an image into a set of emojis, depending on how many colors are there. The Maths part is done. I have the matrix of numbers from 0 to 30, but I specifically need to convert the numbers into symbols and I was thinking about emojis since they are so used nowadays.
My question is how am I supposed to read a matrix of integers from a file, transform the matrix of integers into a matrix with different emojis (eventually, from a list of my choice) and put the output in another text file, this time containing the emojis? Is that possible? I guess it should be, but how do I do that? Does anyone have any suggestions?
The problem that I face is actually with the emojis unicode, I don't seem to have success when it comes to receiving messages on the console in their case. I just get "? ?" instead of a smiley face. But that thing happens only for them, the ASCII characters seem to work a bit better. The problem with ASCII characters is that I need, again, expressive images instead of numbers or random pipe shapes.
There is the code:
%make sure you have the "1234567.jpg" in the same location as the .m file
imdata = imread('1234567.jpg');
[X_no_dither,map] = rgb2ind(imdata,30,'nodither');
imshow(X_no_dither,map)
% and there I try to put the output in a text file
dlmwrite('result.txt',X_no_dither,'delimiter',"\t");
Ok, and the output in the text file is:
0 0 0 0 26 26 ... etc.
And I wonder how am I supposed to write the code in such a way that I will get emojis instead of numbers.
🤔 🤔 🤔 🤔 💖 💖 ... etc.
That's how I'd want the output to be like. But, from what I tried yesterday, I cannot print them without getting warnings/errors.
What you need to do is create a table with your 30 emojis (this documentation page might be helpful), then index into that table. I'm using the compose function as indicated in the page above, it should also be possible to copy-paste emojis into your M-file. If you don't see the emojis in MATLAB's console, change the font you're using.
For example:
table = [compose("\xD83D\xDE0E"),
"B", % find the utf16 encoding for your emojis, or copy-paste them in
"C",
"D",
...
];
output = table(X_no_dither + 1);
f = fopen('result.txt', 'wt');
for ii = 1:size(output, 1)
fprintf(f, '%s', output(ii, :));
fprintf(f, '\n');
end
fclose(f);
This will write the file out in UTF16 format, which is what MATLAB uses. If you're on Windows this might work well for you. On other platforms you might want to save as UTF8 instead, which can be accomplished by opening the file in UTF8 mode:
f = fopen('result.txt', 'wt', 'native', 'UTF-8');
Note that, even if you don't manage to get the emojis shown in the MATLAB command window, opening the text file in an editor will show the emojis correctly.

Loading huge binary file partially into Matlab

I have a huge binary file with double precision numbers and I would like to load parts of it into Matlab. Is there a way to do this?
One way would be if I could convert it to a .mat file (without loading it in Matlab first), but I haven't been able to figure out how (or if it's actually possible).
Any ideas?
PS: I was thinking of using c++ to do the conversion but it turns out this is really problematic because I'm using a linux version of c++ (through cygwin) and a windows version of Matlab.
If you know what parts of the file you want to load, you can use fseek followed by fread (both preceded by fopen, of course).
For example, jump a few thousand bytes into a file and read a certain number of bytes, as doubles:
fid = fopen('binary.dat','r');
fseek(fid, 3000, 'bof');
A = fread(fid, N, 'double');
fclose(A); % don't forget to close the file
See the section of documentation called Reading Portions of a File for more information.

Memory map file in MATLAB?

I have decided to use memmapfile because my data (typically 30Gb to 60Gb) is too big to fit in a computer's memory.
My data files consist two columns of data that correspond to the outputs of two sensors and I have them in both .bin and .txt formats.
m=memmapfile('G:\E-Stress Research\Data\2013-12-18\LD101_3\EPS/LD101_3.bin','format','int32')
m.data(1)
I used the above code to memory map my data to a variable "m" but I have no idea what data format to use (int8', 'int16', 'int32', 'int64','uint8', 'uint16', 'uint32', 'uint64', 'single', and 'double').
In fact I tried all of the data formats listed that MATLAB supports, but when I used the m.data(index number) I never get a pair of numbers (2 columns of data) which is what I expected, also the number will be different depending on the format I used.
If anyone has experience with memmapfile please help me.
Here are some smaller versions of my data files so people can understand how my data is structured:
cheers
James
memmapfile is designed for reading binary files, that's why you are having trouble with your text file. The data in there is characters, so you'll have to read them as characters and then parse them into numbers. More on that below.
The binary file appears to contain more than just a stream of floating point values written in binary format. I see identifiers (strings) and other things in the file as well. Your only hope of reading that is to contact the manufacturer of the device that created the binary file and ask them about how to read in such files. There'll probably be an SDK, or at least a description of the format. You might want to look into this as the floating point numbers in your text file might be truncated, i.e., you have lost precision compared to directly reading the binary representation of the floats.
Ok, so how to read your file with memmapfile? This post provides some hints.
So first we open your file as 'uint8' (note there is no 'char' option, so as a workaround we read the content of the file into a datatype of the same size):
m = memmapfile('RTL5_57.txt','Format','uint8'); % uint8 is default, you could leave that off
We can render the data read in as uint8 as characters by casting it to char:
c = char(m.Data(1:19)).' % read the first three lines. NB: transpose just for getting nice output, don't use it in your code
c =
0.398516 0.063440
0.399611 0.063284
0.398985 0.061253
As each line in your file has the same length (2*8 chars for the numbers, 1 tab and 2 chars for newline = 19 chars), we can read N lines from the file by reading N*19 values. So m.Data(1:19) gets you the first line, m.Data(20:38), the second line, and m.Data(20:57) the second and third lines. Read as much as you want at once.
Then we'll have to parse the read-in data into floating point numbers:
f = sscanf(c,'%f')
f =
0.3985
0.0634
0.3996
0.0633
0.3990
0.0613
All that's left now is to reshape them into your two column format
d = reshape(f,2,[]).'
d =
0.3985 0.0634
0.3996 0.0633
0.3990 0.0613
Easier ways than using memmapfile:
You don't need to use memmapfile to solve your problem, and I think it makes things more complicated. You can simply use fopen followed by fread:
fid = fopen('RTL5_57.txt');
c = fread(fid,Nlines*19,'*char');
% now sscanf and reshape as above
% NB: one can read the values the text file directly with f = fscanf(fid,'%f',Nlines*19).
% However, in testing, I have found calling fread followed by sscanf to be faster
% which will make a significant difference when reading such large files.
Using this you can read Nlines pairs of values at a time, process them and simply call fread again to read the next Nlines. fread remembers where it is in the file (as does fscanf), so simply use same call to get next lines. Its thus easy to write a loop to process the whole file, testing with feof(fid) if you are at the end of the file.
An even easier way is suggested here: use textscan. To slightly adapt their example code:
Nlines = 10000;
% describe the format of the data
% for more information, see the textscan reference page
format = '%f\t%f';
fid = fopen('RTL5_57.txt');
while ~feof(fid)
C = textscan(fid, format, Nlines, 'CollectOutput', true);
d = C{1}; % immediately clear C at this point if you need the memory!
% process d
end
fclose(fid);
Note again however that the fread followed by sscanf will be fastest. Note however that the fread method would die as soon as there is one line in the text file that doesn't exactly match your format. textscan is forgiving of whitespace changes on the other hand and thus more robust.

Reading 64-bit integer from binary file in Matlab does not give the correct result

Edit: Problem was that even specifying the precision of the inputs matlab still converts them to doubles unless you specify otherwise. My mistake.
Reading a simple 64 bit integer into matlab appears to be giving a different value than if I do the conversion in python or windows calculator.
I have a small file, 8 bytes long whose contents are
0x99, 0x1e, 0x6b, 0x40, 0x27, 0xe3, 0x01, 0x56
I use the following in matlab:
fid = fopen('test.data')
input = fread(fid, 1, 'int64')
I get
input = 6197484319962505200
However, using either python or the windows calculator I get a different decimal representation for 0x5601e327406b1e99. Both predict I should get
input = 6197484319962504857 (which is different by 343). Its obviously not an endianness issue as then it would be WAY off.
I originally was led to test this because reading doubles from a large binary file was giving odd results. I then tried just reading them in as integers and comparing by hand.
My question is, am I doing something wrong, is there something I am overlooking, or is matlab making an error here? I am using win64 matlab R2010a.
The problem seems to be that fread is actually reading it in as a double:
>> class(input)
ans =
double
Because the number is so large, that's presumably the closest double value.
It works for me if I manually specify that the matlab variable should be int64, in addition to specifying the type in the source file (see the documentation for fread):
>> input = fread(fid, 1, '*int64')
input =
6197484319962504857

How can I control the formatting when saving a matrix to a file?

I save a matrix to a file like this:
save(filepath, 'mtrx', '-ascii');
Is there a way to tell MATLAB to write 0 instead of 0.0000000e+000 values? It would be nice because it would be faster and easier to see which values differ from zero.
I suggest using DLMWRITE instead of SAVE since you're dealing with ASCII files. It will give you more control over the formatting. For example, you could create an output file delimited by spaces with a field width of 10 and 6 digits after the decimal point (see more about format specifiers here):
dlmwrite(filepath,mtrx,'delimiter',' ','precision','%10.6g');