Different value when using fprintf or sprintf - matlab

I've written a function (my first, so don't be too quick to judge) in MATLAB, which is supposed to write a batch file based on 3 input parameters:
write_BatchFile(setup,engine,np)
Here setup consists of one or more strings, engine consists of one string only and np is a number, e.g.:
setup = {'test1.run';'test2.run';'test3.run'};
engine = 'Engine.exe';
np = 4; % number of processors/cores
I'll leave out the first part of my script, which is a bit more extensive, but if necessary I can provide the entire script afterwards. Anyhow, once all 3 parameters have been determined (which works successfully), I wrote the following, which is the last part of my script:
%==========================================================================
% Start writing the batch file
%==========================================================================
tmpstr = sprintf('\nWriting batch file batchRunMPI.bat...');
disp(tmpstr); clear tmpstr;
filename = 'batchRunMPI.bat';
fid = fopen(filename,'w');
fprintf(fid,'set OMP_NUM_THREADS=1\n');
for i = 1:length(setup)
    fprintf(fid,'mpiexec -n %d -localonly "%s" "%s"\n',np,engine,setup{i});
    fprintf(fid,'move %s.log %s.MPI_%d.log\n',setupname{i},setupname{i},np);
end
fclose all;
disp('Done!');
NOTE: setupname is obtained using fileparts:
[~,setupname,setupext] = fileparts(setup);
However, when looking at the resulting batch file I end up getting the value 52 where I indicate my number of cores (= 4), e.g.:
mpiexec -n 52 -localonly "Engine.exe" "test1.run"
mpiexec -n 52 -localonly "Engine.exe" "test2.run"
mpiexec -n 52 -localonly "Engine.exe" "test3.run"
Instead, I'd want the result to be:
mpiexec -n 4 -localonly "Engine.exe" "test3.run", etc
When I check the value of np it returns 4, so I'm confused where this 52 comes from.
My feeling is that it's a very simple solution which I'm just unaware of, but I haven't been able to find anything on this so far, which is why I'm posting here. All help is appreciated!
-Daniel

It seems that at some stage np is being converted to a string. The character '4' has the integer value 52, which explains what you're getting. You've got a few options:
a) Figure out where np is being converted to a string and change it
b) Change the %d to a %s, so you get '4' instead of 52
c) Change the np part of the fprintf statement to str2double(np)
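A quick way to reproduce the symptom (a minimal sketch, assuming np has somehow become the char '4' rather than the number 4):
np = '4';                                    % np is a char, not a number
fprintf('mpiexec -n %d\n', np)               % prints: mpiexec -n 52 (ASCII code of '4')
fprintf('mpiexec -n %s\n', np)               % option b: prints: mpiexec -n 4
fprintf('mpiexec -n %d\n', str2double(np))   % option c: prints: mpiexec -n 4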


Find a string for which hash() starts with 0000

I've got a task from my professor and unfortunately I'm really confused.
The task:
Find a string D1 for which hash(D1) contains 4 first bytes equal 0.
So it should look like "0000....."
As far as I know, we cannot just decrypt a hash, and checking them one by one is kind of pointless work.
In this case it seems like the work is not really "pointless." Rather, you are doing this work because your professor asked you to do it.
Some commenters have mentioned that you could look at the bitcoin blockchain as a source of hashes, but this will only work if your hash of interest is the same one used by bitcoin (double SHA-256!).
The easiest way to figure this out in general is just to brute force it:
Pseudo-code a la Python:
import hashlib

def hash(x_bytes):  # stand-in hash; substitute whichever hash function the task intends
    return hashlib.sha256(x_bytes).digest()

for x in range(10 * 2**32):  # Any number bigger than about 4 billion should work
    x_str = str(x)  # Any old method to generate some bytes to hash should work
    x_bytes = x_str.encode('utf-8')
    hash_bytes = hash(x_bytes)  # hash() returns bytes
    if hash_bytes[0:4] == b'\x00\x00\x00\x00':
        print("Found string: {}".format(x_str))
        break
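For a sense of the expected effort (treating the hash output as uniformly random, which is an assumption): each attempt matches a 4-zero-byte prefix with probability 256^-4 = 2^-32, so on average about 2^32 ≈ 4.3 billion attempts are needed, which is why the loop bound above is set a bit past that.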
I wrote a short python3 script, which repeatedly tries hashing random values until it finds a value whose SHA256 hash has four leading zero bytes:
import secrets
import hashlib

while True:
    p = secrets.token_bytes(64)
    h = hashlib.sha256(p).hexdigest()
    if h[0:8] == '00000000':
        break
print('SHA256(' + p.hex() + ')=' + h)
After running for a few minutes (on my ancient Dell laptop), it found a value whose SHA256 hash has four leading zero bytes:
SHA256(21368dc16afcb779fdd9afd57168b660b4ed786872ad55cb8355bdeb4ae3b8c9891606dc35d9f17c44219d8ea778d1ee3590b3eb3938a774b2cadc558bdfc8d4)=000000007b3038e968377f887a043c7dc216961c22f8776bbf66599acd78abf6
The following shell command verifies this result:
echo -n '21368dc16afcb779fdd9afd57168b660b4ed786872ad55cb8355bdeb4ae3b8c9891606dc35d9f17c44219d8ea778d1ee3590b3eb3938a774b2cadc558bdfc8d4' | xxd -r -p | sha256sum
As expected, this produces:
000000007b3038e968377f887a043c7dc216961c22f8776bbf66599acd78abf6
Edit 5/8/21
Optimized version of the script, based on my conversation with kelalaka in the comments below.
import secrets
import hashlib

N = 0
p = secrets.token_bytes(32)
while True:
    h = hashlib.sha256(p).digest()
    N += 1
    if h.hex()[0:8] == '0' * 8:
        break
    p = h
print('SHA256(' + p.hex() + ')=' + h.hex())
print('N=' + str(N))
Instead of generating a new random number in each iteration of the loop to use as the input to the hash function, this version of the script uses the output of the hash function from the previous iteration as the input to the hash function in the current iteration. On my system, this quadruples the number of iterations per second. It found a match in 1483279719 iterations in a little over 20 minutes:
$ time python3 findhash2.py
SHA256(69def040a417caa422dff20e544e0664cb501d48d50b32e189fba5c8fc2998e1)=00000000d0d49aaaf9f1e5865c8afc40aab36354bc51764ee2f3ba656bd7c187
N=1483279719
real 20m47.445s
user 20m46.126s
sys 0m0.088s
The sha256 hash of the string $Eo is 0000958bc4dc132ad12abd158073204d838c02b3d580a9947679a6
This was found using the code below, which restricts the string to UTF-8 keyboard characters (byte values 0x21-0x7e). It cycles through the hashes of every 1-character string (technically it hashes bytes, not strings), then every 2-character string, then every 3-character string, then every 4-character string (it never had to go to 4 characters, so I'm not 100% sure the math for that part of the function is correct).
The 'limit' value is included to prevent the code from running forever in case a match is not found. This ended up not being necessary, as a match was found in 29970 iterations and the execution time was nearly instantaneous.
#!/usr/bin/env python
# -*- coding: utf-8 -*-
from hashlib import sha256

utf8_chars = list(range(0x21, 0x7f))

def make_str(attempt):
    if attempt < 94:
        c0 = [attempt % 94]
    elif attempt >= 94 and attempt < 8836:
        c2 = attempt // 94
        c1 = attempt % 94
        c0 = [c2, c1]
    elif attempt >= 8836 and attempt < 830584:
        c3 = attempt // 8836
        c2 = (attempt - 8836*c3) // 94
        c1 = attempt % 94
        c0 = [c3, c2, c1]
    elif attempt >= 830584 and attempt < 78074896:
        c4 = attempt // 830584
        c3 = (attempt - 830584*c4) // 8836
        c2 = ((attempt - 830584*c4) - 8836*c3) // 94
        c1 = attempt % 94
        c0 = [c4, c3, c2, c1]
    return bytes([utf8_chars[i] for i in c0])

target = '0000'
limit = 1200000
attempt = 0
hash_value = sha256()
hash_value.update(make_str(attempt))
while hash_value.hexdigest()[0:4] != target and attempt <= limit:
    hash_value = sha256()
    attempt += 1
    hash_value.update(make_str(attempt))
t = ''.join([chr(i) for i in make_str(attempt)])
print([t, attempt])

OCTAVE data import from PCE-VDL data logger device and conversion of decimal comma to decimal point

I have a measurement device, a PCE-VDL, which gives me measurements in the CSV format shown below, which I need to import into Octave for further investigation.
In particular, I need to import the last 3 columns with the x, y, z acceleration data.
The file is in CSV format with a semicolon ";" as delimiter.
I have tried:
A_1 = importdata ("file.csv", ";", 3);
but received:
error: missing_idx(10): out of bound 9
The CSV file looks like this:
#PCE-VDL X - TableView series
#2020.16.11
#Date;Time;Duration [s];t [°C];RH [%];p [mbar];aX [g];aY [g];aZ [g];
2020.28.10;16:16:32:0000;00:000;;;;0,0195;-0,0547;1,0039;
2020.28.10;16:16:32:0052;00:005;;;;0,0898;-0,0273;0,8789;
2020.28.10;16:16:32:0104;00:010;;;;0,0977;-0,0313;0,9336;
2020.28.10;16:16:32:0157;00:015;;;;0,1016;-0,0273;0,9297;
The numbers in the last 3 columns also use a decimal comma rather than a decimal point, so some conversion probably needs to be done as well.
Thank you very much for any help.
Regards
EDIT: 18.11.2020
Thanks for help. I have tried now following:
A_1_str = fileread ("file.csv");
A_1_str_m = strrep (A_1_str, ".", "-");
A_1_str_m = strrep (A_1_str_m, ",", ".");
save "A_1_str_m.csv" A_1_str_m;
A_1 = importdata ("A_1_str_m.csv", ";", 8);
and still receive error: file_content(140): out of bound 139
There is probably some problem with the time format in the first columns, which I do not want to read anyway. I just need the last three columns.
After my conversion, the file looks like this:
# Created by Octave 5.1.0, Wed Nov 18 21:40:52 2020 CET <zdenek@ASUS-F5V>
# name: A_1_str_m
# type: sq_string
# elements: 1
# length: 7849
#PCE-VDL X - TableView series
#2020-16-11
#Date;Time;Duration [s];t [°C];RH [%];p [mbar];aX [g];aY [g];aZ [g];
2020-28-10;16:16:32:0000;00:000;;;;0.0195;-0.0547;1.0039;
2020-28-10;16:16:32:0052;00:005;;;;0.0898;-0.0273;0.8789;
2020-28-10;16:16:32:0104;00:010;;;;0.0977;-0.0313;0.9336;
Thanks for support!
You can first read the data with fileread, which stores the data as a string. Then you can manipulate the string like this:
new_string = strrep(string, ",", ".");
strrep replaces all occurrences of a pattern within a string. Afterwards you save this data as a separate file or you overwrite the existing file with the manipulated data. When this is done you proceed as you have tried before.
EDIT: 19.11.2020
To avoid the additional heading lines in the new file, you can save it like this:
fid = fopen("A_1_str_m.csv", "w");
fputs(fid, A_1_str_m);
fclose(fid);
fputs will just write the string to the file.
Then you can read the new file with dlmread.
A1_buf = dlmread("A_1_str_m.csv", ";");
A1_buf = real(A1_buf); # get the real value of the complex number
A1_buf(1:3, :) = []; # remove the header lines
A1 = A1_buf(:, end-3:end-1); # get only the 3 columns you're looking for
This will give you the three columns you're looking for. But the date and time data will be ignored.
EDIT 20.11.2020
Replaced abs with real, so the sign of the value will be kept.
Use csv2cell from the io package.
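A minimal sketch of that route (assumptions: the io package is installed, file.csv is the semicolon-delimited file shown above with three header lines, and the trailing semicolon on each row yields an empty last column):
pkg load io
C = csv2cell ("file.csv", ";");               % read everything into a cell array
C(1:3, :) = [];                               % drop the three header lines
acc = strrep (C(:, end-3:end-1), ",", ".");   % decimal comma -> decimal point
A = cellfun (@str2double, acc);               % numeric matrix with the aX, aY, aZ columns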

Slow regexprep with a very long string

I have simulation data in an ASCII file with a lot of data points. I'm trying to extract variable names and their values from it. Below is an example of what the file format looks like:
*ESA
*COM on Tue Sep 27 15:23:02 2016
*COM C:\Users\vi813c\Documents\My Matlab\
*COM The pathname to the ESB file was: C:\Users\vi813c\Documents\My Matlab
Case013
*RTITLE
Run Date/Time = 20-SEP-2016 13:29:00
MSC.EASY5 time-history plot with 20001 data points
*EOD
*FLOAT
TIME FDLB(1) FSLB(1) FVLB(1) MXLB(1) \
MYLB(1) MZLB(1) FDLB(2) FSLB(2) FVLB(2) \
MXLB(2) MYLB(2) MZLB(2) FDLB(3) FSLB(3) \
FVLB(3) MXLB(3) MYLB(3) MZLB(3)
0 884.439 -0 53645.8 -972.132
-311780 207.866 5403.68 1981.49 327781
258746 -1.74898E+006 84631.4 5384.25 -1308.47
326538 -97028.6 -1.74013E+006 -61858.1
0.002 882.616 0.008033 53661.1 -972.4
-311702 207.779 5400.42 1982.11 327784
258726 -1.74906E+006 84628.3 5381.01 -1308.44
326541 -97040.1 -1.74021E+006 -61858.8
0.004 876.819 0.031336 53705.6 -973.183
-311683 207.661 5391.19 1983.9 327795
258693 -1.74935E+006 84624 5371.85 -1309.63
326552 -97040.6 -1.74051E+006 -61858.8
0.006 869.491 0.061631 53763.3 -974.213
-311806 207.618 5377.45 1986.76 327813
258659 -1.74995E+006 84621.7 5358.2 -1312.04
326569 -97040.3 -1.7411E+006 -61861
0.008 861.718 0.095625 53828.1 -975.379
-312039 207.648 5360.82 1990.12 327834
A summary of data format characteristics is as follows:
Everything above "*FLOAT" is a header and I need to get rid of it
Stuff between "*FLOAT" and the first numeric value are the variable names
The variable names and the values are delimited by space(s) and '\'
The data are "lumped". Each lump has values for the variables at a given simulation time step. In the example above, there are 19 variables so that there are 19 numeric values in each lump
There can be multiple data sets; each preceded with "*FLOAT" and a variable name section
The following is how I am currently handling this data:
1. fileread the file --> one big string of characters
2. regexprep {'\s+', '\', '\n'} with ',' --> comma delimited for strsplit
3. strfind "*FLOAT"
4. strsplit by ',' --> now becomes a cell
5. find the first numeric value by isnan(str2double(parse))
Then between the index from step 3 and the index from step 5 are the variable names, and between the index from step 5 and the next "*FLOAT" are the numeric data.
This scheme is sort of working, but I can't stop thinking that there's gotta be a better way to do this. For one, step 2 is extremely slow. I guess it's one big string for regexprep to work on with multiple things to replace.
How can I improve my script?
I gave this a shot with the string class, which is new in R2016b.
str = string(fileread('file.txt'));
fileNewline = [13 newline]; % This data has carriage returns
str = extractAfter(str, ['*FLOAT' fileNewline]);
str = erase(str, ['\' fileNewline]);
str = splitlines(str);
% Get the variable names
varNames = split(str(1))';
% Get the data
data = reshape(str(2:end), 4, [])';
data = strip(data);
data = join(data);
data = split(data);
data = double(data);
I'm not sure about how to load the file faster.
As mentioned in another comment, textscan could probably help. It might end up being the fastest solution. With the correct format specified and using the 'HeaderLines' option, I think you can make it work.
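For reference, here is an untested textscan sketch, assuming a single *FLOAT section, 19 variables, and exactly 14 header lines before the numbers (as in the excerpt above; the file name is hypothetical):
fid = fopen('file.txt', 'rt');
C = textscan(fid, repmat('%f', 1, 19), 'HeaderLines', 14, 'CollectOutput', true);
fclose(fid);
data = C{1};   % one row per time step, 19 columns
Because textscan treats line breaks as ordinary whitespace between numeric fields, it reads each 4-line lump as one 19-value record.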

How to read digits from file to matrix, no delimiter

I have data stored in the format below, with no delimiter, and the digit domain is {0,1}. Using Octave, reading the digits and storing them in a matrix has turned out to be a problem for me; I have not managed the scenario below. So, how can I read those digits and store them in a matrix as shown below?
Data in File, 32 x 32 digits
00000000000000000000000000000000
00000000001111110000000000000000
...
00000010000000100001000000000000
How to store the data
matrix[1, 1:32] = 00000000000000000000000000000000
matrix[2, 1:32] = 00000000001111110000000000000000
. . .
matrix[32, 1:32] = 00000010000000100001000000000000
OR
matrix[1, 1:32] = 00000000000000000000000000000000
matrix[1, 33:64] = 00000000001111110000000000000000
. . .
matrix[1, 993:1024] = 00000010000000100001000000000000
One possible solution is to read the data as a string first:
octave> textread('foo.dat', '%s', 'headerlines', 2)
ans =
{
[1,1] = 00000000000000000000000000000000
[2,1] = 00000000001111110000000000000000
...
}
If these are binary representations of decimals, you may find bin2dec() useful.
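For example, applied to the second row of the question's data (purely illustrative):
octave> bin2dec('00000000001111110000000000000000')
ans = 4128768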
This would do the trick (though I don't know how well that third input to fread and arrayfun work with Octave, tested this on Matlab):
fid = fopen('a.txt','rt');
str = fread(fid,inf,'char=>char');
st = fclose(fid);
qrn = str==10|str==13;
str(qrn) = [];
yourMat = reshape(arrayfun(@str2num,str),find(qrn,1)-1,[]).'
Assuming you don't have header lines, you can read the text in as a cell array of strings like so:
C = textread('names.txt', '%s');
Then, in general for all numbers from 0 to 9, you can transform this into a matrix like so:
M = vertcat(C{:})-'0';
If performance is an issue you can look into other ways to import the strings, but this should get the job done.
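Put together, with a reshape added for the alternative 1-by-1024 layout from the question (the file name is hypothetical):
C = textread('digits.txt', '%s');   % one string per line
M = vertcat(C{:}) - '0';            % 32-by-32 numeric matrix
v = reshape(M.', 1, []);            % 1-by-1024 row vector, rows laid end to end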
I have never used Matlab, but assuming it reads files the same way Octave does, and if using an external tool is OK, you could try replacing the characters to add a delimiter using a text editor. You could change every "0" to "0," and every "1" to "1," and then simply load the file.
(This would add a delimiter at the end of every line. In case that creates a problem, you could try replacing your text by pairs instead "00"->"0,0" "10" -> "1,0" and so on)
In case the file is too big for a normal editor, you might even try replacing the characters with sed:
sed -i 's/charactertoreplace/newcharacter/g' yourfile.txt

Is there a way in Matlab to determine the number of lines in a file without looping through each line?

Obviously one could loop through a file using fgetl or similar function and increment a counter, but is there a way to determine the number of lines in a file without doing such a loop?
I like to use the following code for exactly this task
fid = fopen('someTextFile.txt', 'rb');
%# Get file size.
fseek(fid, 0, 'eof');
fileSize = ftell(fid);
frewind(fid);
%# Read the whole file.
data = fread(fid, fileSize, 'uint8');
%# Count number of line-feeds and increase by one.
numLines = sum(data == 10) + 1;
fclose(fid);
It is pretty fast if you have enough memory to read the whole file at once. It should work for both Windows- and Linux-style line endings.
Edit: I measured the performance of the answers provided so far. Here is the result for determining the number of lines of a text file containing 1 million double values (one value per line). Average of 10 tries.
Author             Mean time +- standard deviation (s)
------------------------------------------------------
Rody Oldenhuis      0.3189 +- 0.0314
Edric (2)           0.3282 +- 0.0248
Mehrwolf            0.4075 +- 0.0178
Jonas               1.0813 +- 0.0665
Edric (1)          26.8825 +- 0.6790
So the fastest are the approaches using Perl and reading the whole file as binary data. I would not be surprised if Perl internally also reads large blocks of the file at once instead of looping through it line by line (just a guess, I do not know anything about Perl).
Using a simple fgetl() loop is slower than the other approaches by a factor of 25-75.
Edit 2: Included Edric's 2nd approach, which is much faster and on-par with the Perl solution, I'd say.
I think a loop is in fact the best - all other options so far suggested either rely on external programs (need to error-check; need str2num; harder to debug / run cross-platform etc.) or read the whole file in one go. Loops aren't so bad. Here's my variant
function count = countLines(fname)
    fh = fopen(fname, 'rt');
    assert(fh ~= -1, 'Could not read: %s', fname);
    x = onCleanup(@() fclose(fh));
    count = 0;
    while ischar(fgetl(fh))
        count = count + 1;
    end
end
EDIT: Jonas rightly points out that the above loop is really slow. Here's a faster version.
function count = countLines(fname)
    fh = fopen(fname, 'rt');
    assert(fh ~= -1, 'Could not read: %s', fname);
    x = onCleanup(@() fclose(fh));
    count = 0;
    while ~feof(fh)
        count = count + sum( fread( fh, 16384, 'char' ) == char(10) );
    end
end
It's still not as fast as wc -l, but it's not a disaster either.
I found a nice trick here:
if (isunix) %# Linux, mac
[status, result] = system( ['wc -l ', 'your_file'] );
numlines = str2num(result);
elseif (ispc) %# Windows
numlines = str2num( perl('countlines.pl', 'your_file') );
else
error('...');
end
where 'countlines.pl' is a perl script, containing
while (<>) {};
print $.,"\n";
You can read the entire file at once, and then count how many lines you've read.
fid = fopen('yourFile.ext');
allText = textscan(fid,'%s','delimiter','\n');
numberOfLines = length(allText{1});
fclose(fid)
I would recommend using an external tool for this. For example an app called cloc, which you can download for free.
On Linux you then simply type cloc <directory_path> and get:
YourPC$ cloc <directory_path>
      87 text files.
      81 unique files.
      23 files ignored.
http://cloc.sourceforge.net v 1.60  T=0.19 s (311.7 files/s, 51946.9 lines/s)
-------------------------------------------------------------------------------
Language                     files          blank        comment           code
-------------------------------------------------------------------------------
MATLAB                          59           1009           1074           4993
HTML                             1              0              0             23
-------------------------------------------------------------------------------
SUM:                            60           1009           1074           5016
-------------------------------------------------------------------------------
They also claim it should work on Windows.