I have a file whose lines follow this pattern:
30,402.660064697,196.744171143,18.5563354492,15.3047790527,2.16090902686,aeroplane
30,177.246170044,113.594314575,18.9164428711,16.5203704834,1.71010773629,aeroplane
30,392.224212646,226.437973022,26.2086791992,27.4663391113,0.782454758883,aeroplane
30,241.633453369,169.349304199,628.560913086,540.041259766,0.530623318627,aeroplane
30,529.454589844,322.412719727,24.9837646484,21.5563354492,0.503144180402,aeroplane
30,365.581298828,148.842697144,21.3596801758,16.3081970215,0.490551069379,sheep
30,436.230773926,272.073303223,17.6417236328,19.9946289062,0.483223423362,aeroplane
30,438.188201904,286.455200195,20.164855957,23.1041870117,0.224495329894,adog
30,511.185546875,289.902099609,19.7315673828,19.3796386719,0.203064805828,aeroplane
30,365.777252197,177.576202393,21.8588256836,15.1581115723,0.181338354014,cat
30,380.210266113,150.396713257,19.6742553711,15.7977600098,0.171210919507,aeroplane
and another file that contains the metadata:
dog
aeroplane
cat
sheep
Now I want to map each string (e.g. aeroplane) to an int:
dog -->1
aeroplane -->2
cat -->3
sheep -->4
I know I can open a new file and convert the data line by line with fgets,
but in my case the file has many thousands of lines, so that would be very slow.
Is there a smarter way to solve this that does not require creating a new file and simply updates the same file in place?
Assuming the file is not too large to read into memory, I suggest using textscan, strcmp (or ismember) and fprintf. You can overwrite the existing file if you really want, but I would recommend writing to a different file (especially when debugging).
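A minimal sketch of that approach, assuming the file names data.txt, classes.txt and data_mapped.txt (all hypothetical) and that 12 significant digits are enough to reproduce the numbers shown in the question. I use ismember here instead of a strcmp loop because it returns the position of every label in one call:

```matlab
% Read the class names, one per line; the line number becomes the integer ID.
fid = fopen('classes.txt', 'r');            % hypothetical metadata file
names = textscan(fid, '%s');
fclose(fid);
names = names{1};

% Read the data: six numeric columns followed by the class name.
fid = fopen('data.txt', 'r');               % hypothetical data file
C = textscan(fid, '%f %f %f %f %f %f %s', 'Delimiter', ',');
fclose(fid);

% For each label, find its position in names (0 if the label is unknown).
[~, id] = ismember(C{7}, names);

% Write the numeric columns plus the integer ID to a separate file.
out = [C{1:6}, id];
fid = fopen('data_mapped.txt', 'w');        % hypothetical output file
fprintf(fid, '%d,%.12g,%.12g,%.12g,%.12g,%.12g,%d\n', out.');
fclose(fid);
```

Labels that do not appear in the metadata file end up as 0, which makes them easy to spot afterwards.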
Related
I have the following type of data:
`org.apache.spark.rdd.RDD[org.apache.spark.rdd.RDD[((String, String),Int)]] = MapPartitionsRDD[29] at map at <console>:38`
I'd like to write those data in a txt file to have something like
((like,chicken),2) ((like,dog),3) etc.
I store the data in a variable called res
But for the moment I tried with this:
res.coalesce(1).saveAsTextFile("newfile.txt")
But it doesn't seem to work...
If my assumption is correct, you expect the output to be a single .txt file because the data was coalesced down to one partition. This is not how Spark is built: saveAsTextFile always writes a directory of part files, because Spark is meant for distributed work and should not be shoe-horned into a form where the output is not distributed. If you need a single plain file, use a more generic command-line tool to combine the parts afterwards.
All that said, you should see a folder named newfile.txt that contains data files (with names like part-00000) holding your expected output.
I want to write multiple ArrayBuffers to a file one after the other in append mode in Scala. Later I need to read the last ArrayBuffer from the file, delete it from the file, and save the file with the remaining ArrayBuffers.
I can't think of any good solution for this. How should I do this?
Is there any way to create an empty .mat file from a terminal session? Basically, what I am doing is brain graph analysis. With the software I am using, if an entire brain is scrubbed (i.e., if the displacement of the brain is greater than a certain threshold), the output file will be left out or will be very small. When analyzing, however, I need to be able to eliminate both subjects from the analysis if the entire brain, or too much of it, is scrubbed. The easiest way to accomplish this would be to check the dimensions of the output file within MATLAB and, if they are below an arbitrary threshold that I choose, skip both subjects. The issue is that I can easily check whether a file contains too few remaining frames, but if the resulting file contains no frames, it will not exist at all. As the outputs are all sorted, the only thing I need to do is check consecutive files' dimensions, and if one of the files does not contain enough values, I can simply skip over it. Simply touching a blank file obviously will not work, since it will not contain any encoding. I hope this explains my motivation, and if you have any suggestions, please let me know.
A simple solution would be to create an empty file from Matlab and duplicate the file when needed from the console.
Just open MATLAB, change to the destination folder and type this:
clear all
save empty.mat
Then, when needed, copy the file from the console. :)
Saving the contents of an empty struct creates an empty .mat file:
emptyStruct = struct;
save('myFile.mat','-struct','emptyStruct');
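To tie this back to the question, here is a rough sketch of how the placeholder could be created only when the output is missing, and how an empty file could then be detected. The file name subject_01.mat is hypothetical, and I am assuming that "empty" means "contains no variables":

```matlab
fname = 'subject_01.mat';                 % hypothetical output file name

% Create an empty placeholder only if the software produced no file at all.
if exist(fname, 'file') ~= 2
    emptyStruct = struct;                 % struct with no fields
    save(fname, '-struct', 'emptyStruct');
end

% whos('-file', ...) lists the variables stored in the file without loading them.
info = whos('-file', fname);
if isempty(info)
    fprintf('%s has no data, skipping\n', fname);
end
```

From a terminal, the two lines that create the placeholder could probably be passed to MATLAB non-interactively (recent releases have a -batch startup option for this), but that depends on your MATLAB version.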
I'm having trouble loading a .txt file in MATLAB. The main problem is that the rows are not of equal length. I'll attach the file so you can see more clearly what I'm trying to say. First, the file has information about each node in a graph. One row has information like this:
1|1|EL_1_BaDfG|4,41|5,1|6,99|8,76|9,27|13,88|14,19|15,91|19,4|21,48...
it means:
id|type|name|connected_to, weight|connected_to, weight| and so on..
I was trying to use the fscanf function, but it only reads the whole line as one string. How am I supposed to divide it into a struct with the information that I need?
Best regards,
Dejan
Here, you can see file that I'm trying to load
An alternative to Stewie's answer is to use fgetl to read each line, then use textscan or strsplit (strread also works, but it is no longer recommended) to split the string:
first on the | delimiter, then a second time on each subsection that contains a comma.
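Something along these lines, as a sketch only. The file name graph.txt and the field names are my own choices, and I am assuming every entry after the name is a well-formed "connected_to,weight" pair of integers:

```matlab
fid = fopen('graph.txt', 'r');            % hypothetical file name
nodes = struct('id', {}, 'type', {}, 'name', {}, 'edges', {});

tline = fgetl(fid);
while ischar(tline)                       % fgetl returns -1 at end of file
    parts = strsplit(tline, '|');         % first split: on the | delimiter
    if numel(parts) >= 3
        node.id   = str2double(parts{1});
        node.type = str2double(parts{2});
        node.name = parts{3};

        pairs = parts(4:end);
        pairs = pairs(~cellfun(@isempty, pairs));   % drop empty trailing entries
        edges = zeros(numel(pairs), 2);             % columns: connected_to, weight
        for k = 1:numel(pairs)
            % second split: "4,41" -> [4 41]
            edges(k, :) = sscanf(pairs{k}, '%d,%d').';
        end
        node.edges = edges;

        nodes(end+1) = node;              % append to the struct array
    end
    tline = fgetl(fid);
end
fclose(fid);
```

After this, nodes(1).edges would hold one row per connection of the first node, so rows of different length in the file are no longer a problem.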
I am new to MATLAB programming and some of the syntax escapes me. So I need a little help. Plus I need some complex looping ideas.
Here's the breakdown of what I have:
12 separate .dat files, each titled something like output_1_x.dat, output_2_x.dat, etc.
each file is actually one piece of a whole that was separated and processed
each .dat file is approx. 3.9 GB
Here's what I need to do:
create a single file containing all the data from each separate file, i.e. I need to recreate the original file.
call this complete output file something like output_final.dat
it has to be done in MATLAB, there are no other alternatives (actually there maybe; see note below)
What is implied:
I will have to fread each 3.9 GB file in chunks or packets, probably 100 MB at a time (using a nested loop?)
these packets will have to be read then written sequentially
after one file is read then written into output_final.dat, the next file is automatically read & written (the master loop).
Well, that's pretty much it. I did a search for 'merging multiple files' and found this. That isn't exactly what I need to do...I don't need to take part of a file, or data from files, and write it to a new one. I'm simply...concatenating...? This would be simple in Java or Perl, but I only have MATLAB as a tool.
Note: I am however running KDE in OpenSUSE on a pretty powerful box. Maybe someone who is also an expert in the terminal knows a command/script to do this from the shell?
So on this site we usually would point you to whathaveyoutried.com but this question is well phrased.
I won't write all of the code, but I will describe how I would do it. First, I am a bit confused about why you need to fread the files. Are you just appending one file onto the end of another?
You can actually use unix commands to achieve what you want:
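files = dir('*.dat');
% Note: dir typically lists names alphabetically, so output_10_x.dat comes
% before output_2_x.dat; reorder files first if the piece order matters.
for i = 1:length(files)
    % Append each piece to a temporary file whose name does not match *.dat
    cmd = sprintf('cat "%s" >> output_final.dat.temp', files(i).name);
    unix(cmd);
end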
That code should loop through all the files and append their content to output_final.dat.temp. The .temp suffix keeps the output from matching *.dat, so it is never picked up as an input; just rename it when the loop is done.
But if you really want to use fread because you want to parse the lines in some manner then you can use the same process:
files = dir('*.dat');   % list the pieces before the output file is created
fidF = fopen('output_final.dat', 'w');
for i = 1:length(files)
    fid = fopen(files(i).name);
    while ~feof(fid)
        tline = fgetl(fid);            % you may choose to parse the line here
        fprintf(fidF, '%s\n', tline);  % fgetl strips the newline, so add it back
    end
    fclose(fid);
end
fclose(fidF);
Just remember: if you are not parsing the lines, this will take much, much longer than the cat approach.
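If the pieces are binary rather than text, the chunked fread loop described in the question may be the better fit. A rough sketch, assuming the pieces are named exactly output_1_x.dat through output_12_x.dat (the question only says "something like" that) and using a 100 MB chunk size:

```matlab
nFiles    = 12;                     % number of pieces, from the question
chunkSize = 100 * 1024^2;           % bytes per read; assumed, tune to taste
outName   = 'output_final.dat';

fidOut = fopen(outName, 'w');
assert(fidOut ~= -1, 'Cannot open %s for writing', outName);

for k = 1:nFiles                    % master loop: pieces in numeric order
    inName = sprintf('output_%d_x.dat', k);   % assumed naming scheme
    fidIn  = fopen(inName, 'r');
    assert(fidIn ~= -1, 'Cannot open %s', inName);

    while true                      % inner loop: one chunk at a time
        % '*uint8' reads raw bytes and keeps them as uint8 (no conversion to double)
        chunk = fread(fidIn, chunkSize, '*uint8');
        if isempty(chunk)
            break;                  % end of this piece
        end
        fwrite(fidOut, chunk, 'uint8');
    end
    fclose(fidIn);
end
fclose(fidOut);
```

Building the names from the loop counter also sidesteps the alphabetical ordering of dir, so piece 10 cannot end up ahead of piece 2.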
Hope this helps.
I suggest using matlab.io.MatFile objects (created with matfile) on two of the files:
matObj1 = matfile('datafile1.mat', 'Writable', true);  % destination must be writable
matObj2 = matfile('datafile2.mat');
This does not load any data into memory. Then you can use the objects to sequentially copy a variable from one file to another.
matObj1.varName = matObj2.varName
You can get all the variables in one file with fieldnames(matObj1) (or who(matObj1)) and loop through them to copy the contents from one file to another. You can then clear some space by removing the copied fields. Or you can use a slightly riskier procedure and move the data directly:
matObj1.varName = rmfield(matObj2,'varName')
Just a disclaimer: I haven't tried this, and rmfield is documented for struct arrays, so it may not accept a matfile object. Use at your own risk.
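For the plain copy (without the risky move), the loop could look roughly like this. The file names are the placeholders from above, and I am assuming that each individual variable fits in memory, since one variable at a time is loaded during the copy:

```matlab
src = matfile('datafile2.mat');                    % source, opened read-only
dst = matfile('datafile1.mat', 'Writable', true);  % destination

names = who(src);                   % cell array of variable names in the source
for k = 1:numel(names)
    % Dynamic field names copy one variable at a time from src to dst.
    dst.(names{k}) = src.(names{k});
end
```

A variable in datafile1.mat with the same name as one in datafile2.mat would be overwritten, so check for clashes first if that matters.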