How to write "Big Data" to a text file using Matlab - matlab

I am getting some readings off an accelerometer connected to an Arduino which is in turn connected to MATLAB through serial communication. I would like to write the readings into a text file. A 10 second reading will write around 1000 entries that make the text file size around 1 kbyte.
I will be using the following code:
%%%%%// Communication %%%%%
arduino=serial('COM6','BaudRate',9600);
fopen(arduino);
fileID = fopen('Readings.txt','w');
%%%%%// Reading from Serial %%%%%
for i=1:Samples
scan = fscanf(arduino,'%f');
if isfloat(scan),
vib = [vib;scan];
fprintf(fileID,'%0.3f\r\n',scan);
end
end
Any suggestions on improving this code ? Will this have a time or Size limit? This code is to be run for 3 days.

Do not use text files, use binary files. 42718123229.123123 is 18 bytes in ASCII, 4 bytes in a binary file. Don't waste space unnecessarily. If your data is going to be used later in MATLAB, then I just suggest you save in .mat files
Do not use a single file! Choose a reasonable file size (e.g. 100Mb) and make sure that when you get to that many amount of data you switch to another file. You could do this by e.g. saving a file per hour. This way you minimize the possible errors that may happen if the software crashes 2 minutes before finishing.

Now knowing the real dimensions of your problem, writing a text file is totally fine, nothing special is required to process such small data. But there is a problem with your code. You are writing a variable vid which increases over time. That may cause bad performance because you are not using preallocation and it may consume a lot of memory. I strongly recommend not to keep this variable, and if you need the dater read it afterwards.
Another thing you should consider is verification of your data. What do you do when you receive less samples than you expect? Include timestamps! Be aware that these timestamps are not precise because you add them afterwards, but it allows you to identify if just some random samples are missing (may be interpolated afterwards) or some consecutive series of maybe 100 samples is missing.

Related

serial input buffer size Matlab

I'm trying to read a lot of data coming from my Arduino, I've set my input buffer to 500000 to make sure that it can handle all these data. My data are 4 sensors readings each samples at 250 Hz. With the default buffer size (712), I used to get snags when I plot the readings in real time and the samples get disordered which makes the plot go crazy. I solved this by increasing the buffer size to 50000. But now, this will work for a while but if I want to run it for 15 minutes, I get the same misbehavior after 5 minutes, with the addition that the plotting gets slower. I do have some processing code along with the live plotting but it shouldn't be like this with such a bi buffer. I want to know whether the buffer will contain all the data from the beginning until it's full or will it keep erasing older data when it gets full (knowing that I already saved it in another vector and plotted it). I truly don't understand why this keeps happening.
kind regards
I.H
When the buffer gets full, once you get new data it erases the old data. The behavior you are seeing is because your processing and your plotting is slower than the flow of the data.
Try to make sure that you optimize you processing
Make sure that for plotting is done by "drawnow". Like this you are sure that if there is anything in the queue it is not executed
Try to avoid saving and keeping all the data
If the problem is still there, you can try to implement a timer to make sure that you are consistent with reading your data

Taking the left and right channels of two wav files and joining them into one separate stereo file

I have two stereo wav files that I would like to take the left channel of the first audio file and take the right channel of the second audio file and join them into one new wave file.
Here's an image of what I'm trying to do.
I know I can read files into matlab / octave and get the separate left right channels with the code below:
[imported_sig_1, fs_rate, nbitsraw] = wavread(strcat('/tmp/01a.wav'));
imported_sig_L=imported_sig_1(:,1)';
[imported_sig_2, fs_rate, nbitsraw] = wavread(strcat('/tmp/02a.wav'));
imported_sig_R=imported_sig_2(:,2)';
I can then write the new channels that I want out using the code
wavwrite([(imported_sig_L)' (imported_sig_R)'] ,fs_rate,16,'newfile.wav'); %
The problem I'm running into is the time it takes to import the file and size of the array the wave files take up. The files I'm importing are about 1-4 hours long and it takes a while to import and it takes a lot of memory in the array is there away around importing the full file and then exporting them?
I'm using octave 3.8.1 on Ubuntu 14.04 which is like matlab but I also have access to sox
I assume the bottleneck is your hard drive and your system has sufficient memory to keep all thee files in memory at the same time. If so you won't gain an speed reading only one channel. With a 16 bit wav your HDD would have to skip 2 bytes, read 2 bytes, skip 2 bytes, read 2 bytes... For such a read operation it is much faster to copy the full file into the memory and remove the unwanted channels afterwards.

using mapreduce programming technique in matlab

I am studying rat ultrasonic vocalisations (their speech in ultrasound). I have several audio wav files of the rats speeches. Ideally, I would import the whole file into matlab and just process it but I will get memory issues even with the smallest 70mb file. This is what I want help with.
[y, Fs, nbits] = audioread('T0000201.wav');
[S F T] = spectrogram(y,100,[],256,Fs,'yaxis');
..
..
..rest of program
I could consider breaking the audio (in one file) into blocks, and process the block before considering the next block, but I'm not sure what I would do for cases where rat calls are cut off half way through, at the end of the blocks (this might have a negative impact on the STFT spectrogram).
I came across another technique called "Mapreduce" which seems to allow me to use the entirety of my data without actually reading it in. While this seems most ideal, I don't quite understand how it works or can be implemented. "Hadoop" has also been mentioned. Can anyone provide any assistance?
I am currently using this (http://uk.mathworks.com/help/matlab/import_export/find-maximum-value-with-mapreduce.html) for reference. My first step was trying to use the wav file as the data store (like the csv file in the example) but that didn't work.
Since you're working primarily with a repository of audio (.wav) files, mapreduce might not be your best option. The datastore function only works with text files or key-value files.
Use the memory function to explore what the limits of memory are for MATLAB, and try processing the audio files in smaller blocks as you mentioned. Using a combination of audioread(), audioinfo(), and audiowrite(), you can break your collection of audio files up into a larger collection of smaller files that can then be individually processed.
If you have a small number of files to work with, then you can manually inspect the smaller blocks to make sure no important rat calls are cut off between blocks. Of course if you have thousands of files to work with then that approach won't be feasible.

Collect more than 10,000 data points from a Tektronix oscilloscope?

I am building a MATLAB GUI to do data collection from a Tektronix DPO4104 oscilloscope (MATLAB driver here).
I am playing around with tmtool and with my GUI code and have found that the driver can only collect 10,000 data points, regardless of if the oscilloscope is set to show more than 10k points. I found this post on in CCSM but it hasn't been terribly helpful. (I'm the last post on there if you care to read it.) I am using the DPO4104 driver, whereas this post discusses use of the DPO4100 driver, I believe.
As far as I can tell, the steps are:
Edit driver's readwaveform function to account for the current recordLength - in my case, 100,000 points, say.
Manually edit the driver's MaxNumberPoint from 10,000 to 100,000. (In my case, the default number was 0.. I changed this to 100,000).
Manually edit EndingPoint. I set this to 100,000 also.
Before creating a device object, set(interfaceObj, 'InputBufferLength', 2.5*recordLength), that is, make sure the input buffer can fit more than 100,000 points. It's recommended to use at least double the expected buffer. I used 2.5 just because.
Build device object and waveform object, connect() to it, and readwaveform. Profit.
I am still unable to collect more than 10,000 points, either through tmtool or through my GUI. Any help would be appreciated.
I got ahold of a Tektronix engineer; he basically told me just to use the SCPI commands directly and skip the driver. While annoying, this might be the simplest solution.
Is it possible for you to collect the data points 10,000 at a time, then save them somewhere, collect the next 10,000, append them to the saved points, repeat?
It's a work-around, sure.
I figured it out! I think. Taking a couple weeks to step back and refresh really helped. Here's what I did:
1) Edit the driver's init function to configure a larger buffer size. Complete init code:
function init(obj)
% This method is called after the object is created.
% OBJ is the device object.
% End of function definition - DO NOT EDIT
% Extract the interface object.
interface = get(obj, 'Interface');
fclose(interface);
% Configure the buffer size to allow for waveform transfer.
set(interface, 'InputBufferSize', 12e6);
set(interface, 'OutputBufferSize', 12e6); % Originally is set to 50,000
I originally tried to set the the buffer sizes to 22e6 (I wanted to get 10 million points) but I got out-of-memory errors. Supposedly the buffer should be a little more than double what you expect to get out, plus space for headers. I probably don't need 2 million points worth of "header", but eh.
2) Edit the driver's readwaveform() to first query what the user-settable number of points to collect should be. Then, write SCPI commands to the scope to ensure that the number of data points to be transferred is equal to the number of points the user desires. The following snippet will do the trick in readwaveform:
try
% Specify source
fprintf(interface,['DATA:SOURCE ' trueSource]);
%----------BEGIN CODE TO HANDLE MORE THAN 10k POINTS----------
recordLength = query(interface, 'HORizontal:RECordlength?');
fprintf(interface, 'DATA:START 1');
fprintf(interface, 'DATA:STOP %d', str2num(recordLength));
%----------END CODE TO HANDLE MORE THAN 10k POINTS----------
% Issue the curve transfer command.
fprintf(interface, 'CURVE?');
raw = binblockread(interface, 'int16');
% Tektronix scopes send and extra terminator after the binblock.
fread(interface, 1);
3) In the user code, set a SCPI command to change the record size to the underlying interface object:
% interfaceObj is a VISA object.
fprintf(interfaceObj, 'HORizontal:RECordlength 5000000');
There you have it. Hopefully this helps out anyone else that's trying to figure out this issue.
Here's a bad idea. Start collecting 10,000 points. When you get to 5000 points, start collecting data again (you might need to run that in a new thread). Keep going back and forth until all the data you need are stored in 20 some data structures. Then, combine the structures into one structure by lining up the data points. This might be more work than calling the SCPI commands directly and might have some nasty caveats I haven't thought of. But as I said, it's a bad idea...

Matlab: Is it possible to save in the workspace a vector that contains 4 millions of values?

I am calculating something and, as result I have a vector 4 millions elements. I am not still finished to calculate it. I´ve calculate it will take 2 and a half hours more. When I will have finished I could save it?
It is not possible, what can I do?
Thank you.
On Windows 32-bit you can have at maximum a double array of 155-200 millions of elements. Check other OSs on Mathworks support page.
Yes, just use the command save. If you just need it for later Matlab computations, then it is best to save it in .mat format.
save('SavedFile.mat','largeVector')
You can then load your file whenever you need it using the load function.
load('SavedFile.mat')
On my machine it takes 0.01 sec to get a random vector with 4 million elements, with whos you can see that it takes (only) 32 MB.
It would take only few seconds to save such amount of data with MATLAB. If you are working with post-R2007b then maybe it is better to save with '-v7.3' option, newer MATLAB versions use by default general HDF5 format but there could be some performance/disc usage issues.