How to tell PyCUDA to reuse the memory from an earlier kernel?

My program has two kernels, and the second kernel should use the already uploaded input data and the results from the first kernel, so that I can avoid extra memory transfers. How would I achieve this?
This is how I launch my kernels:
result = gpuarray.zeros(points, dtype=np.float32)
grid = (blocks,1),
block = (block_size, 1, 1),

In PyCUDA, data is not transferred to or from the device unless you explicitly request it.
For example, if you allocate memory and transfer some data to the GPU with:
result = float64(zeros((height, width)))
result_device = gpuarray.to_gpu(result)
The variable result_device is a reference to the data on the GPU. You can pass result_device to any other kernel without incurring a memory transfer back to the CPU.
In this case, a transfer back to the host happens only when you call:
result = result_device.get()


Slowdown on tensorflow convolutional network with custom parameter update

I'm trying to implement a custom parameter update on a convolutional network, but every mini batch executed gets slower and slower.
I realize that there's no need to go through this trouble with a fixed learning rate, but I plan to update this later.
I call this in a loop where the feed_dict is the mini-batch:
,.1,1),feed_dict = feed_dict)
def layered_optimizer(cost, base_rate, rate_multiplier):
    gradients = tf.gradients(cost, [*weights, *biases])
    # update parameters based on gradients: var = var - gradient * base_rate * multiplier
    for i in range(len(weights)-1):
        weights[i].assign(tf.subtract(weights[i], tf.multiply(gradients[i], base_rate * rate_multiplier)))
        biases[i].assign(tf.subtract(biases[i], tf.multiply(gradients[len(weights)+i], base_rate * rate_multiplier)))
I'm not sure if this has to do with the problem, but after trying to run the code a second time I get the following errors and have to restart.
could not create cudnn handle: CUDNN_STATUS_NOT_INITIALIZED
error retrieving driver version: Unimplemented: kernel reported driver version not implemented on Windows
could not destroy cudnn handle: CUDNN_STATUS_BAD_PARAM
Check failed: stream->parent()->GetConvolveAlgorithms( conv_parameters.ShouldIncludeWinogradNonfusedAlgo(), &algorithms)
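What happens is that every time this line runs
gradients = tf.gradients(cost, [*weights, *biases])
new gradient ops are added to the graph rather than reused, so the graph keeps growing, taking up unnecessary memory and making each mini-batch slower than the last. Build the gradient and assign ops once, before the loop, and only run them inside it.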

Matlab TCP-IP Server Lossless Transfer

I have tried and failed to implement a TCP listening server in Matlab that is "lossless". By lossless, I mean using the Linux socat utility to send a file:
socat -u file.bin TCP4:
And receive a byte-exact match to that file within Matlab:
function t = test
fid = fopen('x','W');
t =tcpip('',50000,'NetWorkRole','server','InputBufferSize',50*1024^2);
t.BytesAvailableFcnMode = 'byte';
t.BytesAvailableFcnCount = 1024^2;
t.BytesAvailableFcn = {@FCN, fid};
function FCN( obj, event, fid )
x=fread( igetfield(obj, 'jobject'), obj.BytesAvailable, 0, 0);
fwrite( fid, x(1),'int8' );
I've tested this a good bit and had decent success in terms of transfer rate (without the fwrite, and using /dev/zero as the file, it saturates a gigabit link) and low CPU load. The trick is bypassing Matlab's default tcpip() wrapper* and accessing a lower-level method via:
For the 2139160576-byte file I test with, it usually receives ~2139153336 bytes. I've tried various other implementations that receive the fread() output into structs, cells, and array concatenation. They are also missing a few KiB. I've tried a repeating pattern of 512 random bytes; one test had a byte mismatch at the beginning.
Socat->Socat transfer works (obviously).
Socat->my-matlab-code holds at fopen() until socat connects. No data is transferred until fread() is called. I've tried throttling the transfer with the Linux "pv" utility:
pv -L 10m file.bin | socat -u - TCP4:
To no effect.
My question is: where are the bytes going, or what should I test next?
(Edited to include further test results):
Outputting to a file is unnecessary, i.e. the fwrite() call. It is easier and faster to execute t=test, wait for the transfer to complete, i.e. the socat client to return, then query the total bytes transferred from within Matlab:
t.BytesAvailable + t.ValuesReceived
On my Windows machine, this value is always less than the file size of 2139160576 bytes. On my Ubuntu machine, the values occasionally match. Furthermore, when they do not match, the "netstat -s" counters for segments retransmitted and packets discarded do not change. Wireshark monitoring of the loopback interface shows a final Matlab/server ACK sequence number of 2139160578. Presumably this is 2 more than the file size because both the server and the client increment it by one.
*As an aside, Matlab's Instrument Control Toolbox implementation of fread is a terrible wrapper around lower-level code I can't see. matlab\toolbox\shared\instrument\@icinterface\fread.m, function localFormatData, LINE 296. All the data types are explicitly cast to double with numeric type conversion. This results in massive CPU load, not to mention lossy conversion between data types.

Can't open matlab file

I have a ".mat" file supposedly containing a [30720000x4 double] matrix (values from accelerometers). When I try to open this file with "Import data" in Matlab I get the following error:
Error using load
Can't read file F:\vibration_exp_2\GR_UB50n\bearing1\GR_UB50n_1_2.mat.
Error using load
Unknown text on line number 1 of ASCII file
Error in uiimport/runImportdata (line 456)
datastruct = load('-ascii', fileAbsolutePath);
Error in uiimport/gatherFilePreviewData (line 424)
[datastruct, textDelimiter, headerLines]= runImportdata(fileAbsolutePath,
Error in uiimport (line 240)
[ctorPreviewText, ctorHeaderLines, ctorDelim] = ...
The file size is 921MB, which is the same as my other files that do open. I also tried opening the file using Python, without success. Any suggestions? I use MATLAB R2013b.
More info:
How the file was created:
%% acquisition of vibration data
% input:
% sample rate in Hz (max. 51200 Hz, should be used as bearing
% faults are high-frequent)
% time in seconds, stating the duration of the measurement
% (e.g. 600 seconds = 10 minutes)
% filename for the file to be saved
% examples:
% data = DAQ(51200, 600, 'NF1_1.mat');
% data = DAQ(51200, 600, 'NF1_2.mat');
function data = DAQ(samplerate,time,filename)
s = daq.createSession('ni'); % Creates the DAQ session
%%% Add the channels as accelerometer channels (meaning IEPE is turned on)
s.Rate = samplerate;
s.NumberOfScans = samplerate*time;
%%% Defining the Sensitivities in V/g
s.Channels(1).Sensitivity = 0.09478; %31965, top outer
s.Channels(2).Sensitivity = 0.09531; %31966, back outer
s.Channels(3).Sensitivity = 0.09275; %31964, top inner
s.Channels(4).Sensitivity = 0.09363; %31963, back inner
data = s.startForeground(); %Acquiring the data
save(filename, 'data');
More info:
When I open the file in a simple text editor, I can see a lot of characters that do not make sense, but also the first line:
MATLAB 5.0 MAT-FILE, Platform: PCWIN64, Created on: Thu Apr 30 16:29:07 2015
More info:
The file itself:
It is 921MB.
How can I recover my data?
I've tried this, but got memory errors.
I've also tried this, but it did not work.
I fear I can't add much good news to what you already know, but this hasn't been mentioned yet.
The reason the .mat file can't be loaded is that the data is corrupted. What makes it 'unrecoverable' is the way it is stored internally. The exact format is specified in the MAT-File Format Documentation. So I decided to manually construct a simple reader specifically for your .mat file.
It makes sense that splitmat.m can't recover anything, as it basically splits the data into chunks, one stored variable per chunk. In this case there is only one variable stored and thus only one chunk, which happens to be the corrupted one.
In this case, the data is stored as miCOMPRESSED, which is a normal Matlab array compressed using gzip. (Which, as a side note, doesn't seem like a good fit for 'random' vibration data.) This might explain previous comments about the file size being smaller than the full data, as the file size matches exactly with the internally stored value.
I extracted the compressed archive and tried to uncompress it in a variety of ways. Basically it is a '.gz' without the header, which can be added manually. Unfortunately there seems to be a corrupted block near the start of the dataset. I am by no means an expert on gzip, but as far as I know the dictionary is built dynamically as the stream is decoded, which makes all data useless from the point where the block is corrupted. If you are really eager, there seems to be a way to recover data even beyond the point where the data is corrupted, but that method is massively time-consuming. Also, the only way to validate data from those sections is manual inspection, which in your case might prove very difficult.
Below is the code that I used to extract the .gz file, so if you want to give it a try, this might get you started. If you manage to decompress the data, you can read it as described in the MAT-File Format, p. 13f.
corrupted_file_id = fopen('corrupt.mat','r');
%% some header data
% can be skipped replacing this block with
% fread(id,132);
%header of .mat file
header_text = char(fread(corrupted_file_id,116,'char')');
subsystem_data_offset = fread(corrupted_file_id,8,'uint8');
version = fread(corrupted_file_id,1,'int16');
endian_indicator = char(fread(corrupted_file_id,2,'int8')');
data_type = fread(corrupted_file_id,4,'uint8');
%data_type is 15, so it is a compressed matlab array
%% save the content
data_size = fread(corrupted_file_id,1,'uint32');
gz_file_id = fopen('compressed_array.gz','w');
% first write a valid gzip header
% then write the data sequentially
step = 1:1e3:data_size;% 1MB steps
for idx = step
step = step(end):data_size;% 1B steps
for idx = step
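For anyone who wants to try the same extraction outside Matlab, here is a minimal Python sketch of the steps described above: read the 128-byte header, read the tag of the first data element, and inflate as much of the payload as possible. It assumes the payload is a zlib (deflate) stream, which is how I read the MAT-File Format documentation, and it has not been run against the file in question. The function names parse_mat_header, read_first_element and salvage are made up for illustration.

```python
import struct
import zlib

MI_COMPRESSED = 15  # data type code of a compressed element in a MAT v5 file


def parse_mat_header(buf):
    """Split the 128-byte MAT v5 header into text, version and byte order."""
    text = buf[:116].decode("latin-1").rstrip()
    # The endian indicator reads b"IM" when the file was written little-endian.
    order = "<" if buf[126:128] == b"IM" else ">"
    (version,) = struct.unpack(order + "H", buf[124:126])
    return text, version, order


def read_first_element(buf, order):
    """Return (data_type, payload) of the data element right after the header."""
    data_type, size = struct.unpack(order + "II", buf[128:136])
    return data_type, buf[136:136 + size]


def salvage(payload, chunk=4096):
    """Inflate the payload chunk by chunk and stop at the first damaged block.

    Returns (recovered_bytes, clean), where clean is False if zlib raised
    an error. Output of the chunk that triggered the error is lost, so a
    small chunk size keeps more of the data in front of the damage.
    """
    inflater = zlib.decompressobj()
    recovered = bytearray()
    for start in range(0, len(payload), chunk):
        try:
            recovered += inflater.decompress(payload[start:start + chunk])
        except zlib.error:
            return bytes(recovered), False
    return bytes(recovered), True
```

For a 921MB file you would feed salvage from disk in pieces rather than hold everything in one buffer. Note that a merely truncated stream raises no error, so the flag only says that no damaged block was hit.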
To answer the question literally, my suggestion would be to first make sure that the file is okay. This tool on File Exchange apparently knows how to diagnose corrupted .MAT files starting with version V5 (R8):
The file's size (indices going out of range) seems to be a problem. Octave, which should read .mat files, gives the error
memory exhausted or requested size too large for range of Octave's index type
To find out what is wrong you may need to write a test program outside MATLAB, where you have more control over memory management. Examples are here, including instructions on how to build them on your own platform. These stand-alone programs may not have the same memory issues. The program matdgns.c is specifically made to check .mat files for errors.

Counter/Storage Collector + Matlab

I'm trying to design a storage system that excess energy goes into. The system has a maximum storage size. I am struggling to work out how to code this in Matlab.
Currently I'm using a function similar to this:
max_storage = no_tanks*tank_size
if cumsum(excess) > 0
storage = cumsum(excess)
elseif cumsum(excess) < 0
After that I am confused how to continue writing the code. Any help would be greatly appreciated
Attempt at mind-reading, while awaiting an update to the question.
To limit the storage size to max_storage, you need to have some code like
storage = calc_storage(excess); % or whatever
storage = min(storage, max_storage);
Don't forget to finish your statements with ;, and if you need to use cumsum(excess) lots of times it is better to assign it to a variable rather than calculating it over and over again.
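Continuing the mind-reading: if the storage is also drawn down when excess is negative, clamping the cumulative sum afterwards is not quite the same as clamping at every step, because energy that did not fit into a full store should not be available later. Here is a minimal Python sketch of the step-by-step version. It assumes that the level cannot drop below zero and that energy above the cap is simply lost; both are guesses about the intended system, and fill_storage is a made-up name.

```python
def fill_storage(excess, max_storage):
    """Track the storage level step by step, clamped to [0, max_storage]."""
    level = 0.0
    levels = []
    for e in excess:
        # Add (or withdraw) this step's energy, then clamp to the tank limits.
        level = min(max(level + e, 0.0), max_storage)
        levels.append(level)
    return levels


print(fill_storage([5, 5, 5, -3], 10))
```

With these inputs the capped cumulative sum would stay at 10 after the last step, whereas the step-by-step level drops to 7, since the third surplus never fitted into the store.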

How can I prevent GD from running out of memory?

I'm not sure if memory is the culprit here. I am trying to instantiate a GD image from data in memory (it previously came from a database). I try a call like this:
my $image = GD::Image->new($image_data);
$image comes back as undef. The POD for GD says that the constructor will return undef for cases of insufficient memory, so that's why I suspect memory.
The image data is in PNG format. The same thing happens if I call newFromPngData.
This works for very small images, like under 30K. However, slightly larger images, like ~70K will cause the problem. I wouldn't think that a 70K image should cause these problems, even after it is deflated.
This script is running under CGI through Apache 2.0, on OS 10.4, if that matters at all.
Are there any memory limitations imposed by Apache by default? Can they be increased?
Thanks for any insight!
EDIT: For clarification, the GD::Image object never gets created, so clearing out the $image_data from memory isn't really an option.
The GD library eats many bytes of memory per byte of image file size. It's well over a 10:1 ratio!
When a user uploads an image to our system, we start by checking the file size before loading it into a GD image. If it's over a threshold (1 Megabyte) we don't use it but instead report an error to the user.
If we really cared we could dump it to disk, use the command line "convert" tool to rescale it to a sane size, then load the output into the GD library and remove the temporary file.
convert -define jpeg:size=800x800 tmpfile.jpg -thumbnail '800x800' -
This will scale the image so that it fits within an 800 x 800 square. Its longest edge is now 800px, which should load safely. The above command sends the shrunk .jpg to STDOUT. The size= option should tell convert not to hold the huge image in memory, but just enough to scale to 800x800.
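To see why the file size says so little, it can help to estimate the decoded size from the image dimensions instead. Below is a rough Python sketch that reads width and height from the IHDR chunk of PNG data already in memory and multiplies them out. The 4 bytes per pixel is an assumption about how a truecolor image is held in memory, not a figure taken from the GD documentation, and the function names are made up for illustration.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_dimensions(data):
    """Read (width, height) from the IHDR chunk of PNG data held in memory."""
    # Layout: 8-byte signature, 4-byte chunk length, b"IHDR", width, height.
    if data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("not a PNG image")
    return struct.unpack(">II", data[16:24])


def estimated_decoded_bytes(data, bytes_per_pixel=4):
    """Estimate the memory needed for the decoded pixels (assumed 4 B/pixel)."""
    width, height = png_dimensions(data)
    return width * height * bytes_per_pixel
```

A check like this could be used as the threshold test in place of the raw file size, since a small, highly compressible PNG can still decode to a very large bitmap.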
I've run into the same problem a few times.
One of my solutions was simply to increase the amount of memory available to my scripts. The other was to clear the buffer:
Original Script:
$src_img = imagecreatefromstring($userfile2);
Edited Script:
$src_img = imagecreatefromstring($userfile2);
Clearing out the memory of the first src_image freed up enough to handle more processing.