How to scale, crop, and rotate all at once in Android RenderScript

Is it possible to take a camera image in Y'UV format and, using RenderScript, do all of the following?
Convert it to RGBA
Crop it to a certain region
Rotate it if necessary

Yes! I figured out how and thought I would share it with others. RenderScript has a bit of a learning curve, and more simple examples seem to help.
Even when cropping, you still need to set up an input and an output allocation, as well as one for the script itself. It might seem strange at first, but the input and output allocations passed to the kernel have to be the same size, so if you are cropping you need to set up yet another Allocation to receive the cropped output. More on that in a second.
#pragma version(1)
#pragma rs java_package_name(com.autofrog.chrispvision)
#pragma rs_fp_relaxed
/*
* This is mInputAllocation
*/
rs_allocation gInputFrame;
/*
* This is where we write our cropped image
*/
rs_allocation gOutputFrame;
/*
* These dimensions define the crop region that we want
*/
uint32_t xStart, yStart;
uint32_t outputWidth, outputHeight;
uchar4 __attribute__((kernel)) yuv2rgbFrames(uchar4 in, uint32_t x, uint32_t y)
{
uchar Y = rsGetElementAtYuv_uchar_Y(gInputFrame, x, y);
uchar U = rsGetElementAtYuv_uchar_U(gInputFrame, x, y);
uchar V = rsGetElementAtYuv_uchar_V(gInputFrame, x, y);
uchar4 rgba = rsYuvToRGBA_uchar4(Y, U, V);
/* force the alpha channel to opaque - the conversion doesn't seem to do this */
rgba.a = 0xFF;
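/* translate into coordinates relative to the crop region */
uint32_t translated_x = x - xStart;
uint32_t translated_y = y - yStart;
/* rotate by 90 degrees; the -1 keeps x_rotated within [0, outputWidth - 1] */
uint32_t x_rotated = outputWidth - 1 - translated_y;
uint32_t y_rotated = translated_x;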
rsSetElementAt_uchar4(gOutputFrame, rgba, x_rotated, y_rotated);
return rgba;
}
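The coordinate arithmetic in the kernel can be checked on the CPU without RenderScript. The following is a minimal sketch in plain C, not the code from this answer: `crop_rotate90_cw` and its parameter names are hypothetical, pixels are treated as packed 32-bit values in a row-major buffer, and the rotation is assumed to be 90 degrees clockwise. The `- 1` in the rotated x coordinate keeps every write inside the output buffer.

```c
#include <stdint.h>

/*
 * Copy the region [x_start, x_start + region_w) x [y_start, y_start + region_h)
 * of a row-major source image into dst, rotated 90 degrees clockwise.
 * dst must hold region_w * region_h pixels; its width is region_h and its
 * height is region_w.
 */
void crop_rotate90_cw(const uint32_t *src, uint32_t src_w,
                      uint32_t x_start, uint32_t y_start,
                      uint32_t region_w, uint32_t region_h,
                      uint32_t *dst)
{
    const uint32_t out_w = region_h; /* rotated width is the region height */
    for (uint32_t y = y_start; y < y_start + region_h; ++y) {
        for (uint32_t x = x_start; x < x_start + region_w; ++x) {
            uint32_t tx = x - x_start;       /* crop-relative x */
            uint32_t ty = y - y_start;       /* crop-relative y */
            uint32_t x_rot = out_w - 1 - ty; /* stays in [0, out_w - 1] */
            uint32_t y_rot = tx;
            dst[y_rot * out_w + x_rot] = src[y * src_w + x];
        }
    }
}
```

Cropping the 2x2 region at (1, 1) out of a 4x3 image holding the values 0 to 11 gives the rows [9 5] and [10 6].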
To set up the allocations:
private fun createAllocations(rs: RenderScript) {
/*
* The yuvTypeBuilder is for the input from the camera. It has to be the
* same size as the camera (preview) image
*/
val yuvTypeBuilder = Type.Builder(rs, Element.YUV(rs))
yuvTypeBuilder.setX(mImageSize.width)
yuvTypeBuilder.setY(mImageSize.height)
yuvTypeBuilder.setYuvFormat(ImageFormat.YUV_420_888)
mInputAllocation = Allocation.createTyped(
rs, yuvTypeBuilder.create(),
Allocation.USAGE_IO_INPUT or Allocation.USAGE_SCRIPT)
/*
* The RGB type is also the same size as the input image. Other examples write this as
* an int but I don't see a reason why you wouldn't be more explicit about it to make
* the code more readable.
*/
val rgbType = Type.createXY(rs, Element.RGBA_8888(rs), mImageSize.width, mImageSize.height)
mScriptAllocation = Allocation.createTyped(
rs, rgbType,
Allocation.USAGE_SCRIPT)
mOutputAllocation = Allocation.createTyped(
rs, rgbType,
Allocation.USAGE_IO_OUTPUT or Allocation.USAGE_SCRIPT)
/*
* Finally, set up an allocation to which we will write our cropped image. The
* dimensions of this one are (wantx,wanty)
*/
val rgbCroppedType = Type.createXY(rs, Element.RGBA_8888(rs), wantx, wanty)
mOutputAllocationRGB = Allocation.createTyped(
rs, rgbCroppedType,
Allocation.USAGE_SCRIPT)
}
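Finally, since you're cropping, you need to tell the script which region to process before invoking it. The script globals gInputFrame and gOutputFrame also have to be bound to mInputAllocation and mOutputAllocationRGB through the generated setters during setup; that step is not shown here. If the image sizes don't change, you can probably optimize this by setting the LaunchOptions and script variables just once (rather than on every frame), but I'm leaving them here to make the example clearer.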
override fun onBufferAvailable(a: Allocation) {
// Get the new frame into the input allocation
mInputAllocation!!.ioReceive()
// Run processing pass if we should send a frame
val current = System.currentTimeMillis()
if (current - mLastProcessed >= mFrameEveryMs) {
val lo = Script.LaunchOptions()
/*
* These coordinates are the portion of the original image that we want to
* include. Because we're rotating (in this case) x and y are reversed
* (but still offset from the actual center of each dimension)
*/
lo.setX(starty, endy)
lo.setY(startx, endx)
mScriptHandle.set_xStart(lo.xStart.toLong())
mScriptHandle.set_yStart(lo.yStart.toLong())
mScriptHandle.set_outputWidth(wantx.toLong())
mScriptHandle.set_outputHeight(wanty.toLong())
mScriptHandle.forEach_yuv2rgbFrames(mScriptAllocation, mOutputAllocation, lo)
val output = Bitmap.createBitmap(
wantx, wanty,
Bitmap.Config.ARGB_8888
)
mOutputAllocationRGB!!.copyTo(output)
/* Do something with the resulting bitmap */
listener?.invoke(output)
mLastProcessed = current
}
}
All this might seem like a bit much, but it's very fast: way faster than doing the rotation on the Java/Kotlin side. And thanks to RenderScript's ability to run the kernel function over a subset of the image, it's less overhead than creating a bitmap and then creating a second, cropped one.
For me, the rotation is necessary because the image seen by RenderScript was rotated 90 degrees relative to the camera. I am told this is some kind of peculiarity of Samsung phones.
RenderScript was intimidating at first but once you get used to what it's doing it's not so bad. I hope this is helpful to someone.

Related

Unity Compute Shader Texture Array Sampling

I've been struggling with this for a while now and it is quite time-critical, so I have to ask here. I'm quite new to compute shaders, but from what I've read they are what I need for my use case. I'm trying to find the total score from an array of textures, with the score being the product of each channel and a given weight. Previously I was using NodeJS to do it, but it doesn't scale well: increasing the dimensions by a factor of 4 increases the area per texture by a factor of 16, and with multiple textures this isn't a good solution.
This is my compute shader right now:
// Each #kernel tells which function to compile; you can have many kernels
#pragma kernel CSMain
// Create a RenderTexture with enableRandomWrite flag and set it
// with cs.SetTexture
SamplerState linearClampSampler;
float4 weights;
RWStructuredBuffer<Texture2DArray<float4>> scoreInput;
float output;
[numthreads(8,8,1)]
void CSMain (uint3 id : SV_DispatchThreadID)
{
float4 result_mult = scoreInput[id.z].Sample(id.uv).rgba * weights.xyzw;
output = result_mult.r + result_mult.g + result_mult.b + result_mult.a;
}
For my C# dispatcher, I am doing:
string[] paths = new string[sessionData.masks.Length];
Texture2D[] textures = new Texture2D[sessionData.masks.Length];
for (int i = 0; i < sessionData.masks.Length; i++)
{
paths[i] = sessionData.masks[i].combinedMasks;
textures[i] = CustomUtility.LoadPNG(paths[i]);
}
int colourSize = sizeof(float) * 4;
ComputeBuffer wallBuffer = new ComputeBuffer(textures.Length, colourSize);
wallBuffer.SetData(textures);
CalculateScoreShader.SetBuffer(0, "scoreInput", wallBuffer);
CalculateScoreShader.Dispatch(0, 8,8,1);
I can't figure out how to sample the texture properly, and I want to make sure that I am setting up the buffer correctly for the shader to use it like this. I also want to retrieve the output, but again I'm unsure how to do this.
I have looked through a decent amount of tutorials and documentation but I just can't seem to find the solution.
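As a CPU reference for the quantity the shader is meant to compute, here is a minimal sketch in plain C. It is one reading of the description above, not a Unity API example: each texture is assumed to be a flat array of RGBA float texels, and `texture_score` and `total_score` are hypothetical helper names. A GPU version would need to produce the same number.

```c
#include <stddef.h>

/* Weighted sum of all channels over one texture of texel_count RGBA texels. */
float texture_score(const float *rgba, size_t texel_count, const float weights[4])
{
    float score = 0.0f;
    for (size_t i = 0; i < texel_count; ++i) {
        for (int c = 0; c < 4; ++c) {
            score += rgba[4 * i + c] * weights[c];
        }
    }
    return score;
}

/* Total over an array of textures that all have the same texel count. */
float total_score(const float *const *textures, size_t texture_count,
                  size_t texel_count, const float weights[4])
{
    float total = 0.0f;
    for (size_t t = 0; t < texture_count; ++t) {
        total += texture_score(textures[t], texel_count, weights);
    }
    return total;
}
```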

RenderScript Sobel implementation, different in- and output types

I want to implement a Sobel filter in RenderScript with uchar4 as the input allocation and float[] as the output allocation. I am not quite sure whether it is possible to use different types for input and output allocations in RenderScript. I want to develop the solution myself, but would be grateful for some advice on the best RenderScript structure to tackle this problem. Somewhere I read that it is possible to use
float __attribute__((kernel)) root(uchar4 *v_in, uint32_t x, uint32_t y) {
}
Would you recommend such an approach, or can this be done without actually using a kernel, i.e. with just a function? Thanks in advance.
My rs code for the Sobel (X direction) now looks as follows:
#pragma version(1)
#pragma rs java_package_name(com.example.xxx)
#pragma rs_fp_relaxed
rs_allocation gIn;
int32_t width;
int32_t height;
float __attribute__((kernel)) sobelX(uchar4 *v_in, uint32_t x, uint32_t y) {
float out=0;
if (x>0 && y>0 && x<(width-1) && y<(height-1){
uchar4 c11=rsGetElementAt_uchar4(gIn, x-1, y-1);
uchar4 c21=rsGetElementAt_uchar4(gIn, x, y-1);
uchar4 c31=rsGetElementAt_uchar4(gIn, x+1, y-1);
uchar4 c13=rsGetElementAt_uchar4(gIn, x-1, y+1);
uchar4 c23=rsGetElementAt_uchar4(gIn, x, y+1);
uchar4 c33=rsGetElementAt_uchar4(gIn, x+1, y+1);
float4 f11=convert_float4(c11);
float4 f21=convert_float4(c21);
float4 f31=convert_float4(c31);
float4 f13=convert_float4(c13);
float4 f23=convert_float4(c23);
float4 f33=convert_float4(c33);
out= f11.r-f13.r + 2*(f21.r-f23.r) + f31.r-f33.r;
}
return out;
}
What I am struggling with is passing the parameters from the Java side:
float[][] gx = new float[width][height];
ScriptC_sobel script;
script=new ScriptC_sobel(rs);
script.set_width(width) ;
script.set_height(height) ;
script.set_gIn(bmpGray);
Allocation inAllocation = Allocation.createFromBitmap(rs, bmpGray, Allocation.MipmapControl.MIPMAP_NONE,
Allocation.USAGE_SCRIPT);
Allocation outAllocation = Allocation.createTyped(rs, float,2) ;
script.forEach_sobelX(inAllocation, outAllocation);
outAllocation.copyTo(gx) ;
I understand that, in order to use the rsGetElementAt functions (to access neighboring data within the kernel), I need to set the input allocation as a script global as well (rs_allocation gIn in the rs code). However, I'm not sure how to handle this "double allocation" from the Java side. Also, the outAllocation statement in the Java code is probably not correct. Specifically, I am not sure whether the kernel will return this as float[] or as float[][].
It is possible to use different types for input and output. In your case, I would actually suggest:
float __attribute__((kernel)) sobel(uchar4 v_in, uint32_t x, uint32_t y) {}
You certainly want to use a kernel, so that the performance can benefit from execution by multiple threads.
Also, have a look at this example of doing 3x3 convolution in RS.
UPDATE: generally, the best in/out parameters to use depend on the type of output you want this filter to generate - is it just the magnitude? Then uint output will most likely suffice.
UPDATE2: If you are going to use a variable to pass input allocation, then you don't need it in the kernel parameters, i.e.:
float __attribute__((kernel)) sobelX(uint32_t x, uint32_t y)
The rest of the script looks ok (sans missing parenthesis in the conditional). As for the Java part, below I am pasting a demonstration of how you should prepare the output allocation and start the script. The kernel will then be invoked for every cell (i.e. every float) in the output allocation.
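float[] gx = new float[width * height];
// Output type: one 32-bit float per pixel
Type.Builder outType = new Type.Builder(mRS, Element.F32(mRS));
outType.setX(width).setY(height);
Allocation outAllocation = Allocation.createTyped(mRS, outType.create());
// Bind the input allocation (inAllocation from the question) to the script global
mScript.set_gIn(inAllocation);
mScript.forEach_sobelX(outAllocation);
// Copy the result back; the value for pixel (x, y) is at gx[y * width + x]
outAllocation.copyTo(gx);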
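To illustrate the output layout, here is a minimal CPU sketch in plain C of the same stencil as the sobelX kernel in the question, assuming a single-channel 8-bit image. `sobel_flat` is a hypothetical name. The point is that the result is one flat, row-major array of width * height floats, which is what copying an F32 allocation into a float[] gives you, so the value for pixel (x, y) sits at index y * width + x.

```c
#include <stdint.h>

/*
 * Apply the stencil from the question to a row-major 8-bit image.
 * Border pixels are set to 0, as in the kernel. out must hold
 * width * height floats.
 */
void sobel_flat(const uint8_t *in, int width, int height, float *out)
{
    for (int y = 0; y < height; ++y) {
        for (int x = 0; x < width; ++x) {
            float v = 0.0f;
            if (x > 0 && y > 0 && x < width - 1 && y < height - 1) {
                const uint8_t *above = in + (y - 1) * width; /* row y - 1 */
                const uint8_t *below = in + (y + 1) * width; /* row y + 1 */
                v = (float)above[x - 1] - (float)below[x - 1]
                  + 2.0f * ((float)above[x] - (float)below[x])
                  + (float)above[x + 1] - (float)below[x + 1];
            }
            out[y * width + x] = v;
        }
    }
}
```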

Renderscript Greyscale not quite working

This is my renderscript code for now:
#pragma version(1)
#pragma rs java_package_name(com.apps.foo.bar)
rs_allocation inPixels;
uchar4 RS_KERNEL root(uchar4 in, uint32_t x, uint32_t y) {
uchar4 pixel = in.rgba;
pixel.r = (pixel.r + pixel.g + pixel.b)/3;
pixel.g = (pixel.r + pixel.g + pixel.b)/3;
pixel.b = (pixel.r + pixel.g + pixel.b)/3;
return pixel;
}
My phone shows a "greyscaled" picture. I say "greyscaled" in quotes because red, for example, is still kind of red. It is gray-ish, but you can still see that it is red. I know I can use more sophisticated methods, but I would like to stick to the simple one for now.
I would like to know if my renderscript code is wrong. Should I be converting the char to another type?
Use a temporary variable to hold the result as you compute it. Otherwise, in the first line you're modifying pixel.r, and in the very next one you are using it to calculate pixel.g. No wonder you get artifacts.
Also, don't forget to assign the alpha value to avoid surprises with "invisible" output.
Also, I would recommend not using equal weights for r, g and b, but rather the weights below. See e.g. http://www.johndcook.com/blog/2009/08/24/algorithms-convert-color-grayscale/
uchar4 __attribute__((kernel)) gray(uchar4 in) {
uchar4 out;
/* compute the weighted luminance once, then write it to all three channels */
float gr = 0.2125f*in.r + 0.7154f*in.g + 0.0721f*in.b;
out.r = out.g = out.b = (uchar) gr;
out.a = in.a;
return out;
}
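To see why the kernel in the question leaves red looking red, the two approaches can be compared on the CPU. This is a minimal sketch in plain C; the `pixel` struct and the function names are hypothetical stand-ins for uchar4 and the kernels.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } pixel;

/* Mirrors the question: each channel is overwritten before the next is computed. */
pixel gray_in_place(pixel p)
{
    p.r = (uint8_t)((p.r + p.g + p.b) / 3);
    p.g = (uint8_t)((p.r + p.g + p.b) / 3); /* uses the already modified r */
    p.b = (uint8_t)((p.r + p.g + p.b) / 3); /* uses the modified r and g */
    return p;
}

/* Computes the average once from the untouched input, then assigns it. */
pixel gray_with_temp(pixel p)
{
    uint8_t avg = (uint8_t)((p.r + p.g + p.b) / 3);
    pixel out = { avg, avg, avg, p.a };
    return out;
}
```

For pure red (255, 0, 0), gray_in_place yields (85, 28, 37), which still has a red tint, while gray_with_temp yields (85, 85, 85).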

Using OpenGL in Matlab to get depth buffer

I've asked a similar question before and didn't manage to find a direct answer.
Could someone provide sample code for extracting the depth buffer of the rendering of an object into a figure in Matlab?
So let's say I load an obj file, or even just make a simple surf call, render it, and now want to get at its depth buffer. What code will do that for me using both Matlab and OpenGL? I.e., how do I set this up and then access the actual data?
I essentially want to be able to use Matlab's powerful plotting functions and then be able to access the underlying graphics context to get the depth buffer out.
NOTE: The bounty specifies JOGL, but that is not a must. Any code which acts as above and can provide me with the depth buffer after running it in Matlab is sufficient.
Today, I went drinking with my colleagues, and after five beers and some tequilas I found this question and thought, "have at ya!" So I was struggling for a while, but then I found a simple solution using MEX. I theorized that the OpenGL context created by the last window could be left active and therefore could be accessible from "C", if the script ran in the same thread.
I created a simple "C" program which calls one Matlab function, called "testofmyfilter", which plots the frequency response of a filter (that was the only script I had at hand). This is rendered using OpenGL. Then the program uses glGetIntegerv(GL_VIEWPORT, ...) and glReadPixels() to get at the OpenGL buffers. Then it creates a matrix, fills it with the depth values, and passes it to the second function, called "trytodisplaydepthmap". It just displays the depth map using the imshow function. Note that the MEX function is allowed to return values as well, so maybe the postprocessing would not have to be another function, but I'm in no state to be able to understand how it's done. Should be trivial, though. I'm working with MEX for the first time today.
Without further delay, here is the source code I used:
testofmyfilter.m
imp = zeros(10000,1);
imp(5000) = 1;
% impulse
[bwb,bwa] = butter(3, 0.1, 'high');
b = filter(bwb, bwa, imp);
% filter impulse by the filter
fs = 44100; % sampling frequency (all frequencies are relative to fs)
frequency_response=fft(b); % calculate response (complex numbers)
amplitude_response=20*log10(abs(frequency_response)); % calculate module of the response, convert to dB
frequency_axis=(0:length(b)-1)*fs/length(b); % generate frequency values for each response value
min_f=2;
max_f=fix(length(b)/2)+1; % min, max frequency
figure(1);
lighting gouraud
set(gcf,'Renderer','OpenGL')
semilogx(frequency_axis(min_f:max_f),amplitude_response(min_f:max_f),'r-') % plot with logarithmic axis using red line
axis([frequency_axis(min_f) frequency_axis(max_f) -90 10]) % set axis limits
xlabel('frequency [Hz]');
ylabel('amplitude [dB]'); % legend
grid on % draw grid
test.c
//You can include any C libraries that you normally use
#include "windows.h"
#include "stdio.h"
#include "stdlib.h" // malloc, free
#include "math.h"
#include "mex.h" //--This one is required
extern WINAPI void glGetIntegerv(int n_enum, int *p_value);
extern WINAPI void glReadPixels(int x,
int y,
int width,
int height,
int format,
int type,
void * data);
#define GL_VIEWPORT 0x0BA2
#define GL_DEPTH_COMPONENT 0x1902
#define GL_FLOAT 0x1406
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int viewport[4], i, x, y;
int colLen;
float *data;
double *matrix;
mxArray *arg[1];
mexCallMATLAB(0, NULL, 0, NULL, "testofmyfilter");
// call an .m file which creates OpenGL window and draws a plot inside
glGetIntegerv(GL_VIEWPORT, viewport);
printf("GL_VIEWPORT = [%d, %d, %d, %d]\n", viewport[0], viewport[1], viewport[2], viewport[3]);
// print viewport dimensions, should be [0, 0, m, n]
// where m and n are size of the GL window
data = (float*)malloc(viewport[2] * viewport[3] * sizeof(float));
glReadPixels(0, 0, viewport[2], viewport[3], GL_DEPTH_COMPONENT, GL_FLOAT, data);
// alloc data and read the depth buffer
/*for(i = 0; i < 10; ++ i)
printf("%f\n", data[i]);*/
// debug
arg[0] = mxCreateNumericMatrix(viewport[3], viewport[2], mxDOUBLE_CLASS, mxREAL);
matrix = mxGetPr(arg[0]);
colLen = mxGetM(arg[0]);
printf("0x%08x 0x%08x 0x%08x %d\n", data, arg[0], matrix, colLen); // debug
for(x = 0; x < viewport[2]; ++ x) {
for(y = 0; y < viewport[3]; ++ y)
matrix[x * colLen + y] = data[x + (viewport[3] - 1 - y) * viewport[2]];
}
// create matrix, copy data (this is stupid, but matlab switches
// rows/cols, also convert float to double - but OpenGL could have done that)
free(data);
// don't need this anymore
mexCallMATLAB(0, NULL, 1, arg, "trytodisplaydepthmap");
// pass the array to a function (returning something from here
// is beyond my understanding of mex, but should be doable)
mxDestroyArray(arg[0]);
// cleanup
return;
}
trytodisplaydepthmap.m:
function [] = trytodisplaydepthmap(depthMap)
figure(2);
imshow(depthMap, []);
% see what's inside
Save all of these to the same directory, then compile test.c by typing the following into the Matlab console:
mex test.c Q:\MATLAB\R2008a\sys\lcc\lib\opengl32.lib
where "Q:\MATLAB\R2008a\sys\lcc\lib\opengl32.lib" is the path to the "opengl32.lib" file.
And finally, execute it all by merely typing "test" in the Matlab console. It should bring up a window with the filter frequency response, and another window with the depth buffer. Note that the front and back buffers are swapped at the moment the "C" code reads the depth buffer, so it might be necessary to run the script twice to get any results (so that the front buffer, which now contains the results, swaps with the back buffer again and the depth can be read out). This could be done automatically by the "C" code, or you can try including getframe(gcf); at the end of your script (that reads back from OpenGL as well, so it swaps the buffers for you, or something).
This works for me in Matlab 7.6.0.324 (R2008a). The script runs and spits out the following:
>>test
GL_VIEWPORT = [0, 0, 560, 419]
0x11150020 0x0bd39620 0x12b20030 419
And of course it displays the images. Note the depth buffer range depends on Matlab, and can be quite high, so making any sense of the generated images may not be straightforward.
the swine's answer is the correct one.
Here is a slightly formatted and simpler version that is cross-platform.
Create a file called mexGetDepth.c
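#include <stdlib.h> /* malloc, free */
#include "mex.h"
/* glGetIntegerv and glReadPixels are not declared in this file; include your
   platform's OpenGL header or declare them as in test.c above */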
#define GL_VIEWPORT 0x0BA2
#define GL_DEPTH_COMPONENT 0x1902
#define GL_FLOAT 0x1406
void mexFunction(int nlhs, mxArray *plhs[], int nrhs, const mxArray *prhs[])
{
int viewport[4], i, x, y;
int colLen;
float *data;
double *matrix;
glGetIntegerv(GL_VIEWPORT, viewport);
data = (float*)malloc(viewport[2] * viewport[3] * sizeof(float));
glReadPixels(0, 0, viewport[2], viewport[3], GL_DEPTH_COMPONENT, GL_FLOAT, data);
plhs[0] = mxCreateNumericMatrix(viewport[3], viewport[2], mxDOUBLE_CLASS, mxREAL);
matrix = mxGetPr(plhs[0]);
colLen = mxGetM(plhs[0]);
for(x = 0; x < viewport[2]; ++ x) {
for(y = 0; y < viewport[3]; ++ y)
matrix[x * colLen + y] = data[x + (viewport[3] - 1 - y) * viewport[2]];
}
free(data);
return;
}
Then, if you're on Windows, compile using
mex mexGetDepth.c "path to OpenGL32.lib"
or, if you're on a *nix system,
mex mexGetDepth.c "path to opengl32.a"
Then run the following small script to test out the new function
peaks;
figure(1);
depthData=mexGetDepth;
figure
imshow(depthData);
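The index arithmetic in the copy loop of both MEX files can be tried outside Matlab. This is a minimal sketch in plain C with a hypothetical function name: glReadPixels returns rows bottom-up in row-major order, while a Matlab matrix is stored column-major with row 0 at the top, so the loop flips the image vertically and changes the storage order while widening float to double.

```c
/*
 * src: width * height floats, row-major, bottom row first (glReadPixels order).
 * dst: height * width doubles, column-major, top row first (Matlab order).
 */
void depth_to_column_major(const float *src, int width, int height, double *dst)
{
    for (int x = 0; x < width; ++x) {
        for (int y = 0; y < height; ++y) {
            /* column x, row y of the matrix; source row counted from the bottom */
            dst[x * height + y] = (double)src[x + (height - 1 - y) * width];
        }
    }
}
```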

Get device orientation on the iPhone for OpenGL ES

I'm trying to use the geomagnetic and accelerometer readings to rotate the camera in OpenGL ES 1. I found some Android code and adapted it for the iPhone. It more or less works, but there are some mistakes that I'm not able to find. Here is the code, along with the call to OpenGL ES 1: glLoadMatrixf((GLfloat*)matrix);
- (void) GetAccelerometerMatrix:(GLfloat *) matrix headingX: (float)hx headingY:(float)hy headingZ:(float)hz;
{
_geomagnetic[0] = hx * (FILTERINGFACTOR-0.05) + _geomagnetic[0] * (1.0 - FILTERINGFACTOR-0.5)+ _geomagnetic[3] * (0.55);
_geomagnetic[1] = hy * (FILTERINGFACTOR-0.05) + _geomagnetic[1] * (1.0 - FILTERINGFACTOR-0.5)+ _geomagnetic[4] * (0.55);
_geomagnetic[2] = hz * (FILTERINGFACTOR-0.05) + _geomagnetic[2] * (1.0 - FILTERINGFACTOR-0.5)+ _geomagnetic[5] * (0.55);
_geomagnetic[3]=_geomagnetic[0] ;
_geomagnetic[4]=_geomagnetic[1];
_geomagnetic[5]=_geomagnetic[2];
//Clear matrix to be used to rotate from the current referential to one based on the gravity vector
//(matrix is a pointer, so sizeof(matrix) would only cover the pointer, not all 16 floats)
bzero(matrix, 16 * sizeof(GLfloat));
//MAGNETIC
float Ex = -_geomagnetic[1];
float Ey =_geomagnetic[0];
float Ez =_geomagnetic[2];
//ACCELEROMETER
float Ax= -_accelerometer[0];
float Ay= _accelerometer[1] ;
float Az= _accelerometer[2] ;
float Hx = Ey*Az - Ez*Ay;
float Hy= Ez*Ax - Ex*Az;
float Hz = Ex*Ay - Ey*Ax;
float normH = (float)sqrt(Hx*Hx + Hy*Hy + Hz*Hz);
float invH = 1.0f / normH;
Hx *= invH;
Hy *= invH;
Hz *= invH;
float invA = 1.0f / (float)sqrt(Ax*Ax + Ay*Ay + Az*Az);
Ax *= invA;
Ay *= invA;
Az *= invA;
float Mx = Ay*Hz - Az*Hy;
float My = Az*Hx - Ax*Hz;
float Mz = Ax*Hy - Ay*Hx;
// if (mOut.f != null) {
matrix[0] = Hx; matrix[1] = Hy; matrix[2] = Hz; matrix[3] = 0;
matrix[4] = Mx; matrix[5] = My; matrix[6] = Mz; matrix[7] = 0;
matrix[8] = Ax; matrix[9] = Ay; matrix[10] = Az; matrix[11] = 0;
matrix[12] = 0; matrix[13] = 0; matrix[14] = 0; matrix[15] = 1;
}
Thank you very much for the help.
Edit: The iPhone is permanently in landscape orientation, and I know that something is wrong because the object drawn in OpenGL ES appears twice.
Have you looked at Apple's GLGravity sample code? It does something very similar to what you want here, by manipulating the model view matrix in response to changes in the accelerometer input.
I'm unable to find any problems with the code posted, and would suggest the problem is elsewhere. If it helps, my analysis of the code posted is that:
The first six lines, dealing with _geomagnetic[0] to _geomagnetic[5], implement a very simple low-pass filter, which assumes you call the method at regular intervals. So you end up with a version of the magnetometer vector, hopefully with the high-frequency jitter removed.
The bzero zeroes the result, ready for accumulation.
The lines down to the declaration and assignment to Hz take the magnetometer and accelerometer vectors and perform the cross product. So H(x, y, z) is now a vector at right angles to both the accelerometer (which is presumed to be 'down') and the magnetometer (which will be forward + some up). Call that the side vector.
The invH and invA stuff, down to the multiplication of Az by invA ensure that the side and accelerometer/down vectors are of unit length.
M(x, y, z) is then created as the cross product of the side and down vectors (i.e., a vector at right angles to both of those). So it gives the front vector.
Finally, the three vectors are used to populate the matrix, taking advantage of the fact that the inverse of an orthonormal 3x3 matrix is its transpose (though that's somewhat hidden by the way things are laid out, so pay attention to the array indices). You actually set every element of the matrix directly, so the bzero wasn't necessary in pure outcome terms.
glLoadMatrixf is then the correct thing to use because that's how you multiply by an arbitrary column-major matrix in OpenGL ES 1.x.