RenderScript greyscale not quite working

This is my renderscript code for now:
#pragma version(1)
#pragma rs java_package_name(
rs_allocation inPixels;
uchar4 RS_KERNEL root(uchar4 in, uint32_t x, uint32_t y) {
uchar4 pixel = in.rgba;
pixel.r = (pixel.r + pixel.g + pixel.b)/3;
pixel.g = (pixel.r + pixel.g + pixel.b)/3;
pixel.b = (pixel.r + pixel.g + pixel.b)/3;
return pixel;
}
My phone shows a "greyscaled" picture. I put "greyscaled" in quotes because red, for example, is still kind of red... It is greyish, but you can still see that it is red. I know I can use more sophisticated methods, but I would like to stick to the simple one for now.
I would like to know if my RenderScript code is wrong. Should I be converting the char to another type?

Use a temporary variable to hold the result as you compute it. Otherwise, in the first line you're modifying pixel.r, and in the very next one you are using it to calculate pixel.g. No wonder you get artifacts.
Also, don't forget to assign the alpha value to avoid surprises with "invisible" output.
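The following is a minimal plain-C sketch that makes the difference concrete. It is not RenderScript: `Pixel` is a hypothetical stand-in for `uchar4`. It runs both the in-place version from the question and a version that uses a temporary:

```c
#include <stdint.h>

/* Hypothetical stand-in for RenderScript's uchar4. */
typedef struct { uint8_t r, g, b, a; } Pixel;

/* The question's version: each line reads a channel that the previous
 * line has already overwritten. */
static Pixel grey_in_place(Pixel p)
{
    p.r = (p.r + p.g + p.b) / 3;
    p.g = (p.r + p.g + p.b) / 3;  /* uses the NEW p.r */
    p.b = (p.r + p.g + p.b) / 3;  /* uses the NEW p.r and p.g */
    return p;
}

/* Fixed version: compute the average once into a temporary, assign it to
 * all three colour channels and copy alpha explicitly. */
static Pixel grey_average(Pixel in)
{
    /* uint8_t operands are promoted to int, so the sum cannot overflow */
    uint8_t avg = (uint8_t)((in.r + in.g + in.b) / 3);
    Pixel out = { avg, avg, avg, in.a };
    return out;
}
```

For a pure red pixel (255, 0, 0), the in-place version produces (85, 28, 37), which is still visibly reddish. The temporary-variable version produces (85, 85, 85).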

I would also recommend not using equal weights for r, g and b. Use weights like the ones below instead, which are close to the Rec. 709 luma coefficients (0.2126, 0.7152, 0.0722).
uchar4 __attribute__((kernel)) gray(uchar4 in) {
    uchar4 out;
    // Weighted sum of the channels; green contributes most to perceived brightness
    float gr = 0.2125f * in.r + 0.7154f * in.g + 0.0721f * in.b;
    out.r = out.g = out.b = gr;
    out.a = in.a;
    return out;
}
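Below is a plain-C sketch of the same weighted conversion, again with a hypothetical `Pixel` struct standing in for `uchar4`. It makes one assumption the kernel does not: it rounds to the nearest value instead of truncating. With float weights that sum to 1.0, a white pixel may otherwise come out as 254.

```c
#include <stdint.h>

typedef struct { uint8_t r, g, b, a; } Pixel;  /* hypothetical stand-in for uchar4 */

/* Weighted greyscale using the coefficients from the kernel above.
 * Rounds to nearest (the kernel truncates) and copies alpha through. */
static Pixel grey_weighted(Pixel in)
{
    float gr = 0.2125f * in.r + 0.7154f * in.g + 0.0721f * in.b;
    if (gr > 255.0f)
        gr = 255.0f;                   /* defensive clamp; the weights sum to 1.0 */
    uint8_t v = (uint8_t)(gr + 0.5f);  /* round to nearest */
    Pixel out = { v, v, v, in.a };
    return out;
}
```

With these weights, pure red, green and blue map to 54, 182 and 18 respectively. That is why a red area ends up much darker than a green one instead of all three becoming the same mid-grey.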


How to scale, crop, and rotate all at once in Android RenderScript

Is it possible to take a camera image in Y'UV format and use RenderScript to:
Convert it to RGBA
Crop it to a certain region
Rotate it if necessary
Yes! I figured out how and thought I would share it with others. RenderScript has a bit of a learning curve, and simpler examples seem to help.
When cropping, you still need to set up an input and an output allocation, as well as one for the script itself. It might seem strange at first, but the input and output allocations have to be the same size. So if you are cropping, you need to set up yet another Allocation to write the cropped output. More on that in a second.
#pragma version(1)
#pragma rs java_package_name(com.autofrog.chrispvision)
#pragma rs_fp_relaxed
/* This is mInputAllocation */
rs_allocation gInputFrame;
/* This is where we write our cropped image */
rs_allocation gOutputFrame;
/* These dimensions define the crop region that we want */
uint32_t xStart, yStart;
uint32_t outputWidth, outputHeight;
uchar4 __attribute__((kernel)) yuv2rgbFrames(uchar4 in, uint32_t x, uint32_t y)
{
    uchar Y = rsGetElementAtYuv_uchar_Y(gInputFrame, x, y);
    uchar U = rsGetElementAtYuv_uchar_U(gInputFrame, x, y);
    uchar V = rsGetElementAtYuv_uchar_V(gInputFrame, x, y);
    uchar4 rgba = rsYuvToRGBA_uchar4(Y, U, V);
    /* force the alpha channel to opaque - the conversion doesn't seem to do this */
    rgba.a = 0xFF;
    uint32_t translated_x = x - xStart;
    uint32_t translated_y = y - yStart;
    /* the -1 keeps x_rotated within 0..outputWidth-1 when translated_y is 0 */
    uint32_t x_rotated = outputWidth - 1 - translated_y;
    uint32_t y_rotated = translated_x;
    rsSetElementAt_uchar4(gOutputFrame, rgba, x_rotated, y_rotated);
    return rgba;
}
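The index arithmetic can be checked away from the device. Here is a CPU-side sketch in plain C of the same translate-then-rotate mapping (crop, then rotate 90 degrees clockwise). The function name and parameters are made up for illustration. Note that it uses `outW - 1 - ty`, so the first cropped row lands in the last valid column rather than one past it.

```c
#include <stddef.h>
#include <stdint.h>

/* Copies the crop region [xStart, xStart + cropW) x [yStart, yStart + cropH)
 * of a row-major image src (srcW pixels per row) into dst, rotated 90 degrees
 * clockwise. dst must hold cropW * cropH pixels; it is cropH wide and cropW tall. */
static void crop_rotate90(const uint32_t *src, size_t srcW,
                          size_t xStart, size_t yStart,
                          size_t cropW, size_t cropH,
                          uint32_t *dst)
{
    size_t outW = cropH;  /* rotating by 90 degrees swaps width and height */
    for (size_t y = yStart; y < yStart + cropH; y++) {
        for (size_t x = xStart; x < xStart + cropW; x++) {
            size_t tx = x - xStart;     /* translate into crop coordinates */
            size_t ty = y - yStart;
            size_t rx = outW - 1 - ty;  /* same mapping as the kernel, minus 1 */
            size_t ry = tx;
            dst[ry * outW + rx] = src[y * srcW + x];
        }
    }
}
```

Take a 4x3 image holding 0..11 and crop the 2x2 block starting at (1, 1), which is 5 6 / 9 10. The result is 9 5 / 10 6, which is that block turned clockwise.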
To set up the allocations:
private fun createAllocations(rs: RenderScript) {
    /*
     * The yuvTypeBuilder is for the input from the camera. It has to be the
     * same size as the camera (preview) image
     */
    val yuvTypeBuilder = Type.Builder(rs, Element.YUV(rs))
        .setX(mImageSize.width)
        .setY(mImageSize.height)
        .setYuvFormat(ImageFormat.YUV_420_888) // assumed camera output format
    mInputAllocation = Allocation.createTyped(
        rs, yuvTypeBuilder.create(),
        Allocation.USAGE_IO_INPUT or Allocation.USAGE_SCRIPT)
    /*
     * The RGB type is also the same size as the input image. Other examples write this as
     * an int but I don't see a reason why you wouldn't be more explicit about it to make
     * the code more readable.
     */
    val rgbType = Type.createXY(rs, Element.RGBA_8888(rs), mImageSize.width, mImageSize.height)
    mScriptAllocation = Allocation.createTyped(
        rs, rgbType,
        Allocation.USAGE_SCRIPT)
    mOutputAllocation = Allocation.createTyped(
        rs, rgbType,
        Allocation.USAGE_IO_OUTPUT or Allocation.USAGE_SCRIPT)
    /*
     * Finally, set up an allocation to which we will write our cropped image. The
     * dimensions of this one are (wantx,wanty)
     */
    val rgbCroppedType = Type.createXY(rs, Element.RGBA_8888(rs), wantx, wanty)
    mOutputAllocationRGB = Allocation.createTyped(
        rs, rgbCroppedType,
        Allocation.USAGE_SCRIPT)
}
Finally, since you're cropping, you need to tell the script what to do before invocation. If the image sizes don't change, you can probably optimize this by moving the LaunchOptions and variable settings so they happen just once (rather than on every frame). I'm leaving them here to make the example clearer.
override fun onBufferAvailable(a: Allocation) {
    // Get the new frame into the input allocation
    a.ioReceive()
    // Run processing pass if we should send a frame
    val current = System.currentTimeMillis()
    if (current - mLastProcessed >= mFrameEveryMs) {
        val lo = Script.LaunchOptions()
        /*
         * These coordinates are the portion of the original image that we want to
         * include. Because we're rotating (in this case) x and y are reversed
         * (but still offset from the actual center of each dimension)
         */
        lo.setX(starty, endy)
        lo.setY(startx, endx)
        // (setting the script globals xStart, yStart, outputWidth, ... is not shown here)
        mScriptHandle.forEach_yuv2rgbFrames(mScriptAllocation, mOutputAllocation, lo)
        val output = Bitmap.createBitmap(
            wantx, wanty,
            Bitmap.Config.ARGB_8888)
        mOutputAllocationRGB.copyTo(output)
        /* Do something with the resulting bitmap */
        mLastProcessed = current
    }
}
All this might seem like a bit much, but it's very fast: much faster than doing the rotation on the Java/Kotlin side. And because RenderScript can run the kernel function over a subset of the image, there is less overhead than creating a bitmap and then creating a second, cropped one.
For me, all the rotation is necessary because the image seen by RenderScript was rotated 90 degrees relative to the camera. I am told this is some kind of peculiarity of having a Samsung phone.
RenderScript was intimidating at first but once you get used to what it's doing it's not so bad. I hope this is helpful to someone.

How does the reversebits function of HLSL SM5 work?

I am trying to implement an inverse FFT in an HLSL compute shader and don't understand how the new reversebits function works. The shader runs under Unity3D, but that shouldn't make a difference.
The problem is that the resulting texture remains black, except for the leftmost one or two pixels in every row. It seems to me as if the reversebits function doesn't return the correct indices.
My very simple code is as follows:
#pragma kernel BitReverseHorizontal
Texture2D<float4> inTex;
RWTexture2D<float4> outTex;
uint2 getTextureThreadPosition(uint3 groupID, uint3 threadID) {
uint2 pos;
pos.x = (groupID.x * 16) + threadID.x;
pos.y = (groupID.y * 16) + threadID.y;
return pos;
}
[numthreads(16, 16, 1)]
void BitReverseHorizontal (uint3 threadID : SV_GroupThreadID, uint3 groupID : SV_GroupID)
{
uint2 pos = getTextureThreadPosition(groupID, threadID);
uint xPos = reversebits(pos.x);
uint2 revPos = uint2(xPos, pos.y);
float4 values;
values.x = inTex[pos].x;
values.y = inTex[pos].y;
values.z = inTex[revPos].z;
values.w = 0.0f;
outTex[revPos] = values;
}
I played around with this for quite a while and found out that if I replace the reversebits line with this one:
uint xPos = reversebits(pos.x << 23);
it works, although I have no idea why. It could be just a coincidence. Could someone please explain how to use the reversebits function correctly?
Are you sure you want to reverse the bits?
x = 0: reversed: x = 0
x = 1: reversed: x = 2,147,483,648
x = 2: reversed: x = 1,073,741,824
If you fetch texels from a texture using coordinates exceeding the width of the texture then you're going to get black. Unless the texture is > 1 billion texels wide (it isn't) then you're fetching well outside the border.
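Here is a plain-C equivalent of what `reversebits` does to a 32-bit `uint`. It is a CPU-side sketch for illustration only; on the GPU you would just call the intrinsic. It reproduces the numbers above:

```c
#include <stdint.h>

/* Behaves like HLSL reversebits() on a 32-bit unsigned value:
 * bit 0 moves to bit 31, bit 1 to bit 30, and so on. */
static uint32_t reverse_bits32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);  /* append the lowest remaining bit of v */
        v >>= 1;
    }
    return r;
}
```

Any x other than 0 therefore maps to a coordinate of at least 2^31 >> 31... in practice to values in the hundreds of millions or above. That is far outside any real texture width.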
I was doing the same thing and ran into the same problem. These answers solved it for me, but I'll give you the explanation and a complete solution.
The solution for variable-length buffers in HLSL is:
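uint reversedIndx;
uint bits = 32 - (uint)log2(xLen); // bits in a uint minus log2(numberOfIndices); xLen must be a power of two
for (uint j = 0; j < xLen; j++)
{
reversedIndx = reversebits(j << bits);
// ... use reversedIndx as the bit-reversed position of element j ...
}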
What you found essentially shifts out all the leading zeros of your index. As a result, you only reverse the least significant (rightmost) bits, up to the number of bits you actually need.
For example:
int length = 8;
int bits = 32 - 3; // log2(8) = 3: an index into 8 elements needs 3 bits
int j = 6;
Since an int is generally 32 bits, in binary j would be
j = 0b00000000000000000000000000000110;
Reversed (i.e. reversebits(j)), it would be
j = 0b01100000000000000000000000000000;
That was our error. Instead, j shifted left by bits would be
j = 0b11000000000000000000000000000000;
and reversing that gives what we want:
j = 0b00000000000000000000000000000011;
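Putting the shift and the reversal together gives the bit-reversal permutation for a power-of-two length. A plain-C sketch might look like the following; `bit_reversed_index` is a hypothetical helper name. With `log2n = 9` the shift is 23, which would make the `<< 23` from the question correct for a 512-texel-wide row.

```c
#include <stdint.h>

/* Behaves like HLSL reversebits() on a 32-bit value. */
static uint32_t reverse_bits32(uint32_t v)
{
    uint32_t r = 0;
    for (int i = 0; i < 32; i++) {
        r = (r << 1) | (v & 1u);
        v >>= 1;
    }
    return r;
}

/* Bit-reversed position of index j in an array of 2^log2n elements.
 * Shifting left by (32 - log2n) moves the log2n meaningful bits to the top,
 * so the full 32-bit reversal lands them back at the bottom, reversed. */
static uint32_t bit_reversed_index(uint32_t j, unsigned log2n)
{
    if (log2n == 0)
        return 0;  /* single element; also avoids the undefined shift by 32 in C */
    return reverse_bits32(j << (32 - log2n));
}
```

For a length of 8 (log2n = 3), indices 0..7 map to 0, 4, 2, 6, 1, 5, 3, 7.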

Mouse movement angle in openFrameworks

I am currently creating a sort of drawing program in openFrameworks that needs to calculate the angle of mouse movement. The program needs to be able to draw brush strokes similar to the way Photoshop does it.
I've been able to get it to work, but only in a very jaggy way. I've placed my code in the mouseDragged event in openFrameworks, but the calculated angle is extremely jaggy and not smooth in any way. It needs to be smooth for the drawing to look good.
void testApp::mouseMoved(int x, int y ){
    dxX = x - oldX;
    dxY = y - oldY;
    movementAngle = (atan2(dxY, dxX) * 180.0 / PI);
    double movementAngleRad;
    movementAngleRad = movementAngle * TO_RADIANS;
    if (movementAngle < 0) {
        movementAngle += 360;
    }
    testString = "X: " + ofToString(dxX) + " ,";
    testString += "Y: " + ofToString(dxY) + " ,";
    testString += "movementAngle: " + ofToString(movementAngle);
    oldX = x;
    oldY = y;
}
I've tried different ways of optimizing the code to make it smooth, but alas without results.
If you have a brilliant idea on how this could be fixed or optimized, I would be very grateful.
I solved it to some degree by using an ofPolyline object.
The following code shows how it works.
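void testApp::mouseMoved(int x, int y ){
    // movement is an ofPolyline holding the recorded mouse positions
    if (movement.size() > 4) {
        // angle between the current position and the point recorded 4 steps back
        float angleRad = atan2(movement[movement.size()-4].y - y, movement[movement.size()-4].x - x);
        movementAngle = (angleRad * 180 / PI) + 180;
    }
    movement.addVertex(x, y); // record the current position for later calls
}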
As the code shows, I'm using the point recorded 4 steps back to make the angle smoother. This works if the mouse is moved in stroke-like movements. If the mouse is moved slowly, jaggedness still occurs.

Get device orientation on the iPhone for OpenGL ES

I'm trying to use the magnetometer (geomagnetic) and accelerometer readings to rotate the camera in OpenGL ES 1. I found some code for Android and adapted it for the iPhone. It is working more or less, but there are some mistakes that I'm not able to find. Here is the code, along with the OpenGL ES 1 call: glLoadMatrixf((GLfloat*)matrix);
- (void) GetAccelerometerMatrix:(GLfloat *) matrix headingX: (float)hx headingY:(float)hy headingZ:(float)hz
{
    _geomagnetic[0] = hx * (FILTERINGFACTOR-0.05) + _geomagnetic[0] * (1.0 - FILTERINGFACTOR-0.5)+ _geomagnetic[3] * (0.55);
    _geomagnetic[1] = hy * (FILTERINGFACTOR-0.05) + _geomagnetic[1] * (1.0 - FILTERINGFACTOR-0.5)+ _geomagnetic[4] * (0.55);
    _geomagnetic[2] = hz * (FILTERINGFACTOR-0.05) + _geomagnetic[2] * (1.0 - FILTERINGFACTOR-0.5)+ _geomagnetic[5] * (0.55);
    // keep the filtered values for the next call
    _geomagnetic[3]=_geomagnetic[0] ;
    _geomagnetic[4]=_geomagnetic[1] ;
    _geomagnetic[5]=_geomagnetic[2] ;
    //Clear matrix to be used to rotate from the current referential to one based on the gravity vector
    // matrix is a pointer, so sizeof(matrix) is only the pointer size; clear all 16 floats
    bzero(matrix, 16 * sizeof(GLfloat));
    float Ex = -_geomagnetic[1];
    float Ey =_geomagnetic[0];
    float Ez =_geomagnetic[2];
    float Ax= -_accelerometer[0];
    float Ay= _accelerometer[1] ;
    float Az= _accelerometer[2] ;
    float Hx = Ey*Az - Ez*Ay;
    float Hy= Ez*Ax - Ex*Az;
    float Hz = Ex*Ay - Ey*Ax;
    float normH = (float)sqrt(Hx*Hx + Hy*Hy + Hz*Hz);
    float invH = 1.0f / normH;
    Hx *= invH;
    Hy *= invH;
    Hz *= invH;
    float invA = 1.0f / (float)sqrt(Ax*Ax + Ay*Ay + Az*Az);
    Ax *= invA;
    Ay *= invA;
    Az *= invA;
    float Mx = Ay*Hz - Az*Hy;
    float My = Az*Hx - Ax*Hz;
    float Mz = Ax*Hy - Ay*Hx;
    // if (mOut.f != null) {
    matrix[0] = Hx; matrix[1] = Hy; matrix[2] = Hz; matrix[3] = 0;
    matrix[4] = Mx; matrix[5] = My; matrix[6] = Mz; matrix[7] = 0;
    matrix[8] = Ax; matrix[9] = Ay; matrix[10] = Az; matrix[11] = 0;
    matrix[12] = 0; matrix[13] = 0; matrix[14] = 0; matrix[15] = 1;
}
Thank you very much for the help.
Edit: The iPhone is permanently in landscape orientation, and I know that something is wrong because the object drawn in OpenGL ES appears twice.
Have you looked at Apple's GLGravity sample code? It does something very similar to what you want here, by manipulating the model view matrix in response to changes in the accelerometer input.
I'm unable to find any problems with the code posted, and would suggest the problem is elsewhere. If it helps, my analysis of the code posted is that:
The first six lines, dealing with _geomagnetic 0–5, implement a very simple low-pass filter, which assumes you call the method at regular intervals. So you end up with a version of the magnetometer vector, hopefully with the high-frequency jitter removed.
The bzero zeroes the result, ready for accumulation.
The lines down to the declaration and assignment to Hz take the magnetometer and accelerometer vectors and perform the cross product. So H(x, y, z) is now a vector at right angles to both the accelerometer (which is presumed to be 'down') and the magnetometer (which will be forward + some up). Call that the side vector.
The invH and invA code, down to the multiplication of Az by invA, ensures that the side and accelerometer/down vectors have unit length.
M(x, y, z) is then created as the cross product of the side and down vectors (i.e., a vector at right angles to both of those), so it gives the front vector.
Finally, the three vectors are used to populate the matrix, taking advantage of the fact that the inverse of an orthonormal 3x3 matrix is its transpose. That is somewhat hidden by the way things are laid out, so pay attention to the array indices. You actually set every element of the matrix directly, so the bzero wasn't necessary in terms of the final result.
glLoadMatrixf is then the correct thing to use, because it loads an arbitrary column-major matrix as the current matrix in OpenGL ES 1.x. (glMultMatrixf would instead multiply it onto the existing matrix.)

Looking for some help working with premultiplied alpha

I am trying to update a source image with the contents of multiple destination images. From what I can tell, using premultiplied alpha is the way to go here, but I think I am doing something wrong (function below). The image I am starting with is initialized with all ARGB values set to 0. When I run the function once, the resulting image looks great. When I start compositing any others, though, all the pixels that have alpha information get really messed up. Am I doing something glaringly wrong, or is there something extra I need to do to modify the color values?
void CompositeImage(unsigned char *src, unsigned char *dest, int srcW, int srcH){
int w = srcW;
int h = srcH;
int px0;
int px1;
int px2;
int px3;
int inverseAlpha;
int r;
int g;
int b;
int a;
int y;
int x;
for (y = 0; y < h; y++) {
for (x= 0; x< w*4; x+=4) {
// pixel number
px0 = (y*w*4) + x;
px1 = (y*w*4) + (x+1);
px2 = (y*w*4) + (x+2);
px3 = (y*w*4) + (x+3);
inverseAlpha = 1 - src[px3];
// create new values
r = src[px0] + inverseAlpha * dest[px0];
g = src[px1] + inverseAlpha * dest[px1];
b = src[px2] + inverseAlpha * dest[px2];
a = src[px3] + inverseAlpha * dest[px3];
// update destination image
dest[px0] = r;
dest[px1] = g;
dest[px2] = b;
dest[px3] = a;
}
}
}
I'm not clear on what data you are working with. Do your source images already have the alpha values pre-multiplied as they are stored? If not, then pre-multiplied alpha does not apply here and you would need to do normal alpha blending.
Anyway, the big problem in your code is that you're not keeping track of the value ranges that you're dealing with.
inverseAlpha = 1 - src[px3];
This needs to be changed to:
inverseAlpha = 255 - src[px3];
You have all integral value types here, so the normal incoming 0..255 value range will result in an inverseAlpha range of -254..1, which will give you some truly wacky results.
After changing the 1 to 255, you also need to divide your results for each channel by 255 to scale them back down to the appropriate range. The alternative is to do the intermediate calculations using floats instead of integers and divide the initial channel values by 255.0 (instead of these other changes) to get values in the 0..1 range.
If your source data really does already have pre-multiplied alpha, then your result lines should look like this.
r = src[px0] + inverseAlpha * dest[px0] / 255;
If your source data does not have pre-multiplied alpha, then it should be:
r = src[px0] * src[px3] / 255 + inverseAlpha * dest[px0] / 255;
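There's nothing special about blending the alpha channel. Use the same calculation as for r, g, and b in its pre-multiplied form (alpha is never multiplied by itself): a = src[px3] + inverseAlpha * dest[px3] / 255;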
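Here is a minimal plain-C sketch of the pre-multiplied version of the loop. It assumes both buffers are tightly packed 8-bit RGBA with alpha in the fourth byte (as with px3 in the question), and that both are already pre-multiplied. The function name is hypothetical. Because every channel, alpha included, uses the same formula, the inner loop just runs over all four bytes:

```c
#include <stddef.h>

/* Composites pre-multiplied RGBA8 src over pre-multiplied RGBA8 dest, in place:
 *   out = src + (255 - srcAlpha) * dest / 255   for r, g, b and a alike */
static void composite_premultiplied(const unsigned char *src, unsigned char *dest,
                                    int w, int h)
{
    size_t n = (size_t)w * (size_t)h * 4;
    for (size_t i = 0; i < n; i += 4) {
        int inverseAlpha = 255 - src[i + 3];
        for (int c = 0; c < 4; c++) {
            /* stays within 0..255 as long as src is validly pre-multiplied
             * (no colour channel larger than its alpha) */
            dest[i + c] = (unsigned char)(src[i + c] + inverseAlpha * dest[i + c] / 255);
        }
    }
}
```

Compositing a half-transparent red (128, 0, 0, 128) over opaque blue (0, 0, 255, 255) gives (128, 0, 127, 255). A fully transparent source pixel leaves the destination unchanged.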