Using C++ AMP with Direct2D

Is it possible to use a texture generated by C++ AMP as a screen buffer?
I would like to generate an image with my C++ AMP code (already done) and use this image to fill the entire screen of a Windows 8 Metro app. The image is updated 60 times per second.
I'm not at all fluent in Direct3D. I used the Direct2D template app as a starting point.
First I tried to manipulate the swap chain's buffer directly in the C++ AMP code, but any attempt to write to that texture caused an error.
Processing data with AMP on the GPU, then moving it to CPU memory to create a bitmap that I can use with the D2D API, seems very inefficient.
Can somebody share a piece of code that would allow me to manipulate the swap chain's buffer texture with C++ AMP directly (without the data leaving the GPU), or at least populate that buffer with data from another texture that doesn't leave the GPU?

You can interop between an AMP texture<> and an ID3D11Texture2D buffer. The complete code and other examples of interop can be found in the Chapter 11 samples here.
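// Get a D3D texture resource from an AMP texture.
using namespace concurrency::graphics; // texture<>, uint4
texture<int, 2> text(100, 100);
CComPtr<ID3D11Texture2D> d3dTexture;
IUnknown* unkRes = concurrency::graphics::direct3d::get_texture(text);
HRESULT hr = unkRes->QueryInterface(__uuidof(ID3D11Texture2D),
                                    reinterpret_cast<void**>(&d3dTexture));
unkRes->Release(); // get_texture() returns a reference the caller must release

// Create an AMP texture from a D3D texture resource.
// 'device' is your ID3D11Device*; the accelerator_view must be created on that device.
concurrency::accelerator_view dxView = concurrency::direct3d::create_accelerator_view(device);
const int height = 100;
const int width = 100;
D3D11_TEXTURE2D_DESC desc;
ZeroMemory(&desc, sizeof(desc));
desc.Height = height;
desc.Width = width;
desc.MipLevels = 1;
desc.ArraySize = 1;
desc.Format = DXGI_FORMAT_R8G8B8A8_UINT;
desc.SampleDesc.Count = 1;
desc.SampleDesc.Quality = 0;
desc.Usage = D3D11_USAGE_DEFAULT;
// AMP needs to bind the texture as a shader resource and/or unordered access view.
desc.BindFlags = D3D11_BIND_SHADER_RESOURCE | D3D11_BIND_UNORDERED_ACCESS;
desc.CPUAccessFlags = 0;
desc.MiscFlags = 0;
CComPtr<ID3D11Texture2D> dxTexture = nullptr;
hr = device->CreateTexture2D(&desc, nullptr, &dxTexture);
texture<uint4, 2> ampTexture = concurrency::graphics::direct3d::make_texture<uint4, 2>(dxView, dxTexture);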


Bitmap from byte[] array

I want to create a bitmap from a byte[]. My problem is that I can't use a BitmapSource in Unity and if I use a MemoryStream Unity gets an error.
I tried it with this:
Bitmap bitmap = new Bitmap(512, 424);
var data = bitmap.LockBits(new Rectangle(Point.Empty, bitmap.Size),
    ImageLockMode.WriteOnly, System.Drawing.Imaging.PixelFormat.Format32bppArgb);
Marshal.Copy(arrayData, 0, data.Scan0, arrayData.Length);
bitmap.UnlockBits(data);
It works, but the bitmap I get is upside down. Can someone explain why, and suggest a solution?
This can be caused by two things, possibly combined: the choice of coordinate system, and endianness.
There's a convention (I believe universal) to list pixels from left to right, but there's none regarding vertical orientation. Some programs and APIs have the Y-coordinate be zero at the bottom and increase upwards, while others do the exact opposite. I don't know where you get the byte[] from, but some APIs let you configure the pixel orientation when writing, reading or using textures. Otherwise, you'll have to re-arrange the rows manually.
The same applies to endianness: ARGB sometimes means blue is the last byte, and sometimes the first. Some classes, like BitConverter, have built-in helpers for this too.
Unity uses big-endian, bottom-up textures. In fact, Unity handles lots of this stuff under the hood, and has to re-order rows and flip bytes when importing bitmap files. Unity also provides methods like LoadImage and EncodeToPNG that take care of both problems.
To illustrate what happens to the byte[], this sample code saves the same image in three different ways (but you need to import them as Truecolor to see them properly in Unity):
using UnityEngine;
using UnityEditor;
using System.Drawing;
using System.Drawing.Imaging;

public class CreateTexture2D : MonoBehaviour {
    public void Start () {
        int texWidth = 4, texHeight = 4;
        // Raw 4x4 bitmap data, in bottom-up big-endian ARGB byte order. It's transparent black for the most part.
        byte[] rawBitmap = new byte[] {
            // Red corner (bottom-left) is written first
            255,255,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
            0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
            0,0,0,0, 0,0,0,0, 0,0,0,0, 0,0,0,0,
            255,0,0,255, 255,0,0,255, 255,0,0,255, 255,0,0,255
            // Blue border (top) is the last "row" of the array
        };
        // We create a Texture2D from the rawBitmap (ARGB32 raw data: bottom row first, A,R,G,B per pixel)
        Texture2D texture = new Texture2D(texWidth, texHeight, TextureFormat.ARGB32, false);
        texture.LoadRawTextureData(rawBitmap);
        texture.Apply();
        // 1.- We save it directly as a Unity asset (useful if you only use it inside Unity)
        UnityEditor.AssetDatabase.CreateAsset(texture, "Assets/TextureAsset.asset");
        // 2.- We save the texture to a file, but let Unity handle the formatting
        byte[] textureAsPNG = texture.EncodeToPNG();
        System.IO.File.WriteAllBytes(Application.dataPath + "/EncodedByUnity.png", textureAsPNG);
        // 3.- Rearrange the rawBitmap manually into top-down little-endian ARGB byte order
        //     (B,G,R,A in memory, as Format32bppArgb expects). Then write to a Bitmap and save to disk.
        // Bonus: this permutation is its own inverse, so it works both ways.
        byte[] rearrangedBM = new byte[rawBitmap.Length];
        for (int row = 0; row < texHeight; row++)
            for (int col = 0; col < texWidth; col++)
                for (int i = 0; i < 4; i++)
                    rearrangedBM[row * 4 * texWidth + 4 * col + i] = rawBitmap[(texHeight - 1 - row) * 4 * texWidth + 4 * col + (3 - i)];
        Bitmap bitmap = new Bitmap(texWidth, texHeight, PixelFormat.Format32bppArgb);
        var data = bitmap.LockBits(new Rectangle(0, 0, texWidth, texHeight), ImageLockMode.WriteOnly, PixelFormat.Format32bppArgb);
        System.Runtime.InteropServices.Marshal.Copy(rearrangedBM, 0, data.Scan0, rearrangedBM.Length);
        bitmap.UnlockBits(data);
        bitmap.Save(Application.dataPath + "/SavedBitmap.png", ImageFormat.Png);
    }
}

Help with live-updating sound on the iPhone

My question is a little tricky, and I'm not exactly experienced (I might get some terms wrong), so here goes.
I'm declaring an instance of an object called "Singer". The instance is called "singer1". "singer1" produces an audio signal. Now, the following is the code where the specifics of the audio signal are determined:
OSStatus playbackCallback(void *inRefCon,
                          AudioUnitRenderActionFlags *ioActionFlags,
                          const AudioTimeStamp *inTimeStamp,
                          UInt32 inBusNumber,
                          UInt32 inNumberFrames,
                          AudioBufferList *ioData) {
    //Singer *me = (Singer *)inRefCon;
    static int phase = 0; // sample counter, kept between calls
    for(UInt32 i = 0; i < ioData->mNumberBuffers; i++) {
        int samples = ioData->mBuffers[i].mDataByteSize / sizeof(SInt16);
        SInt16 values[samples];
        float waves;
        float volume=.5;
        for(int j = 0; j < samples; j++) {
            waves = 0;
            // kWaveform is defined elsewhere (presumably 2*pi / sample rate)
            waves += sin(kWaveform * 600 * phase)*volume;
            waves += sin(kWaveform * 400 * phase)*volume;
            waves += sin(kWaveform * 200 * phase)*volume;
            waves += sin(kWaveform * 100 * phase)*volume;
            waves *= 32500 / 4; // <--------- make sure to divide by how many waves you're stacking
            values[j] = (SInt16)waves;
            phase++;
        }
        memcpy(ioData->mBuffers[i].mData, values, samples * sizeof(SInt16));
    }
    return noErr;
}
99% of this is borrowed code, so I only have a basic understanding of how it works (I don't know what OSStatus is, whether a class, a method or something else). However, do you see those four lines with 600, 400, 200 and 100 in them? Those determine the frequency. What I want to do (for now) is put my own variable in place of a constant, so that I can change it on a whim. This variable is called "fr1". "fr1" is declared in the header file, but if I try to compile I get an error about "fr1" being undeclared. Currently, my workaround is the following: right beneath where I #import stuff, I add the line
fr1=0.0;//any number will work properly
This sort of works: the code compiles, and singer1.fr1 actually changes value if I tell it to. The problems are now these:
A) Even though this compiles and the specified tone plays (0.0 is no tone), I get the warnings "Data definition has no type or storage class" and "Type defaults to 'int' in declaration of 'fr1'". I bet this is because for some reason it's not seeing my previous declaration in the header file (as a float). However, if I leave this line out, the code won't compile because "fr1 is undeclared".
B) Just because I change the value of fr1 doesn't mean that singer1 will update the value used inside "playbackCallback", or whatever is in charge of filling the output buffers. Perhaps this can be fixed by coding differently?
C) Even if this did work, there is still a noticeable "gap" when pausing/playing the audio, which I need to eliminate. This might mean a complete overhaul of the code so that I can "dynamically" insert new values without disrupting anything. However, the reason I'm going through all this effort to post is that this method does exactly what I want (I can compute a value mathematically and it goes straight to the DAC, which means I can use it in the future to make triangle, square, etc. waves easily).
I have uploaded Singer.h and .m to pastebin for your viewing pleasure; perhaps they will help. Sorry, I can't post two HTML links, so here are the full links.
So, TL;DR, all I really want to do is be able to define the current equation/value of the 4 waves and re-define them very often without a gap in the sound.
Thanks. (And sorry if the post was confusing or got off track, which I'm pretty sure it did.)
My understanding is that your callback function is called every time the buffer needs to be re-filled. So changing fr1..fr4 will alter the waveform, but only when the buffer updates. You shouldn't need to stop and re-start the sound to get a change, but you will notice an abrupt shift in the timbre if you change your fr values. In order to get a smooth transition in timbre, you'd have to implement something that smoothly changes the fr values over time. Tweaking the buffer size will give you some control over how responsive the sound is to your changing fr values.
Your issue with fr being undefined is due to your callback being a plain C function. Your fr variables are declared as Objective-C instance variables of your Singer object, so they are not accessible from the callback by default.
Take a look at this project, and see how he accesses his instance variables from within his callback. Basically, he passes a reference to his instance to the callback function, and then accesses the instance variables through that reference.
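// Inside the render callback: recover the object from inRefCon
Sinewave *sineObject = inRefCon;
float freq = sineObject.frequency * 2 * M_PI / samplingRate;

// When setting up the audio unit: pass self as the callback's refCon
AURenderCallbackStruct input;
input.inputProc = RenderCallback;
input.inputProcRefCon = self;
AudioUnitSetProperty(audioUnit, kAudioUnitProperty_SetRenderCallback,
                     kAudioUnitScope_Input, 0, &input, sizeof(input));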
Also, you'll want to move your callback function outside of your @implementation block, because it's not actually part of your Singer object.
You can see this all in action here:

iPhone original render faster than iPhone 3GS?

I'm working on an iPhone game that uses C++ and the OpenGL ES 1.x library.
It works fine on the simulator. But when I installed it on real devices, I found that on the original iPhone it took about 20 milliseconds to render a frame, while on an iPhone 3GS it took 35-40 milliseconds.
I've tried various device/OS combinations, including 3GS + iOS 3.1.2, 3G + iOS 4.0, 3GS + iOS 4.1 and iPad + iOS 3.2. All of them render much slower than the original iPhone, which seems ridiculous to me. I googled everything I could think of and fixed every problem that might be related, but nothing changed.
I have two devices on which this code renders faster: 1) original iPhone with iOS 3.1.3, 2) iPod touch with iOS 3.1.3. Both take about 20 milliseconds to render a frame.
And four devices that render mysteriously slower: 1) iPhone 3G with iOS 4.0, 2) iPhone 3GS with iOS 3.1.2, 3) iPhone 3GS with iOS 4.1, 4) iPad with iOS 3.2. The iPhones take about 35-40 milliseconds to render a frame and the iPad takes around 25.
I use PVRTC for textures, which are pre-processed ("cooked") and packed into a bundle. The game uses a total of ten 512x512 textures and three 1024x1024 textures.
The code that binds the textures is as follows:
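GLenum internalFormat = 0;
GLenum pixelType = 0;
// resolve type
assert(2==attr.Dimension && 1==attr.Depth);
switch (attr.Format) {
case FORMAT_PVRTC2:
    if (attr.AlphaBits>0)
        internalFormat = GL_COMPRESSED_RGBA_PVRTC_2BPPV1_IMG;
    else
        internalFormat = GL_COMPRESSED_RGB_PVRTC_2BPPV1_IMG;
    break;
case FORMAT_PVRTC4: // case name assumed; the post only shows FORMAT_PVRTC2
    if (attr.AlphaBits>0)
        internalFormat = GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG;
    else
        internalFormat = GL_COMPRESSED_RGB_PVRTC_4BPPV1_IMG;
    break;
// ... other formats ...
}
// prepare temp buffer to load
MemoryBuffer tmpBuffer(true);
uint8* buffer = tmpBuffer.GetWritePtr(attr.TextureSize);
// read data
stream.Read(buffer, attr.TextureSize);
if (stream.Fail())
    return false;
// init
width_ = attr.Width;
height_ = attr.Height;
LODs_ = attr.LODs;
alphaBits_ = attr.AlphaBits;
// create and upload texture
glGenTextures(1, &glTexture_);
glBindTexture(GL_TEXTURE_2D, glTexture_);
uint32 offset = 0;
uint32 dim = width_; // = height
uint32 w, h;
switch (internalFormat) {
case GL_COMPRESSED_RGBA_PVRTC_2BPPV1_IMG:
case GL_COMPRESSED_RGB_PVRTC_2BPPV1_IMG:
case GL_COMPRESSED_RGBA_PVRTC_4BPPV1_IMG:
case GL_COMPRESSED_RGB_PVRTC_4BPPV1_IMG:
    for (uint32 i=0; i<LODs_; ++i) {
        // blocks across: 8 px wide at 2 bpp, 4 px wide at 4 bpp; always 4 px tall
        w = dim >> ((FORMAT_PVRTC2==attr.Format) ? 3:2);
        h = dim >> 2;
        // Clamp to minimum number of blocks
        if (w<2) w = 2;
        if (h<2) h = 2;
        uint32 const image_size = w * h * 8; // 8 bytes for each block
        glCompressedTexImage2D(GL_TEXTURE_2D, i, internalFormat, dim, dim, 0, image_size, buffer+offset);
        dim >>= 1;
        offset += image_size;
    }
    break;
// ... other formats ...
}
return true;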
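To estimate how much texture memory the listed textures take, the size computation in the upload loop can be pulled out into a small helper. This is a sketch that mirrors the formula above (8-byte blocks covering 8x4 pixels at 2 bpp or 4x4 pixels at 4 bpp, with a minimum of 2x2 blocks per level); `pvrtc_level_size` and `pvrtc_chain_size` are hypothetical names, not part of any engine or GL API.

```c
#include <stdint.h>

/* Bytes for one square PVRTC mip level of size dim x dim. */
uint32_t pvrtc_level_size(uint32_t dim, int is_2bpp)
{
    uint32_t w = dim >> (is_2bpp ? 3 : 2); /* blocks across */
    uint32_t h = dim >> 2;                 /* blocks down */
    if (w < 2) w = 2;                      /* clamp to minimum block count */
    if (h < 2) h = 2;
    return w * h * 8;                      /* 8 bytes per block */
}

/* Total bytes for a square texture with 'lods' mip levels. */
uint32_t pvrtc_chain_size(uint32_t dim, uint32_t lods, int is_2bpp)
{
    uint32_t total = 0;
    for (uint32_t i = 0; i < lods; i++) {
        total += pvrtc_level_size(dim, is_2bpp);
        dim >>= 1;
    }
    return total;
}
```

For example, a full 4 bpp mip chain for a 512x512 texture (10 levels) comes to 174,848 bytes, so ten of them are roughly 1.7 MB.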
The rendering part is huge because it uses an engine developed by someone else. As far as I can tell, it uses glDrawArrays and no shaders.
Has anyone encountered the same problem before? I really can't see why the original iPhone renders so much faster than the iPhone 3GS.
P.S. I forgot to mention: I draw only textured 2D rectangles, around 20 of them in my game (one background and one UI layer at 480x360; the others are mostly 64x64).
The behaviour you are seeing could be caused by the fixed-function pipeline (FFP) being emulated via the programmable pipeline (i.e. shaders).
Could you please run a test that loads and displays your textures in some simple way, completely without your engine?

EXC_BAD_ACCESS when calling avcodec_encode_video

I have an Objective-C class (although I don't believe this is anything Obj-C specific) that I use to write a video to disk from a series of CGImages. (The code I use at the top to get the pixel data comes straight from Apple.) I successfully create the codec and context, and everything goes fine until avcodec_encode_video, where I get EXC_BAD_ACCESS. I think this should be a simple fix, but I just can't figure out where I am going wrong.
I took out some error checking for succinctness. 'c' is an AVCodecContext*, which is created successfully.
CFDataRef bitmapData = CGDataProviderCopyData(CGImageGetDataProvider(img));
long dataLength = CFDataGetLength(bitmapData);
uint8_t* picture_buff = (uint8_t*)malloc(dataLength);
CFDataGetBytes(bitmapData, CFRangeMake(0, dataLength), picture_buff);
AVFrame *picture = avcodec_alloc_frame();
avpicture_fill((AVPicture*)picture, picture_buff, c->pix_fmt, c->width, c->height);
int outbuf_size = avpicture_get_size(c->pix_fmt, c->width, c->height);
uint8_t *outbuf = (uint8_t*)av_malloc(outbuf_size);
out_size = avcodec_encode_video(c, outbuf, outbuf_size, picture); // ERROR occurs here
printf("encoding frame %3d (size=%5d)\n", i, out_size);
fwrite(outbuf, 1, out_size, f);
I have stepped through it dozens of times. Here are some numbers...
dataLength = 408960
picture_buff = 0x5c85000
picture->data[0] = 0x5c85000 -- which I take to mean that avpicture_fill worked...
outbuf_size = 408960
and then I get EXC_BAD_ACCESS at avcodec_encode_video. Not sure if it's relevant, but most of this code comes from api-example.c. I am using Xcode, compiling for armv6/armv7 on Snow Leopard.
Thanks so much in advance for help!
There is not enough information here to point to the exact error, but I think the problem is that the input picture contains less data than avcodec_encode_video() expects:
avpicture_fill() only sets some pointers and numeric values in the AVFrame structure. It does not copy anything, and does not check whether the buffer is large enough (and it cannot, since the buffer size is not passed to it). It does something like this (copied from ffmpeg source):
size = picture->linesize[0] * height;
picture->data[0] = ptr;
picture->data[1] = picture->data[0] + size;
picture->data[2] = picture->data[1] + size2;
picture->data[3] = picture->data[1] + size2 + size2;
Note that the width and height are taken from the variable "c" (the AVCodecContext, I assume), so they may be larger than the actual size of the input frame.
It is also possible that the width/height is good, but the pixel format of the input frame is different from what is passed to avpicture_fill(). (note that the pixel format also comes from the AVCodecContext, which may differ from the input). For example, if c->pix_fmt is RGBA and the input buffer is in YUV420 format (or, more likely for iPhone, a biplanar YCbCr), then the size of the input buffer is width*height*1.5, but avpicture_fill() expects the size of width*height*4.
So checking the input/output geometry and pixel formats should lead you to the cause of the error. If that does not help, I suggest trying to compile for i386 first; it is tricky to compile FFmpeg properly for the iPhone.
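As a quick sanity check before avpicture_fill(), you can compare the number of bytes you actually have against what the codec context's geometry and pixel format imply. The helper below is a hypothetical sketch covering only two layouts, packed RGBA and planar YUV420P; in real code avpicture_get_size() (already used in the question) does this for any pixel format. Also note that CGImage rows may be padded, so CGImageGetBytesPerRow() can be larger than width*4.

```c
#include <stddef.h>

/* Hypothetical: the two layouts this sketch knows about. */
typedef enum { LAYOUT_RGBA, LAYOUT_YUV420P } FrameLayout;

/* Expected size in bytes of one unpadded frame. */
size_t expected_frame_size(FrameLayout layout, int width, int height)
{
    switch (layout) {
    case LAYOUT_RGBA:
        return (size_t)width * height * 4;           /* 4 bytes per pixel */
    case LAYOUT_YUV420P: {
        size_t luma = (size_t)width * height;        /* full-resolution Y plane */
        size_t chroma = (size_t)((width + 1) / 2) * ((height + 1) / 2); /* each of U, V */
        return luma + 2 * chroma;
    }
    }
    return 0;
}
```

If `CFDataGetLength(bitmapData)` is smaller than the size for `c->pix_fmt` at `c->width` x `c->height`, the encoder will read past the end of the buffer.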
Does the codec you are encoding support the RGB color space? You may need to use libswscale to convert to I420 before encoding. What codec are you using? Can you post the code where you initialize your codec context?
The function RGBtoYUV420P may help you.
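libswscale's sws_scale() is the usual way to do this conversion. Purely to show what the conversion involves, here is a hand-written sketch; it is not the RGBtoYUV420P function mentioned above, and `rgba_to_i420` is a hypothetical name. It assumes R,G,B,A byte order (CGImage data may instead be BGRA or premultiplied, depending on CGImageGetBitmapInfo()), even dimensions, no row padding, and the common BT.601 integer approximation, averaging chroma over each 2x2 block.

```c
#include <stdint.h>

/* Packed RGBA -> planar I420. y_plane holds width*height bytes,
 * u_plane and v_plane hold (width/2)*(height/2) bytes each. */
void rgba_to_i420(const uint8_t *rgba, int width, int height,
                  uint8_t *y_plane, uint8_t *u_plane, uint8_t *v_plane)
{
    /* Luma for every pixel (range 16..235). */
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const uint8_t *p = rgba + 4 * (y * width + x);
            int r = p[0], g = p[1], b = p[2];
            y_plane[y * width + x] =
                (uint8_t)(((66 * r + 129 * g + 25 * b + 128) >> 8) + 16);
        }
    }
    /* Chroma from the average colour of each 2x2 block. */
    for (int y = 0; y < height; y += 2) {
        for (int x = 0; x < width; x += 2) {
            int r = 0, g = 0, b = 0;
            for (int dy = 0; dy < 2; dy++)
                for (int dx = 0; dx < 2; dx++) {
                    const uint8_t *p = rgba + 4 * ((y + dy) * width + (x + dx));
                    r += p[0]; g += p[1]; b += p[2];
                }
            r /= 4; g /= 4; b /= 4;
            int ci = (y / 2) * (width / 2) + (x / 2);
            /* +32768 (128 << 8) keeps the sums non-negative before shifting */
            u_plane[ci] = (uint8_t)((-38 * r - 74 * g + 112 * b + 128 + 32768) >> 8);
            v_plane[ci] = (uint8_t)((112 * r - 94 * g - 18 * b + 128 + 32768) >> 8);
        }
    }
}
```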

Creating and loading .pngs in RGBA4444 RGBA5551 for openGL

I'm creating an OpenGL game, and so far I have been using .pngs in the RGBA8888 format as texture sheets, but those are too memory-hungry and my app crashes frequently. I read on Apple's site that this format should be used only when very high quality is needed, and that RGBA4444 and RGBA5551 are recommended instead (I already converted my textures to PVR, but the quality loss is too great for most of the sprite sheets).
I only need to use GL_UNSIGNED_SHORT_5_5_5_1 or GL_UNSIGNED_SHORT_4_4_4_4 in the glTexImage2D call inside my texture loader class to load my textures, but I need to convert my texture sheets to RGBA4444 and RGBA5551, and I'm clueless about how to achieve this.
Seriously? There are libraries to do this kind of conversion. But frankly, this is a bit of bit twiddling. There are libraries that use asm or specialized SSE instructions to accelerate this, which will be fast, but it's pretty easy to roll your own format converter in C/C++.
Your basic process would be:
Given a buffer of RGBA8888 encoded values
Create a buffer big enough to hold the RGBA4444 or RGBA5551 values. In this case it's simple: half the size.
Loop over the source buffer, unpacking each component, and repacking into the destination format, and write it into the destination buffer.
#include <stdint.h>
#include <stdlib.h>

void* rgba8888_to_rgba4444(
    void* src, // IN, pointer to source buffer
    int cb)    // IN size of source buffer, in bytes
{
    // uint32_t/uint16_t are used because the sizes of long and short
    // vary between compilers.
    int i;
    // compute the actual number of pixel elements in the buffer.
    int cpel = cb/4;
    uint32_t* psrc = (uint32_t*)src;
    // create the RGBA4444 buffer (the caller must free() it)
    uint16_t* pdst = (uint16_t*)malloc(cpel*2);
    if (!pdst)
        return NULL;
    // convert every pixel
    for(i=0;i<cpel; i++)
    {
        // read a source pixel
        uint32_t pel = psrc[i];
        // unpack the source data as 8 bit values
        unsigned r = pel & 0xff;
        unsigned g = (pel >> 8) & 0xff;
        unsigned b = (pel >> 16) & 0xff;
        unsigned a = (pel >> 24) & 0xff;
        // convert to 4 bit values
        r >>= 4;
        g >>= 4;
        b >>= 4;
        a >>= 4;
        // and store: GL_UNSIGNED_SHORT_4_4_4_4 expects red in the top 4 bits, alpha in the bottom 4
        pdst[i] = (uint16_t)(r << 12 | g << 8 | b << 4 | a);
    }
    return pdst;
}
I wrote the actual conversion loop very wastefully: the components can be extracted, converted and repacked in a single pass, making for far faster code. I did it this way to make the conversion explicit and easy to change. Also, I'm not sure I got the component order the right way around: the unpacking assumes the source bytes are R, G, B, A in memory on a little-endian CPU, so check it against your data and against what GL_UNSIGNED_SHORT_4_4_4_4 expects (red in the most significant four bits).
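The question also asks about RGBA5551, and the same unpack/repack idea applies. Here is a minimal sketch (the function name is hypothetical) that reads the source bytes individually, so it does not depend on the CPU's endianness, and packs them in the order GL_UNSIGNED_SHORT_5_5_5_1 expects: red in bits 15-11, green in 10-6, blue in 5-1 and alpha in bit 0. Thresholding alpha at 128 is one choice among several.

```c
#include <stdint.h>
#include <stdlib.h>

/* RGBA8888 (bytes R,G,B,A) -> GL_UNSIGNED_SHORT_5_5_5_1.
 * Returns a malloc'd buffer of pixel_count values (caller frees), or NULL. */
uint16_t *rgba8888_to_rgba5551(const uint8_t *src, size_t pixel_count)
{
    uint16_t *dst = malloc(pixel_count * sizeof *dst);
    if (!dst)
        return NULL;
    for (size_t i = 0; i < pixel_count; i++) {
        const uint8_t *p = src + 4 * i;
        dst[i] = (uint16_t)(((p[0] >> 3) << 11) |   /* 5 bits red */
                            ((p[1] >> 3) << 6)  |   /* 5 bits green */
                            ((p[2] >> 3) << 1)  |   /* 5 bits blue */
                            (p[3] >= 128 ? 1 : 0)); /* 1 bit alpha */
    }
    return dst;
}
```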
Using ImageMagick you can create RGBA4444 PNG files by running:
convert source.png -depth 4 destination.png
You can get ImageMagick from MacPorts.
You may consider using Imagination's PVRTexTool for Windows. It's specifically for creating PVR textures in every supported color format. It can create both PVRTC compressed textures (what you call "PVR") as well as uncompressed textures in 8888, 5551, 4444, etc.
However, it doesn't output PNGs (only PVRs), so your loading code would have to change. Also, PVRs are sometimes much larger than PNGs, because PNG pixel data is deflate-compressed.
Since you're most likely running OS X, you can use Darwine (now WineBottler) to run it (and other windows programs) on OS X.
You'll need to register as an Imagination developer before you can download PVRTexTool. Registration and the tool are both free.
Once you set it up, it's pretty painless and it gives you a decent GUI for working with PVRs.
You might also want to look at how to optimize RGBA8888 images for conversion to RGBA4444 using Floyd-Steinberg dithering in GIMP.
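Outside GIMP, Floyd-Steinberg error diffusion down to 4 bits per channel can be sketched in C as below. This is a sketch under stated assumptions, not GIMP's implementation: `fs_dither_to_4bit` is a hypothetical name, it works on one 8-bit channel plane at a time (de-interleave RGBA first, or adapt the indexing), maps each value to the nearest of the 16 levels 0, 17, ..., 255, and pushes the rounding error to unvisited neighbours with the classic 7/16, 3/16, 5/16, 1/16 weights. The resulting 0..15 values can then be packed as in the converters above.

```c
#include <stdint.h>
#include <stdlib.h>

/* Quantize one 8-bit channel plane (w*h bytes) to 4-bit values (0..15).
 * Returns 0 on success, -1 if the work buffer cannot be allocated. */
int fs_dither_to_4bit(const uint8_t *in, uint8_t *out, int w, int h)
{
    int *work = malloc((size_t)w * h * sizeof *work);
    if (!work)
        return -1;
    for (int i = 0; i < w * h; i++)
        work[i] = in[i];

    for (int y = 0; y < h; y++) {
        for (int x = 0; x < w; x++) {
            int old = work[y * w + x];
            if (old < 0) old = 0;           /* clamp accumulated error */
            if (old > 255) old = 255;
            int q = (old + 8) / 17;         /* nearest 4-bit level */
            int err = old - q * 17;         /* quantization error */
            out[y * w + x] = (uint8_t)q;
            /* diffuse the error to the right and to the next row */
            if (x + 1 < w) work[y * w + x + 1] += err * 7 / 16;
            if (y + 1 < h) {
                if (x > 0) work[(y + 1) * w + x - 1] += err * 3 / 16;
                work[(y + 1) * w + x] += err * 5 / 16;
                if (x + 1 < w) work[(y + 1) * w + x + 1] += err * 1 / 16;
            }
        }
    }
    free(work);
    return 0;
}
```

On a flat mid-gray area, for example, this produces a mix of levels 7 and 8 instead of a single band, which is what hides the banding that plain truncation to 4 bits causes.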
You could also use for conversion.
Here is an optimized in-place version of Chris' code, which should run about 2x as fast but is not as straightforward. The in-place conversion helps avoid crashes by lowering the memory spike. Just thought I'd share in case anyone was planning to use this code. I've tested it and it works great:
#include <stdint.h>

// Converts in place: the first cb/2 bytes of src receive the RGBA4444 pixels.
void* rgba8888_to_rgba4444( void* src, // IN, pointer to source buffer
                            int cb)    // IN size of source buffer, in bytes
{
    int i;
    // compute the actual number of pixel elements in the buffer.
    int cpel = cb/4;
    uint32_t* psrc = (uint32_t*)src;
    uint16_t* pdst = (uint16_t*)src;
    // convert every pixel; pdst[i] only overlaps source pixels already read
    for(i=0;i<cpel; i++)
    {
        // read a source pixel (bytes R,G,B,A on a little-endian CPU)
        uint32_t pel = psrc[i];
        // keep the top 4 bits of each component, shifted straight into its
        // GL_UNSIGNED_SHORT_4_4_4_4 position (R in bits 12-15 ... A in bits 0-3)
        unsigned r = (pel << 8) & 0xf000;
        unsigned g = (pel >> 4) & 0x0f00;
        unsigned b = (pel >> 16) & 0x00f0;
        unsigned a = (pel >> 28) & 0x000f;
        // and store
        pdst[i] = (uint16_t)(r | g | b | a);
    }
    return pdst;
}