Is using Vertex Buffer Objects for very dynamic data a good idea performance-wise? - iPhone

I have many particles whose vertices change every frame. The vertices are currently being drawn using a vertex array in 'client' memory. What performance characteristics can I expect if I use a vertex buffer object?
Since I have to use a number of glBufferSubData calls to update the particle vertices, I am transferring the vertices to video memory every frame anyway, right (just as I would with a regular vertex array)? Is there any benefit to VBOs in this case?
This is for iOS devices. The actual draw call: glDrawElements(GL_POINTS,num_particles,GL_UNSIGNED_SHORT,pindices);
Should I use GL_STREAM_DRAW or GL_DYNAMIC_DRAW?

Apple's documentation appears to recommend VBOs in all situations. If you're using ES 2.x then the GL_STREAM_DRAW usage hint is explicitly for "when your application needs to create transient geometry that is rendered a small number of times and then discarded. This is most useful when your application must dynamically change vertex data every frame in a way that cannot be performed in a vertex shader." Use of glBufferSubData is then directly advocated.
Logically, I'd guess the only difference between supplying the data completely afresh and sending it to an existing GL_STREAM_DRAW or GL_DYNAMIC_DRAW buffer is that your space in the memory map (GPU or CPU, depending on the chip; MBXs don't really do VBOs, but Apple supports them for other performance reasons) can be allocated once rather than allocated and released every frame.
Using the alignment and packing tips given in that document is likely to yield a bigger improvement than a switch to VBOs, since otherwise the CPU just has to unpack and repack the data on every glDrawElements. You're quite probably already aware of that, and I appreciate it isn't directly part of the question; I mainly throw it in as a comparative guess about performance benefits.
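For concreteness, here is a minimal sketch of the allocate-once / overwrite-every-frame pattern described above. It's written with PyOpenGL calls whose names mirror the ES API purely for illustration; on iOS the equivalent C calls would be made against your existing EAGL context, and the particle count and layout here are assumptions.
import numpy as np
from OpenGL.GL import *  # assumes a current GL context already exists

num_particles = 1024
positions = np.zeros((num_particles, 2), dtype=np.float32)  # rewritten by the simulation each frame

# One-time setup: allocate the buffer once with a STREAM usage hint.
vbo = glGenBuffers(1)
glBindBuffer(GL_ARRAY_BUFFER, vbo)
glBufferData(GL_ARRAY_BUFFER, positions.nbytes, None, GL_STREAM_DRAW)

def draw_frame():
    # Per frame: overwrite the buffer contents rather than reallocating it.
    glBindBuffer(GL_ARRAY_BUFFER, vbo)
    glBufferSubData(GL_ARRAY_BUFFER, 0, positions.nbytes, positions)
    # ... set up attribute pointers and the element buffer as usual, then issue glDrawElements.
The point of the sketch is only the call pattern: glBufferData once at startup, glBufferSubData (or full glBufferData orphaning) each frame.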

By setting up VBOs properly, you are using the optimal way of transferring data to the GPU, and you may skip some driver-side processing. The only way to see how much improvement you get is to measure; it differs from card to card.
For a VBO how-to, see this: VBO tutorial
EDIT
Forgot to answer the question: yes, it is a good idea. But measure first.

Related

TensorFlow: store training data in GPU memory

I am pretty new to TensorFlow. I used to use Theano for deep learning development, and I notice a difference between the two in where input data can be stored.
Theano supports shared variables, which store input data in GPU memory to reduce data transfer between CPU and GPU.
In TensorFlow, we need to feed data into a placeholder, and the data can come from CPU memory or from files.
My question is: is it possible to store input data in GPU memory with TensorFlow, or does it already do this in some magic way?
Thanks.
If your data fits on the GPU, you can load it into a constant on GPU from e.g. a numpy array:
with tf.device('/gpu:0'):
    tensorflow_dataset = tf.constant(numpy_dataset)
One way to extract minibatches would be to slice that array at each step using tf.slice, instead of feeding the data:
batch = tf.slice(tensorflow_dataset, [index, 0], [batch_size, -1])
There are many possible variations around that theme, including using queues to prefetch the data to GPU dynamically.
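Putting those pieces together, a minimal end-to-end sketch might look like the following (TensorFlow 1.x style to match the snippets above; the dataset shape, batch size, and step loop are placeholder values for illustration):
import numpy as np
import tensorflow as tf

numpy_dataset = np.random.rand(10000, 784).astype(np.float32)  # stand-in data
batch_size = 128

index = tf.placeholder(tf.int32, shape=[])  # only this scalar offset is fed per step
with tf.device('/gpu:0'):
    tensorflow_dataset = tf.constant(numpy_dataset)  # lives in GPU memory
    batch = tf.slice(tensorflow_dataset, [index, 0], [batch_size, -1])

config = tf.ConfigProto(allow_soft_placement=True)  # fall back to CPU if no GPU is present
with tf.Session(config=config) as sess:
    for step in range(3):
        offset = (step * batch_size) % (numpy_dataset.shape[0] - batch_size)
        minibatch = sess.run(batch, feed_dict={index: offset})
Only the integer offset crosses the feed boundary each step; the dataset itself stays resident on the device.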
It is possible, as has been indicated, but make sure that it is actually useful before devoting too much effort to it. At least at present, not every operation has GPU support, and the list of operations without such support includes some common batching and shuffling operations. There may be no advantage to putting your data on GPU if the first stage of processing is to move it to CPU.
Before trying to refactor code to use on-GPU storage, try at least one of the following:
1) Start your session with device placement logging, which reports which ops are executed on which devices:
config = tf.ConfigProto(log_device_placement=True)
sess = tf.Session(config=config)
2) Try to manually place your graph on GPU by putting its definition in a with tf.device('/gpu:0'): block. This will throw exceptions if ops are not GPU-supported.
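As a small sketch of suggestion 2) (the ops here are arbitrary examples; with soft placement disabled, anything lacking a GPU kernel will fail fast instead of silently running on CPU):
import tensorflow as tf

with tf.device('/gpu:0'):
    x = tf.random_normal([1024, 1024])
    y = tf.matmul(x, x)  # has a GPU kernel, so this placement is allowed

# allow_soft_placement=False makes TF raise rather than quietly fall back to CPU
config = tf.ConfigProto(log_device_placement=True, allow_soft_placement=False)
with tf.Session(config=config) as sess:
    sess.run(y)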

Streaming huge 3D scenes over internet

I want to stream big scenes made of many objects to clients, but I need some advice on what approach to take. I know the PS4 and Battle.net stream games even when 70% of the game is not downloaded yet, and they work pretty fast on my 18 Mbps connection.
Can anyone please advise me on where and how to start with streaming big scenes?
A lot of these don't necessarily stream huge scenes per se, if "huge scenes" implies transmitting the lowest-level primitive data (individual points, triangles, unique textures on every single object, etc).
They often stream higher-level data like "maps" with a lot of instanced data. For example, they might not transmit the triangles of a thousand trees in a forest. Instead, they might transmit one unique tree asset which is instanced and just scaled and rotated and positioned differently to form a forest (just a unique transformation matrix per tree instance). The result might be that the entire forest can be transmitted without taking much more memory than a single tree's worth of triangles.
They might have two or more character meshes which have identical geometry or topology and just unique deformations (point positions) or textures ("skins"), significantly reducing the amount of unique data that has to be sent/stored.
When doing this kind of instancing/tiling stuff, what might otherwise be terabytes worth of unique data may fit into megabytes due to the amount of instanced, non-unique data.
So the first step to doing this is typically to build your own level/map editor. That level/map editor can often serialize something considerably higher-level and more compact than, say, a Wavefront OBJ file, due to the sheer amount of tiled/instanced (shared) data. That high-level data ends up being what you stream.
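As a rough illustration of the kind of higher-level data such an editor might emit (the field names here are invented for the example), the map references each unique asset once and then lists only lightweight per-instance transforms:
import json

# One unique tree mesh, instanced many times: only the transforms are per-instance.
scene = {
    "assets": [
        {"id": "tree_01", "mesh": "tree_01.mesh", "texture": "tree_01.png"}
    ],
    "instances": [
        {"asset": "tree_01", "position": [12.0, 0.0, -3.5], "rotation_y": 1.2, "scale": 0.9},
        {"asset": "tree_01", "position": [15.5, 0.0, -7.0], "rotation_y": 0.4, "scale": 1.1},
        # ... thousands more instances, each only a few bytes of transform data
    ],
}

serialized = json.dumps(scene)  # this compact description is what gets streamed, not raw triangles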
Second is to build scalable servers, and that's a separate beast. To do that often involves very efficient multithreading at the heart of the OS/kernel to achieve very efficient async I/O. There are some great resources out there on this subject, but it's too broad to cover in one simple answer.
And third might be compression of the data to further reduce the required bandwidth.
A commercial game title might seek all three of these, but probably the first thing to realize is that they're not necessarily streaming unique triangles and texels everywhere -- to stream such low-level data would place tremendous strain on the server, especially given the kind of player load that MMOs are designed to handle. There's a whole lot of instanced data that these games, especially MMOs, often use to significantly cut down on the unique data that actually has to occupy memory and be transmitted separately.
Maps and assets are often designed to carefully reuse existing data as much as possible -- carefully made to have maximum repetition to reduce memory requirements but without looking too blatantly redundant (variation vs. economy). They look "huge" but aren't really from a data standpoint given the sheer amount of repetition of the same data, and considering that they don't redundantly store repetitive data. They're typically very, very economical about it.
As far as streaming goes, a simple way might be to break the world down into 2-dimensional regions (with some overlap to allow a seamless experience so that adjacent regions are being streamed as the player travels around the world) with AABBs around them. Stream the data for the region(s) the player is in and possibly visible within the viewing frustum. It can get a lot more elaborate than this but this might serve as a decent starting point.
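A minimal sketch of that region scheme (the grid size, overlap margin, and the request_region call are all assumptions for illustration):
REGION_SIZE = 256.0   # world units per square region
OVERLAP = 32.0        # margin so adjacent regions start loading before they become visible

def regions_around(player_x, player_z, view_radius):
    """Return the (i, j) grid cells whose AABB overlaps the player's neighbourhood."""
    min_x, max_x = player_x - view_radius - OVERLAP, player_x + view_radius + OVERLAP
    min_z, max_z = player_z - view_radius - OVERLAP, player_z + view_radius + OVERLAP
    cells = []
    for i in range(int(min_x // REGION_SIZE), int(max_x // REGION_SIZE) + 1):
        for j in range(int(min_z // REGION_SIZE), int(max_z // REGION_SIZE) + 1):
            cells.append((i, j))
    return cells

# Periodically: for cell in regions_around(px, pz, view_radius): request_region(cell) if not resident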

Scala streaming peak detection with reactive events

I am trying to work out the best way to structure an application that in essence is a peak detection program. In my line of work I have been given charge of developing a system that essentially is looking at pulses in a stream of data and doing calculations on the peak data.
At the moment the software is implemented in LabVIEW. I'm sure many of you on here would understand why I'd love to see the end of that environment. I would like to redesign this in Scala (and possibly use Play if I was to make it use a web frontend) but I am not sure how best to approach the initial peak-detection component.
I've seen many tutorials for peak detection in various languages, and I understand many of the algorithms from a theoretical perspective. What I am not sure about is how to approach this in the most idiomatic Scala/Play way.
Obviously I don't expect someone to write the code for me but I would really appreciate any pointers as to the direction I should take that makes the most sense. Since I cannot be too specific on the use case I'll try to give an overview of what I'm trying to do below:
Interfacing with data acquisition hardware to send out control voltages and read back "streams" of data.
I should be able to work the hardware side out, but is there a specific structure that would be best for the returned stream? I don't necessarily know ahead of time how much data I'll be reading so a stream that can be buffered and chunked would probably be appropriate.
Scan through the stream to find peaks and measure their height and trigger an event.
Peaks are usually about 20 samples wide or so but that depends on sample rate so I don't want to hard-code anything like that. I assume a sliding window would be necessary so peaks don't get "cut off" on the edge of a buffer. As a peak arrives I need to record and act on it. I think reactive streams and so on may be appropriate but I'm not sure. I will be making live graphs etc with the data so however it is done I need a way to send an event immediately on a successful detection.
The streams can be quite long and are at high sample rates (a minimum of 250 ksamples per second), so I'd prefer not to have to buffer the entire stream in memory. The only information that needs to be permanent is the peak voltage data. I will need a way to visualise the raw stream for calibration purposes, but I imagine that should be pretty simple.
The full application is much more complex and I'll need to do some initial filtering of noise and drift but I believe I should be able to work that out once I know what kind of implementation I should build on.
I've tried to look into Play's Iteratees and such but they are a little hard to follow. If they are an appropriate fit then I'm happy to work on learning them but since I'm not sure if that is the best way to approach the problem I'd love to know where I should look.
Reactive frameworks and the like certainly look interesting and I can see how I could really easily build the rest of the application around them but I'm just not sure how best to implement a streaming peak detection function on top of them beyond something simple like triggering when a value is over a threshold (as mentioned previously a "peak" can be quite wide and the signal is noisy).
Any advice would be greatly appreciated!
This is not a solution to this question but I'm writing this as an answer because of space/formatting limitations in the comments section.
Since you are exploring options I would suggest the following:
Assuming you have a large enough buffer to keep a window of data in memory (W = t × w), you can calculate the peak for the buffer using your existing algorithm. Next, you collect the next few samples of data in a delta buffer (d), a much smaller window; the delta buffer is the size of your increment. Assuming this is time-series data, you can easily create the new sliding window by removing the oldest d values from the buffer W and appending the d new values. This is how Spark Streaming implements the reduceByWindow function on a DStream. Iteratees can also help here.
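A rough sketch of that sliding-window update, shown in Python only for brevity (the same structure maps directly onto Scala collections or Akka/Spark streams); the window sizes are arbitrary and max() stands in for your real peak-detection pass:
from collections import deque

W = 5000   # samples kept in the window
d = 250    # samples per increment (the delta buffer)
window = deque(maxlen=W)   # the oldest d samples are dropped automatically as new ones arrive

def on_delta(delta_samples):
    """Called whenever d new samples arrive from the acquisition hardware."""
    window.extend(delta_samples)   # slide the window forward by d
    if len(window) == W:
        return max(window)          # stand-in for the real peak-detection algorithm
    return None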
If your system is distributed, then you can use stream-processing systems (Storm, Spark Streaming) to get better latency and throughput, at the cost of distributing the system.
If you are really resource-constrained and can live with approximate results with bounded error, I would suggest looking at a combination of probabilistic data structures such as the count-min sketch, HyperLogLog, and Bloom filters.
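For instance, a toy count-min sketch (width and depth chosen arbitrarily here) answers approximate frequency queries in fixed memory:
import hashlib

class CountMinSketch:
    def __init__(self, width=1024, depth=4):
        self.width, self.depth = width, depth
        self.table = [[0] * width for _ in range(depth)]

    def _index(self, item, row):
        digest = hashlib.md5(("%d:%s" % (row, item)).encode()).hexdigest()
        return int(digest, 16) % self.width

    def add(self, item, count=1):
        for row in range(self.depth):
            self.table[row][self._index(item, row)] += count

    def estimate(self, item):
        # May overestimate, never underestimates.
        return min(self.table[row][self._index(item, row)] for row in range(self.depth))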

Processing accelerometer data

I would like to know if there are some libraries/algorithms/techniques that help to extract the user context (walking/standing) from accelerometer data (extracted from any smartphone)?
For example, I would collect accelerometer data every 5 seconds for a definite period of time and then identify the user context (ex. for the first 5 minutes, the user was walking, then the user was standing for a minute, and then he continued walking for another 3 minutes).
Thank you very much in advance :)
Check the new activity recognition APIs:
http://developer.android.com/google/play-services/location.html
It's still a research topic; please take a look at this paper, which discusses the algorithm:
http://www.enggjournals.com/ijcse/doc/IJCSE12-04-05-266.pdf
I don't know of any such library.
It is a very time-consuming task to write such a library. Basically, you would build a database of the "user contexts" that you wish to recognize.
Then you collect data and compare it to those in the database. As for how to compare, see Store orientation to an array - and compare; the same holds for accelerometer data.
Walking/running data is analogous to heart-rate data in a lot of ways. In terms of filtering out the noise and getting smooth peaks, look into noise filtering and peak detection algorithms. The following is used to obtain heart-rate information for heart patients and should be a good starting point: http://www.docstoc.com/docs/22491202/Pan-Tompkins-algorithm-algorithm-to-detect-QRS-complex-in-ECG
Think about how you want to filter out the noise and detect peaks; the filters will obviously depend on the raw data you gather, but it's good to have a general idea of what kind of filtering you want to do. Then think about what needs to happen once you have filtered data. In your case, think about how you would design an algorithm to determine when the data indicates activity (walking, running, etc.) and when it shows the user being stationary. This is a fairly challenging problem once you consider the dynamics of the device itself (how it's positioned while the user is walking/running) and the fact that there are very few (if any) benchmarked algorithms that do this with raw smartphone data.
Start with determining the appropriate algorithms, and then tackle the complexities (mentioned above) one by one.
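As one possible starting point, a very simple (and admittedly naive) pipeline is to smooth the acceleration magnitude and then threshold its variability over short windows. The sample rate, window length, and threshold below are guesses that would need tuning against real data:
import numpy as np

def classify_activity(ax, ay, az, fs=50, window_s=2.0, std_threshold=0.5):
    """Label each window of a raw accelerometer trace as 'walking' or 'standing'."""
    magnitude = np.sqrt(np.asarray(ax)**2 + np.asarray(ay)**2 + np.asarray(az)**2)
    magnitude = magnitude - np.mean(magnitude)                          # remove gravity / DC offset
    smoothed = np.convolve(magnitude, np.ones(5) / 5.0, mode='same')   # cheap moving-average low-pass

    window = int(fs * window_s)
    labels = []
    for start in range(0, len(smoothed) - window + 1, window):
        chunk = smoothed[start:start + window]
        labels.append('walking' if np.std(chunk) > std_threshold else 'standing')
    return labels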

training a neural network

I have a picture of 1200×1175 pixels. I want to train a net (MLP or Hopfield) to learn a specific part of it (201×111 pixels) and save its weights, so that a new net (with the same features) can find that specific part without being trained again. Now my questions are: what kind of net is useful, MLP or Hopfield? If MLP, how many hidden layers? The trainlm function is unusable because of an "out of memory" error. I have converted the picture to a binary image; is that useful?
What exactly do you need the solution to do? Find an object within an image (like "Where's Waldo"?)? Will the target object always be the same size and orientation? Might it look different because of lighting changes?
If you just need to find a fixed pattern of pixels within a larger image, I suggest using a straightforward correlation measure, such as cross-correlation, to find it efficiently.
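A minimal sketch of that idea using zero-mean cross-correlation (assuming both inputs are already grayscale numpy arrays; a real implementation would normalise by local energy so bright regions don't dominate, and would use an FFT-based correlation for speed on images this size):
import numpy as np
from scipy.signal import correlate2d

def find_patch(image, template):
    """Return the (row, col) of the best match of template inside image."""
    template_zm = template - template.mean()   # zero-mean so overall brightness doesn't dominate
    response = correlate2d(image - image.mean(), template_zm, mode='valid')
    row, col = np.unravel_index(np.argmax(response), response.shape)
    return row, col   # top-left corner of the best-matching patch (e.g., the 201x111 region)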
If you need to contend with any of the issues mentioned above, then there are two basic solutions: 1. build a model using examples of the object in different poses, scalings, etc., so that the model will recognize any of them, or 2. develop a way to normalize the patch of pixels being examined to minimize the effect of those distortions (like Hu's invariant moments). If nothing else, you'll want to perform some sort of data reduction to get the number of inputs down. Technically, you could also try a model which is invariant to rotations, etc., but I don't know how well those work. I suspect that they are more temperamental than traditional approaches.
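If you go the normalization route, OpenCV exposes Hu's moments directly. A small sketch, assuming the patch is an 8-bit grayscale array (the Otsu thresholding step is just one way to get a binary patch):
import cv2
import numpy as np

def hu_descriptor(patch):
    """Seven Hu invariant moments of a patch: invariant to translation, scale, and rotation."""
    _, binary = cv2.threshold(patch, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    moments = cv2.moments(binary)
    hu = cv2.HuMoments(moments).flatten()
    # Log-scale the values, since they span many orders of magnitude.
    return -np.sign(hu) * np.log10(np.abs(hu) + 1e-30)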
I found AdaBoost to be helpful in picking out only the important bits of an image. That, and resizing the image to something very tiny (like 40×30) using a Gaussian filter, will speed things up and put weight on a broader area of the photo rather than on tiny, insignificant pixels.