Streaming huge 3D scenes over internet - streaming

I want to stream big scenes made of many objects to clients but need some advice on what approach to take. I know PS4 and Battle.NET stream the games even when 70% of the game is not downloaded yet but they work pretty fast with my 18 Mbps connection.
Anyone can please help me where to start and how to start for streaming big scenes?

A lot of these don't necessarily stream huge scenes per se, if "huge scenes" implies transmitting the lowest-level primitive data (individual points, triangles, unique textures on every single object, etc).
They often stream higher-level data like "maps" with a lot of instanced data. For example, they might not transmit the triangles of a thousand trees in a forest. Instead, they might transmit one unique tree asset which is instanced and just scaled and rotated and positioned differently to form a forest (just a unique transformation matrix per tree instance). The result might be that the entire forest can be transmitted without taking much more memory than a single tree's worth of triangles.
They might have two or more characters meshes which have identical geometry or topology and just unique deformations (point positions) or textures ("skins"), significantly reducing the amount of unique data that has to be sent/stored.
When doing this kind of instancing/tiling stuff, what might otherwise be terabytes worth of unique data may fit into megabytes due to the amount of instanced, non-unique data.
So the first step to doing this typically is to build your own level/map editor, e.g. That level/map editor can often serialize something considerably higher-level and tighter than, say, a Wavefront OBJ file due to the sheer amount of tiled/instanced (shared) data. That high-level data ends up being what you stream.
Second is to build scalable servers, and that's a separate beast. To do that often involves very efficient multithreading at the heart of the OS/kernel to achieve very efficient async I/O. There are some great resources out there on this subject, but it's too broad to cover in one simple answer.
And third might be compression of the data to further reduce the required bandwidth.
A commercial game title might seek all three of these, but probably the first thing to realize is that they're not necessarily streaming unique triangles and texels everywhere -- to stream such low-level data would place tremendous strain on the server, especially given the kind of player load that MMOs are designed to handle. There's a whole lot of instanced data that these games, especially MMOs, often use to significantly cut down on the unique data that actually has to occupy memory and be transmitted separately.
Maps and assets are often designed to carefully reuse existing data as much as possible -- carefully made to have maximum repetition to reduce memory requirements but without looking too blatantly redundant (variation vs. economy). They look "huge" but aren't really from a data standpoint given the sheer amount of repetition of the same data, and considering that they don't redundantly store repetitive data. They're typically very, very economical about it.
As far as streaming goes, a simple way might be to break the world down into 2-dimensional regions (with some overlap to allow a seamless experience so that adjacent regions are being streamed as the player travels around the world) with AABBs around them. Stream the data for the region(s) the player is in and possibly visible within the viewing frustum. It can get a lot more elaborate than this but this might serve as a decent starting point.

Related

What to do with not enough training data?

I have a problem that I don't have enough training data for my NN. It is trying to predict the result of a soccer game given the last games which I woulf say is a regression task.
The training data are results of soccer games of the last 15 seasons (which are about 4500 games). Getting to new data would be hard and would take a lot of time.
What should I do now?
Is it good to duplicate the data?
Should I input randomized data? (Maybe noise but I'm not quite sure what that is)
If there is no way of creating more data,
I should probably turn up the learning rate right? (I have it sitting at 0.01 and the momentum at 0.9)
I am using mini batches consisting of 32 training datas in training. Since I don't have a lot of training I don't have a lot of mini batches. Should I stop using them?
To start from the beginning: This is a very theoretical question and is not directly related to programming, which I recommend (in future) to post over at the Data Science Stackexchange.
To go into your problem: 4500 samples is not as bad as it sounds, depending on the exact task at hand. Are you trying to predict the match results (i.e. which team is the winner?), are you looking for more specific predictions (across a lot of different, specific teams)?
If you can make sure that you have a reasonable amount of data per class, one can work with a number of samples lower than what you have. Simply duplicating the data will not help you much, since you are very likely to just overfit on the samples you are seeing, without much of an improvement; Or rather, you will get the same results as training over a longer period (since essentially you see every sample twice per epoch, instead of one).
Again, what usually happens after long training periods is overfitting, so nothing gained here.
Your second suggestion is generally called data augmentation. Instead of simply copying samples, you alter them enough to make it look "different" to the network. But be careful! Data augmentation works well for some inputs, like images, since the change in input is significant enough to not represent the same sample, but still contains meaningful information about the class (a horizontally mirrored image of a cat still shows a "valid cat", unlike a vertically mirrored image, which is more unrealistic in the real world).
Essentially, it depends on your input features to determine where it makes sense to add noise. If you are only changing the results of the previous game, a minor change in input (adding/subtracting one goal at random) can significantly change the prediction you make.
If you slightly scramble ELO scores by a random number, on the other hand, the input value will not be too different, "but different enough" to use it as a novel example.
Turning up the learning rate is not a good idea, since you are essentially just letting the network converge more towards the specific samples. On the contrary, I would argue that the current learning rate is still too high, and you should certainly not increase it.
Regarding mini batches, I think I have referenced this a million times now, but always consider smaller minibatches. From a theoretical point of view, you are more likely to converge to a local minimum.

compare the way of resolving collisions in hash tables

how can I compare methods of conflict resolution (ie. linear hashing, square hashing and double hashing) in the tables hash? What data would be best to show the differences between them? Maybe someone has seen such comparisons.
There is no simple approach that's also universally meaningful.
That said, a good approach if you're tuning an actual app is to instrument (collect stats) for the hash table implementation you're using in the actual application of interest, with the real data it processes, and for whichever functions are of interest (insert, erase, find etc.). When those functions are called, record whatever you want to know about the collisions that happen: depending on how thorough you want to be, that might include the number of collisions before the element was inserted or found, the number of CPU/memory cache lines touched during that probing, the elapsed CPU or wall-clock time etc..
If you want a more general impression, instrument an implementation and throw large quantities of random data at it - but be aware that the real-world applicability of whatever conclusions you draw may only be as good as the random data is similar to the real-world data.
There are also other, more subtle implications to the choice of collision-handling mechanism: linear probing allows an implementation to cleanup "tombstone" buckets where deleted elements exist, which takes time but speeds later performance, so the mix of deletions amongst other operations can affect the stats you collect.
At the other extreme, you could try a mathematical comparison of the properties of different collision handling - that's way beyond what I'm able or interested in covering here.

Scala streaming peak detection with reactive events

I am trying to work out the best way to structure an application that in essence is a peak detection program. In my line of work I have been given charge of developing a system that essentially is looking at pulses in a stream of data and doing calculations on the peak data.
At the moment the software is implemented in LabVIEW. I'm sure many of you on here would understand why I'd love to see the end of that environment. I would like to redesign this in Scala (and possibly use Play if I was to make it use a web frontend) but I am not sure how best to approach the initial peak-detection component.
I've seen many tutorials for peak detection in various languages and I understand from a theoretical perspective many of the algorithms. What I am not sure is how would I approach this from the most Scala/Play idiomatic way?
Obviously I don't expect someone to write the code for me but I would really appreciate any pointers as to the direction I should take that makes the most sense. Since I cannot be too specific on the use case I'll try to give an overview of what I'm trying to do below:
Interfacing with data acquisition hardware to send out control voltages and read back "streams" of data.
I should be able to work the hardware side out, but is there a specific structure that would be best for the returned stream? I don't necessarily know ahead of time how much data I'll be reading so a stream that can be buffered and chunked would probably be appropriate.
Scan through the stream to find peaks and measure their height and trigger an event.
Peaks are usually about 20 samples wide or so but that depends on sample rate so I don't want to hard-code anything like that. I assume a sliding window would be necessary so peaks don't get "cut off" on the edge of a buffer. As a peak arrives I need to record and act on it. I think reactive streams and so on may be appropriate but I'm not sure. I will be making live graphs etc with the data so however it is done I need a way to send an event immediately on a successful detection.
The streams can be quite long and are at high sample-rates (minimum of 250ksamples per second) so I'd prefer not to have to buffer the entire stream to memory. The only information that needs to be permanent is the peak voltage data. I will need a way to visualise the raw stream for calibration purposes but I imagine that should be pretty simple.
The full application is much more complex and I'll need to do some initial filtering of noise and drift but I believe I should be able to work that out once I know what kind of implementation I should build on.
I've tried to look into Play's Iteratees and such but they are a little hard to follow. If they are an appropriate fit then I'm happy to work on learning them but since I'm not sure if that is the best way to approach the problem I'd love to know where I should look.
Reactive frameworks and the like certainly look interesting and I can see how I could really easily build the rest of the application around them but I'm just not sure how best to implement a streaming peak detection function on top of them beyond something simple like triggering when a value is over a threshold (as mentioned previously a "peak" can be quite wide and the signal is noisy).
Any advice would be greatly appreciated!
This is not a solution to this question but I'm writing this as an answer because of space/formatting limitations in the comments section.
Since you are exploring options I would suggest the following:
Assuming you have a large enough buffer to keep a window of data in memory (W=tXw) you can calculate the peak for the buffer using your existing algorithm. Next you can collect the next few samples data in a delta buffer (d) (a much smaller window). The delta buffer is the size of your increment. Assuming this is time series data you can easily create the new sliding window by removing the first delta (dXt) values from the buffer W and adding d values to the buffer. This is how Spark-streaming implements reduceByWindow function on a DStream. Iteratee can also help here.
If your system is distributed then you can use stream processing systems (Storm, Spark-streaming) to get better latency and throughput at the cost of distributing the system.
If you are really resource constrained and can live approximate results that bounded I would suggest you look at implementing a combination of probabilistic data structures such as count-min-sketch, hyperloglog and bloom filter.

Estimating possible # of actors in Scala

How can I estimate the number of actors that a Scala program can handle?
For context, I'm contemplating what is essentially a neural net that will be creating and forgetting cells at a high rate. I'm contemplating making each cell an actor, but there will be millions of them. I'm trying to decide whether this design is worth pursuing, but can't estimate the limits of number of actors. My intent is that it should totally run on one system, so distributed limits don't apply.
For that matter, I haven't definitely settled on Scala, if there's some better choice, but the cells do have state, as in, e.g., their connections to other cells, the weights of the connections, etc. Though this COULD be done as "Each cell is final. Changes mean replacing the current cell with a new one bearing the same id#."
P.S.: I don't know Scala. I'm considering picking it up to do this project. I'm also considering lots of other alternatives, including Java, Object Pascal and Ada. But actors seem a better map to what I'm after than thread-pools (and Java can't handle enough threads to make a thread/cell design feasible.
P.S.: At all times, most of the actors will be quiescent, but there wil need to be a way of cycling through the entire collection of them. If there isn't one built into the language, then this can be managed via first/next links within each cell. (Both links are needed, to allow cells in the middle to be extracted for release.)
With a neural net simulation, the real question is how much of the computational effort will be spent communicating, and how much will be spent computing something within a cell? If most of the effort is in communication then actors are perhaps a good choice for correctness, but not a good choice at all for efficiency (even with Akka, which performs reasonably well; AsyncFP might do the trick, though). Millions of neurons sounds slow--efficiency is probably a significant concern. If the neurons have some pretty heavy-duty computations to do themselves, then the communications overhead is no big deal.
If communications is the bottleneck, and you have lots of tiny messages, then you should design a custom data structure to hold the network, and also custom thread-handling that will take advantage of all the processors you have and minimize the amount of locking that you must do. For example, if you have space, each neuron could hold an array of input values from those neurons linked to it, and it would when calculating its output just read that array directly with no locking and the input neurons would just update the values also with no locking as they went. You can then just dump all your neurons into one big pool and have a master distribute them in chunks of, I don't know, maybe ten thousand at a time, each to its own thread. Scala will work fine for this sort of thing, but expect to do a lot of low-level work yourself, or wait for a really long time for the simulation to finish.

Why do game developers put many images into one big image?

Over the years I've often asked myself why game developers place many small images into a big one. But not only game developers do that. I also remember the good old Winamp MP3 player had a user interface design file which was just one huge image containing lots of small ones.
I have also seen some big javascript GUI libraries like ext.js using this technique. In ext.js there is a big image containing many small ones.
One thing I noticed is this: No matter how small my PNG image is, the Finder on the Mac always tells me it consumes at least 4kb. Which is heck of a lot if you have just 10 pixels.
So is this done because storing 20 or more small images into a big one is much more memory efficient versus having 20 separate files, each of them probably with it's own header and metadata?
Is it because locating files on the file system is expensive and slow, and therefore much faster to simply locate only one big image and then split it up into smaller ones, once it is loaded into memory?
Or is it lazyness, because it is tedious to think of so many file names?
Is ther a name for this technique? And how are those small images separated from the big one at runtime?
This is called spriting - and there are various reasons to do it in different situations.
For web development, it means that only one web request is required to fetch the image, which can be a lot more efficient than several separate requests. That's more efficient in terms of having less overhead due to the individual requests, and the final image file may well be smaller in total than it would have been otherwise.
The same sort of effect may be visible in other scenarios - for example, it may be more efficient to store and load a single large image file than multiple small ones, depending on the file system. That's entirely aside from any efficiencies gained in terms of the raw "total file size", and is due to the per file overhead (a directory entry, block size etc). It's a bit like the "per request" overhead in the web scenario, but due to slightly different factors.
None of these answers are right. The reason we pack multiple images into one big "sprite sheet" or "texture atlas" is to avoid swapping textures during rendering.
OpenGL and Direct-X take a performance hit when you draw from one image (texture) and the switch to another, so we pack multiple images into one big image and then we can draw several (or hundreds) of images and never switch textures. It has nothing to do with the 4K file size (or hasn't in 15 years).
Also, up until very recently, textures had to by powers of 2 (64, 128, 256) and if your game had lots of odd sized images, that's a lot of wasted memory. Packing them in a single texture could save a lot of space.
The 4kb usage is a side effect of how files are stored on disk. The smallest possible addressable bit of storage in a filesystem is a block, which is usually a fixed size of 512, 1024, 2048, etc... bytes. In your Mac's case, it's using 4k blocks. That means that even a 1-byte file will require at least 4kbytes worth of physical space to store, as it's not possible for the file system to address any storage unit SMALLER than 4k.
The reasons for these "large" blocks vary, but the big one is that the more "granular" your addressing gets (the small the blocks), the more space you waste on indexes to list which blocks are assigned to which files. If you had 1-byte sized blocks, then for every byte of data you store in a file, you'd also need to store 1+ bytes worth of usage information in the file system's metadata, and you'd end up wasting at least HALF of your storage on nothing but indexes.
The converse is true - the bigger the blocks, the more space is wasted for every smaller-than-one-block sized file you store, so in the end it comes down to what tradeoff you're willing to live with.
The reasons are a bit different in different environments.
On the web the main reason is to reduce the number of requests to the web server. Each requests creates overhead, most notably a separate round trip over the network.
When fetching from good ol' mechanical hard drives good read performance requires contiguous data. If you save data in lots of files you get extra seek-time for each file. There is also the block size to consider. Files are made out of blocks, in your case 4kB. When reading a file of one byte you need to read a whole block anyway. If you have many small images you can stuff a whole bunch of them in a single disk block and get them all in the same time as if you had only one small image in the block.
Another reason from days of yore was palletes.
If you did one image you could theme it with one pallete Colour = 14 = light grey with a hint of green.
If you did lots of little images you had to make sure you used the same pallet for every one while designing them, or you got all sort of artifacts.
Given you had one pallete then you could manipulate that, so everything currently green could be made red, by flipping one value in the palletes instead of trawling through every image.
Lots of simple animations like fire, smoke, running water are still done with this method.