Scala streaming peak detection with reactive events

I am trying to work out the best way to structure an application that in essence is a peak detection program. In my line of work I have been given charge of developing a system that essentially is looking at pulses in a stream of data and doing calculations on the peak data.
At the moment the software is implemented in LabVIEW. I'm sure many of you on here would understand why I'd love to see the end of that environment. I would like to redesign this in Scala (and possibly use Play if I was to make it use a web frontend) but I am not sure how best to approach the initial peak-detection component.
I've seen many tutorials for peak detection in various languages, and I understand many of the algorithms from a theoretical perspective. What I am not sure about is how to approach this in the most idiomatic Scala/Play way.
Obviously I don't expect someone to write the code for me but I would really appreciate any pointers as to the direction I should take that makes the most sense. Since I cannot be too specific on the use case I'll try to give an overview of what I'm trying to do below:
Interfacing with data acquisition hardware to send out control voltages and read back "streams" of data.
I should be able to work the hardware side out, but is there a specific structure that would be best for the returned stream? I don't necessarily know ahead of time how much data I'll be reading so a stream that can be buffered and chunked would probably be appropriate.
Scan through the stream to find peaks and measure their height and trigger an event.
Peaks are usually about 20 samples wide, but that depends on the sample rate, so I don't want to hard-code anything like that. I assume a sliding window would be necessary so peaks don't get "cut off" at the edge of a buffer. As a peak arrives I need to record and act on it. I think reactive streams and the like may be appropriate, but I'm not sure. I will be making live graphs etc. with the data, so however it is done I need a way to send an event immediately on a successful detection.
The streams can be quite long and are at high sample rates (a minimum of 250 ksamples per second), so I'd prefer not to have to buffer the entire stream in memory. The only information that needs to be permanent is the peak voltage data. I will need a way to visualise the raw stream for calibration purposes, but I imagine that should be pretty simple.
The full application is much more complex and I'll need to do some initial filtering of noise and drift but I believe I should be able to work that out once I know what kind of implementation I should build on.
I've tried to look into Play's Iteratees and such but they are a little hard to follow. If they are an appropriate fit then I'm happy to work on learning them but since I'm not sure if that is the best way to approach the problem I'd love to know where I should look.
Reactive frameworks and the like certainly look interesting, and I can see how I could easily build the rest of the application around them, but I'm just not sure how best to implement a streaming peak-detection function on top of them beyond something simple like triggering when a value is over a threshold (as mentioned previously, a "peak" can be quite wide and the signal is noisy).
Any advice would be greatly appreciated!

This is not a solution to this question but I'm writing this as an answer because of space/formatting limitations in the comments section.
Since you are exploring options I would suggest the following:
Assuming you have a large enough buffer to keep a window of data in memory (W = t × w), you can calculate the peak for the buffer using your existing algorithm. Next, you can collect the next few samples of data in a delta buffer (d), a much smaller window. The delta buffer is the size of your increment. Assuming this is time-series data, you can easily create the new sliding window by removing the first delta (d × t) values from the buffer W and adding the d values to the buffer. This is how Spark Streaming implements the reduceByWindow function on a DStream. An Iteratee can also help here.
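To make the sliding-window idea concrete, here is a minimal sketch in plain Scala (deliberately independent of Iteratees/Akka, using a trivial "window maximum" stand-in for your existing peak algorithm). The windowSize, step and threshold values are assumptions you would derive from your sample rate and expected peak width.

```scala
// Sketch: windowed peak detection over a stream of samples.
// windowSize, step and threshold are illustrative only; derive them from your
// sample rate and expected peak width (e.g. ~20 samples per peak).
case class Peak(index: Int, height: Double)

def detectPeaks(samples: Iterator[Double],
                windowSize: Int = 64,
                step: Int = 16,
                threshold: Double = 0.5): Iterator[Peak] =
  samples.zipWithIndex
    .sliding(windowSize, step)      // overlapping windows, so a peak on a
    .flatMap { window =>            // buffer edge is not cut off
      val (maxValue, maxIndex) = window.maxBy(_._1)
      val interior = maxIndex != window.head._2 && maxIndex != window.last._2
      if (interior && maxValue > threshold) Some(Peak(maxIndex, maxValue)) else None
    }

// Usage: fire an event as each peak is detected, without buffering the whole stream.
// detectPeaks(sampleIterator).foreach(p => println(s"peak ${p.height} at ${p.index}"))
```

Because the windows overlap, the same peak can be reported from two adjacent windows; de-duplicating by index downstream (or choosing the step relative to the expected peak width) handles that.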
If your system is distributed, then you can use stream-processing systems (Storm, Spark Streaming) to get better latency and throughput at the cost of the complexity of distributing the system.
If you are really resource-constrained and can live with approximate results that are bounded, I would suggest you look at implementing a combination of probabilistic data structures such as a count-min sketch, HyperLogLog, and a Bloom filter.
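As an illustration of what one of those structures looks like, here is a toy count-min sketch in Scala; the width/depth values and the seeded MurmurHash3 hashing are assumptions for the example, and a real implementation would size them from the error bounds you need.

```scala
// Toy count-min sketch: approximate frequency counts in fixed memory.
// It can over-count (bounded by width/depth) but never under-counts.
import scala.util.hashing.MurmurHash3

class CountMinSketch(width: Int = 1024, depth: Int = 4) {
  private val table = Array.ofDim[Long](depth, width)

  private def bucket(item: String, row: Int): Int = {
    val h = MurmurHash3.stringHash(item, row) % width   // row index doubles as seed
    if (h < 0) h + width else h
  }

  def add(item: String, count: Long = 1L): Unit =
    for (row <- 0 until depth) table(row)(bucket(item, row)) += count

  def estimate(item: String): Long =
    (0 until depth).map(row => table(row)(bucket(item, row))).min
}

// val cms = new CountMinSketch()
// cms.add("1.30V"); cms.estimate("1.30V")   // >= the true count
```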

Related

Topics and LL/token in Mallet change every time

Why do I get different keywords and LL/token every time I run topic models in Mallet? Is it normal?
Please help. Thank you.
Yes, this is normal and expected. Mallet implements a randomized algorithm. Finding the single best topic model for a collection is computationally intractable, but it's much easier to find one of countless "pretty good" solutions.
As an intuition, imagine shaking a box of sand. The smaller particles will sift towards one side, and the larger particles towards the other. That's way easier than trying to sort them by hand. You won't get the exact order, but each time you'll get one of a large number of equally good approximate sortings.
If you want to have a stronger guarantee of local optimality, add --num-icm-iterations 100 to switch from sampling to choosing the single best allocation for each token, given all the others.

How to plot all the streamlines in ParaView?

I am simulating the lid-driven cavity case and I am trying to get all the streamlines with ParaView's Stream Tracer, but I only get the ones that intersect the reference line, and because of that there are vortices that are not visible. How can I see all the streamlines in the domain?
Thanks a lot in advance.
To add a little bit to Mathieu's answer, if you really want streamlines everywhere, then you can create a Stream Tracer With Custom Source (as Mathieu suggested) and set your data to both the Input and the Seed Source. That will create a streamline originating from every point in your dataset, which is pretty much what you asked for.
However, while you can do this, you will probably not be happy with the results. First of all, unless your data is trivially small, this will take a long time to compute and create a large amount of data. Even worse, the result will be so dense that you won't be able to see anything. You will get all those interesting streamlines through vortices, but they will be completely hidden by all the boring streamlines around them.
Thus, you are better off with trying to derive a data set that contains seed points that are likely to trace a stream through the vortices that you are interested in. One thing you might want to try is to compute the vorticity of your vector field (Gradient Of Unstructured Data Set when turning on advanced option Compute Vorticity), find the magnitude of that (Calculator), and then use the Threshold filter to pull out the cells with large vorticity. Then use that as your Seed Source.
Another (probably better) option if your data is 2D or you can extract an interesting surface along the flow of your data is to use the Surface LIC plugin. Details can be found at https://www.paraview.org/Wiki/ParaView/Line_Integral_Convolution.
You have to choose a representative seed source for your streamlines.
You could use a "Sphere Source", set in the Stream Tracer properties.
If that fails, you can use a StreamTracerWithCustomSource and provide your own seed source, which you will have to create first.

Programming an intelligent game-playing bot

I am taking part in a programming competition where the objective is writing a bot that can play a specific game.
The objective of the game is to earn a certain amount of points. You control multiple airships that you move around, capture islands, and navigate drones that carry treasure. You play against one opponent, turns happen simultaneously, and there is a time limit. You can move multiple ships and drones in one turn. You can program your bot in Python, Java or C#.
The exact details don't matter, just that each ship has around 15 options each turn (moving and shooting) and overall you have around 10,000 different options for each turn (different configurations of airship movements and shooting).
Up until now I have approached this competition naively and haven't done anything exceptionally clever (for example: if near enemy, shoot). I have read about minimax algorithms, and I would really like to apply one here (or something similar); you can assume that I can tell the value of a state. My problem is the mass of options for each turn, which creates an enormous branching factor that doesn't let me get very deep.
Question 1: Is there a better, applicable approach to this problem? Perhaps deep-learning or something similar?
Question 2: Is there a way to minimize the branching factor? I've read about alpha-beta and similar algorithms, but nothing seems to do the job.
Any help would be much appreciated
The minimax algorithm seems natural for these kinds of problems. First, the game is modelled in an abstract way, and then a solver is used to find a path from the current situation to a game state which maximizes the number of points. A similar approach to minimax is GOAP, which was implemented in the 1970s for Shakey the robot under the name STRIPS. But GOAP and minimax have two problems: first, an abstract model of the game is needed (perhaps in PDDL or in a Game Description Language), and second, the state space is too big.
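Since the main obstacle mentioned is the branching factor, here is a hedged sketch of depth-limited minimax with alpha-beta pruning in Scala. GameState, Move, legalMoves, apply and evaluate are hypothetical stand-ins for the competition's game model, and with ~10,000 joint moves per turn you would still need move ordering and/or sampling of candidate moves on top of this. Note also that with simultaneous turns this is only an approximation of the true game.

```scala
// Depth-limited minimax with alpha-beta pruning (sketch).
// GameState, Move and their methods are placeholders, not a real API.
trait Move
trait GameState {
  def legalMoves(maximizing: Boolean): Seq[Move]
  def apply(move: Move): GameState
  def evaluate: Double        // heuristic value of this state for the maximizer
  def isTerminal: Boolean
}

def alphaBeta(s: GameState, depth: Int, alpha: Double, beta: Double,
              maximizing: Boolean): Double =
  if (depth == 0 || s.isTerminal) s.evaluate
  else {
    var a = alpha
    var b = beta
    var best = if (maximizing) Double.NegativeInfinity else Double.PositiveInfinity
    val moves = s.legalMoves(maximizing)   // ordering strong moves first prunes more
    var i = 0
    while (i < moves.length && a < b) {    // stop once the alpha-beta window closes
      val v = alphaBeta(s.apply(moves(i)), depth - 1, a, b, !maximizing)
      if (maximizing) { best = math.max(best, v); a = math.max(a, best) }
      else            { best = math.min(best, v); b = math.min(b, best) }
      i += 1
    }
    best
  }
```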
A better alternative to planning is to use a behavior tree. That's a static program which describes the behavior of an agent. No solver is needed and no complete model of the game is needed. Instead, a bottom-up approach is used, with multiple edit-compile-run iterations to find the optimal behavior tree (test-driven development). To implement such a programming approach, a so-called "reactive planner" has to be implemented first, which is another term for a real-time scheduler. That's a module which maps a behavior tree onto a Gantt chart, executing each action at a specific moment in time. As an introduction, the Unity3D engine is a good starting point, which has a full behavior-tree implementation out of the box.
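For contrast, a behavior tree needs no solver at all. Below is a minimal Scala sketch of the usual node types (sequence, selector, action); the ship-related conditions and actions in the commented example are invented purely for illustration.

```scala
// Minimal behavior-tree core: a node either succeeds or fails when ticked.
sealed trait BTNode { def tick(): Boolean }

// Leaf action/condition: wraps an arbitrary check or effect.
case class Action(run: () => Boolean) extends BTNode {
  def tick(): Boolean = run()
}
// Sequence: succeeds only if every child succeeds, in order.
case class Sequence(children: BTNode*) extends BTNode {
  def tick(): Boolean = children.forall(_.tick())
}
// Selector: succeeds as soon as any child succeeds.
case class Selector(children: BTNode*) extends BTNode {
  def tick(): Boolean = children.exists(_.tick())
}

// Hypothetical tree: shoot if an enemy is in range, otherwise head for an island.
// val root = Selector(
//   Sequence(Action(() => enemyInRange()), Action(() => shoot())),
//   Action(() => moveTowardNearestIsland())
// )
// root.tick()   // evaluate once per game turn
```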

Streaming huge 3D scenes over internet

I want to stream big scenes made of many objects to clients, but I need some advice on what approach to take. I know the PS4 and Battle.net stream games even when 70% of the game is not downloaded yet, and they work pretty fast with my 18 Mbps connection.
Can anyone please help me with where and how to start with streaming big scenes?
A lot of these don't necessarily stream huge scenes per se, if "huge scenes" implies transmitting the lowest-level primitive data (individual points, triangles, unique textures on every single object, etc).
They often stream higher-level data like "maps" with a lot of instanced data. For example, they might not transmit the triangles of a thousand trees in a forest. Instead, they might transmit one unique tree asset which is instanced and just scaled and rotated and positioned differently to form a forest (just a unique transformation matrix per tree instance). The result might be that the entire forest can be transmitted without taking much more memory than a single tree's worth of triangles.
They might have two or more character meshes which have identical geometry or topology and just unique deformations (point positions) or textures ("skins"), significantly reducing the amount of unique data that has to be sent/stored.
When doing this kind of instancing/tiling stuff, what might otherwise be terabytes worth of unique data may fit into megabytes due to the amount of instanced, non-unique data.
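As a rough sketch of what that higher-level, instanced data can look like (the types here are invented for illustration, not any particular engine's format): unique geometry is stored once per asset, and each placement is just an asset id plus a small transform.

```scala
// Sketch of an instanced scene description: geometry stored once per asset,
// placements stored as lightweight (assetId, transform) records.
final case class MeshAsset(id: String, vertices: Array[Float], indices: Array[Int])

final case class Transform(tx: Float, ty: Float, tz: Float,
                           scale: Float, yawRadians: Float)

final case class Instance(assetId: String, transform: Transform)

final case class Region(assets: Map[String, MeshAsset], instances: Vector[Instance])

// A forest of 1000 trees costs one MeshAsset plus 1000 small Instance records,
// instead of 1000 copies of the tree geometry.
// val forest = Region(
//   assets = Map("tree" -> treeAsset),
//   instances = Vector.tabulate(1000)(i => Instance("tree", randomTransform(i)))
// )
```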
So the first step to doing this is typically to build your own level/map editor. That level/map editor can often serialize something considerably higher-level and tighter than, say, a Wavefront OBJ file, due to the sheer amount of tiled/instanced (shared) data. That high-level data ends up being what you stream.
Second is to build scalable servers, and that's a separate beast. To do that often involves very efficient multithreading at the heart of the OS/kernel to achieve very efficient async I/O. There are some great resources out there on this subject, but it's too broad to cover in one simple answer.
And third might be compression of the data to further reduce the required bandwidth.
A commercial game title might seek all three of these, but probably the first thing to realize is that they're not necessarily streaming unique triangles and texels everywhere -- to stream such low-level data would place tremendous strain on the server, especially given the kind of player load that MMOs are designed to handle. There's a whole lot of instanced data that these games, especially MMOs, often use to significantly cut down on the unique data that actually has to occupy memory and be transmitted separately.
Maps and assets are often designed to carefully reuse existing data as much as possible -- carefully made to have maximum repetition to reduce memory requirements but without looking too blatantly redundant (variation vs. economy). They look "huge" but aren't really from a data standpoint given the sheer amount of repetition of the same data, and considering that they don't redundantly store repetitive data. They're typically very, very economical about it.
As far as streaming goes, a simple way might be to break the world down into 2-dimensional regions (with some overlap to allow a seamless experience so that adjacent regions are being streamed as the player travels around the world) with AABBs around them. Stream the data for the region(s) the player is in and possibly visible within the viewing frustum. It can get a lot more elaborate than this but this might serve as a decent starting point.
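A hedged sketch of that region idea: partition the world into a grid of square regions, and on every update request the regions within some radius of the player that are not resident yet while evicting the ones that fell out of range. The regionSize and keepRadius values are made-up parameters, and a real system would also consider the view frustum as described above.

```scala
// Sketch of 2D region streaming: the world is a grid of square regions keyed by
// (x, y) cell coordinates; regions near the player are loaded, the rest evicted.
final case class RegionKey(x: Int, y: Int)

class RegionStreamer(regionSize: Float = 256f, keepRadius: Int = 2) {
  private var resident = Set.empty[RegionKey]

  private def cellOf(px: Float, pz: Float): RegionKey =
    RegionKey(math.floor(px / regionSize).toInt, math.floor(pz / regionSize).toInt)

  /** Returns (regionsToLoad, regionsToEvict) for the player's current position. */
  def update(px: Float, pz: Float): (Set[RegionKey], Set[RegionKey]) = {
    val c = cellOf(px, pz)
    val wanted = (for {
      dx <- -keepRadius to keepRadius
      dy <- -keepRadius to keepRadius
    } yield RegionKey(c.x + dx, c.y + dy)).toSet   // overlap keeps travel seamless
    val toLoad  = wanted -- resident
    val toEvict = resident -- wanted
    resident = wanted
    (toLoad, toEvict)
  }
}
```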

Processing accelerometer data

I would like to know if there are some libraries/algorithms/techniques that help to extract the user context (walking/standing) from accelerometer data (extracted from any smartphone)?
For example, I would collect accelerometer data every 5 seconds for a definite period of time and then identify the user context (ex. for the first 5 minutes, the user was walking, then the user was standing for a minute, and then he continued walking for another 3 minutes).
Thank you very much in advance :)
Check the new activity recognition APIs:
http://developer.android.com/google/play-services/location.html
It's still a research topic; please look at this paper, which discusses the algorithm:
http://www.enggjournals.com/ijcse/doc/IJCSE12-04-05-266.pdf
I don't know of any such library.
It is a very time-consuming task to write such a library. Basically, you would build a database of the "user contexts" that you wish to recognize.
Then you collect data and compare it to those in the database. As for how to compare, see Store orientation to an array - and compare; the same holds for the accelerometer.
Walking/running data is analogous to heart-rate data in a lot of ways. In terms of filtering the noise and getting smooth peaks, look into noise-filtering and peak-detection algorithms. The following is used to obtain heart-rate information for heart patients, and it should be a good starting point: http://www.docstoc.com/docs/22491202/Pan-Tompkins-algorithm-algorithm-to-detect-QRS-complex-in-ECG
Think about how you want to filter out the noise and detect peaks; the filters will obviously depend on the raw data you gather, but it's good to have a general idea of what kind of filtering you'd want to do on your data. Then think about what needs to be done once you have filtered data. In your case, think about how you would design an algorithm to find out when the data indicates activity (like walking or running) and when it shows the user being stationary. This is a fairly challenging problem to solve once you consider the dynamics of the device itself (how it's positioned when the user is walking/running) and the fact that there are very few (if any) benchmarked algorithms that do this with raw smartphone data.
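As a very rough illustration of that pipeline (and nothing more: the window sizes and the variance threshold below are assumptions, not benchmarked values), the sketch smooths the acceleration magnitude with a moving average and then labels fixed-size windows as moving or stationary based on their variance.

```scala
// Rough sketch: smooth accelerometer magnitude, then label fixed windows as
// "moving" or "stationary" by variance. All constants are illustrative only.
final case class Sample(x: Double, y: Double, z: Double)

def magnitudes(samples: Seq[Sample]): Seq[Double] =
  samples.map(s => math.sqrt(s.x * s.x + s.y * s.y + s.z * s.z))

def movingAverage(values: Seq[Double], window: Int = 5): Seq[Double] =
  values.sliding(window).map(w => w.sum / w.size).toSeq

def labelWindows(smoothed: Seq[Double],
                 windowSize: Int = 50,
                 varianceThreshold: Double = 0.05): Seq[String] =
  smoothed.grouped(windowSize).map { w =>
    val mean = w.sum / w.size
    val variance = w.map(v => (v - mean) * (v - mean)).sum / w.size
    if (variance > varianceThreshold) "moving" else "stationary"
  }.toSeq

// val labels = labelWindows(movingAverage(magnitudes(rawSamples)))
```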
Start with determining the appropriate algorithms, and then tackle the complexities (mentioned above) one by one.