Spatial Point Pattern Analysis Python - pyspark

I am observing grid patterns in my raw location data and am trying to remove such points without discarding too many valid locations. I would also like to know about any other common artifact patterns that I could filter out.
As I am new to location data, I am not sure which methods I could explore to get rid of these data points. I will start by looking into DBSCAN, but I am hoping for advice on other methods worth exploring. The dataset is quite large (petabytes), so I have to keep scalability in mind. I tried searching for spatial point pattern analysis but have not been able to find anything that helps.
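For context, this is roughly the kind of small-scale prototype I have in mind before scaling anything out in pyspark (a sketch assuming scikit-learn's DBSCAN with a haversine metric; the file name, column names, and thresholds below are made up):

```python
# Sketch: flag dense clusters with DBSCAN on a small sample before worrying
# about scaling out in pyspark. Assumes scikit-learn; the file name, column
# names, eps and min_samples are placeholder values.
import numpy as np
import pandas as pd
from sklearn.cluster import DBSCAN

df = pd.read_parquet("locations_sample.parquet")    # hypothetical sample file
coords = np.radians(df[["lat", "lon"]].to_numpy())  # haversine wants radians

# eps is in radians: roughly 50 m on Earth (mean radius ~6,371 km)
eps = 50 / 6_371_000
labels = DBSCAN(eps=eps, min_samples=20, metric="haversine").fit_predict(coords)

df["cluster"] = labels
# One heuristic for grid artifacts: clusters whose points snap to a grid tend
# to have very few distinct coordinate values relative to their size.
suspect = (
    df[df.cluster >= 0]
    .groupby("cluster")
    .agg(points=("lat", "size"), distinct=("lat", "nunique"))
    .query("distinct / points < 0.1")
)
print(suspect)
```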

Can I plot the same subset of many days in Grafana?

I have an interesting event that happens once a day at (say) 10:00. I would like to plot some data for a short window around that time, across multiple days, to explore the data. Is there an easy way to do this with Grafana?
Rough sketch of what I’d like (but I’m not very picky):
A few things I tried:
Searching on the internet. I think this failed because I don’t know the right words to describe what I want
Assuming this is impossible. The reason I would expect this to be possible is that it seems somewhat common to e.g. want to plot the value of some metric during working hours only. But it also seems like something that mightn’t be needed for a reasonably steady 24/7 operation.
Setting up a dashboard with many panels with appropriate timeshifts. This is a bit fiddly to set up and I think it is hard to change the metric being looked at. It also leaves a lot of empty space, and I'm not sure whether it is possible to lock the y-axis scale to be the same for all the panels.
I assume that this isn't currently supported, so I suppose the question is what the best workarounds would be. In particular, I'm hoping to optimize for something that is convenient for exploring the data rather than something that is powerful.
Some alternatives that would likely also be good for me:
Some scheme for weighting the x axis (so I could give time in the gap a weight of 0)
Some way to automatically add discontinuities to the x axis when there is no data
A few more specifics:
Grafana v8.1.8 (52edcff798)
The data is stored in VictoriaMetrics and queried via a PromQL interface (PromQL is a subset of VictoriaMetrics' MetricsQL; currently I can't make MetricsQL queries, but I could maybe change that if it would help here).
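If nothing inside Grafana works, the fallback I have in mind is a small script outside Grafana along these lines (a rough sketch against the Prometheus-compatible /api/v1/query_range endpoint that VictoriaMetrics exposes; the server URL and metric name below are placeholders), though I'd much prefer something inside Grafana:

```python
# Sketch: overlay the same daily time window across several days by querying
# the Prometheus-compatible /api/v1/query_range endpoint. The server URL,
# metric name, window size and step are placeholders.
from datetime import datetime, timedelta, timezone

import matplotlib.pyplot as plt
import requests

BASE_URL = "http://victoriametrics:8428"   # hypothetical server address
QUERY = "my_metric"                        # hypothetical metric
EVENT_HOUR = 10                            # the event happens around 10:00
WINDOW = timedelta(minutes=30)             # plot +/- 30 minutes around it
DAYS = 7

today = datetime.now(timezone.utc).replace(
    hour=EVENT_HOUR, minute=0, second=0, microsecond=0
)
for d in range(DAYS):
    center = today - timedelta(days=d)
    resp = requests.get(f"{BASE_URL}/api/v1/query_range", params={
        "query": QUERY,
        "start": (center - WINDOW).timestamp(),
        "end": (center + WINDOW).timestamp(),
        "step": "15s",
    })
    resp.raise_for_status()
    for series in resp.json()["data"]["result"]:
        # x axis: minutes relative to the 10:00 event, so all days line up
        xs = [(float(t) - center.timestamp()) / 60 for t, _ in series["values"]]
        ys = [float(v) for _, v in series["values"]]
        plt.plot(xs, ys, label=center.date().isoformat())

plt.xlabel("minutes relative to 10:00 UTC")
plt.legend()
plt.show()
```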

How to plot all the streamlines in ParaView?

I am simulating the "lid-driven cavity" case and I am trying to get all the streamlines with ParaView's Stream Tracer, but I only get the ones that intersect the seed line, and because of that there are vortices that are not visible. How can I see all the streamlines in the domain?
Thanks a lot in advance.
To add a little bit to Mathieu's answer, if you really want streamlines everywhere, then you can create a Stream Tracer With Custom Source (as Mathieu suggested) and set your data to both the Input and the Seed Source. That will create a streamline originating from every point in your dataset, which is pretty much what you asked for.
However, while you can do this, you will probably not be happy with the results. First of all, unless your data is trivially small, this will take a long time to compute and create a large amount of data. Even worse, the result will be so dense that you won't be able to see anything. You will get all those interesting streamlines through vortices, but they will be completely hidden by all the boring streamlines around them.
Thus, you are better off trying to derive a data set that contains seed points that are likely to trace a stream through the vortices that you are interested in. One thing you might want to try is to compute the vorticity of your vector field (Gradient Of Unstructured Data Set with the advanced option Compute Vorticity turned on), find the magnitude of that (Calculator), and then use the Threshold filter to pull out the cells with large vorticity. Then use that as your Seed Source.
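In pvpython, that pipeline looks roughly like the sketch below. Treat it as an outline rather than exact API: proxy and property names shift between ParaView versions, and the array names, file name, and threshold value are assumptions.

```python
# Sketch of the vorticity-seeded streamline pipeline in pvpython.
# Proxy/property names differ between ParaView versions (e.g. newer versions
# rename the gradient filter and the Threshold range properties); the array
# names, file name, and threshold cutoff below are placeholders.
from paraview.simple import *

flow = OpenDataFile("cavity.foam")            # hypothetical dataset

# Compute vorticity of the velocity field (array name "U" is assumed).
grad = GradientOfUnstructuredDataSet(Input=flow)
grad.ScalarArray = ["POINTS", "U"]
grad.ComputeVorticity = 1

# Magnitude of the vorticity vector.
mag = Calculator(Input=grad)
mag.ResultArrayName = "VortMag"
mag.Function = "mag(Vorticity)"

# Keep only the high-vorticity region to use as seeds.
seeds = Threshold(Input=mag)
seeds.Scalars = ["POINTS", "VortMag"]
seeds.ThresholdRange = [5.0, 1e9]             # made-up cutoff; newer versions
                                              # use LowerThreshold/UpperThreshold

# Trace streamlines from those seed points.
tracer = StreamTracerWithCustomSource(Input=flow, SeedSource=seeds)
tracer.Vectors = ["POINTS", "U"]

Show(tracer)
Render()
```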
Another (probably better) option if your data is 2D or you can extract an interesting surface along the flow of your data is to use the Surface LIC plugin. Details can be found at https://www.paraview.org/Wiki/ParaView/Line_Integral_Convolution.
You have to choose a representative source for your streamline.
You could use a "Sphere Source" as the seed type in the Stream Tracer properties.
If that fails, you can use a Stream Tracer With Custom Source and supply your own seed source, which you will have to create first.

How to remove nodes from TensorFlow graph?

I need to write a program where part of the TensorFlow graph needs to stay in place, storing some global information (mainly variables and summaries), while the other part needs to be changed/reorganized as the program runs.
The way I do it now is to reconstruct the whole graph in every iteration. But then I have to store and load that information manually from/to checkpoint files or numpy arrays in every iteration, which makes my code really messy and error-prone.
I wonder if there is a way to remove/modify part of my computation graph instead of resetting the whole graph?
Changing the structure of TensorFlow graphs isn't really possible. Specifically, there isn't a clean way to remove nodes from a graph, so removing a subgraph and adding another isn't practical. (I've tried this, and it involves surgery on the internals. Ultimately, it's way more effort than it's worth, and you're asking for maintenance headaches.)
There are some workarounds.
Your reconstruction is one of them. You seem to have a pretty good handle on this method, so I won't harp on it, but for the benefit of anyone else who stumbles upon this, a very similar method is a filtered deep copy of the graph. That is, you iterate over the elements and add them in, predicated on some condition. This is most viable if the graph was given to you (i.e., you don't have the functions that built it in the first place) or if the changes are fairly minor. You still pay the price of rebuilding the graph, but sometimes loading and storing can be transparent. Given your scenario, though, this probably isn't a good match.
Another option is to recast the problem as a superset of all possible graphs you're trying to evaluate and rely on dataflow behavior. In other words, build a graph which includes every type of input you're feeding it and only ask for the outputs you need. Good signs this might work are: your network is parametric (perhaps you're just increasing/decreasing widths or layers), the changes are minor (maybe including/excluding inputs), and your operations can handle variable inputs (reductions across a dimension, for instance). In your case, if you have only a small, finite number of tree structures, this could work well. You'll probably just need to add some aggregation or renormalization for your global information.
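For anyone who wants a concrete picture of that second option, here is a toy sketch in TensorFlow 1.x graph style (all names, shapes, and the two "heads" are made up): build the superset graph once, keep the global state in it, and only fetch the branch you need on a given run.

```python
# Sketch of the "supergraph" idea in TensorFlow 1.x style: build one graph
# containing every variant you might need and only fetch the outputs you
# want on a given run. Names and shapes are placeholders.
import tensorflow.compat.v1 as tf
tf.disable_eager_execution()

x = tf.placeholder(tf.float32, [None, 10], name="x")

# Fixed part: global state that must persist across iterations.
global_step = tf.Variable(0, trainable=False, name="global_step")
w = tf.Variable(tf.random_normal([10, 10]), name="shared_weights")
shared = tf.matmul(x, w)

# Changing part: two alternative heads living in the same graph.
head_small = tf.reduce_mean(shared, axis=1, name="head_small")
head_large = tf.reduce_sum(tf.nn.relu(shared), axis=1, name="head_large")

with tf.Session() as sess:
    sess.run(tf.global_variables_initializer())
    batch = [[0.0] * 10]
    # Only the fetched head (and its dependencies) actually executes.
    print(sess.run(head_small, feed_dict={x: batch}))
    print(sess.run(head_large, feed_dict={x: batch}))
```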
A third option is to treat the networks as physically split. So instead of thinking of one network with mutable components, treat the boundaries between fixed and changing pieces as inputs and outputs of two separate networks. This does make some things harder: for instance, backprop across both is now ugly (which it sounds like might be a problem for you). But if you can avoid that, then two networks can work pretty well. It ends up feeling a lot like dealing with a separate pretraining phase, which you may already be comfortable with.
Most of these workarounds have a fairly narrow range of problems that they work for, so they might not help in your case. That said, you don't have to go all-or-nothing. If partially splitting the network or creating a supergraph for just some changes works, then it might be that you only have to worry about save/restore for a few cases, which may ease your troubles.
Hope this helps!

Scala streaming peak detection with reactive events

I am trying to work out the best way to structure an application that in essence is a peak detection program. In my line of work I have been given charge of developing a system that essentially is looking at pulses in a stream of data and doing calculations on the peak data.
At the moment the software is implemented in LabVIEW. I'm sure many of you on here would understand why I'd love to see the end of that environment. I would like to redesign this in Scala (and possibly use Play if I was to make it use a web frontend) but I am not sure how best to approach the initial peak-detection component.
I've seen many tutorials for peak detection in various languages and I understand from a theoretical perspective many of the algorithms. What I am not sure is how would I approach this from the most Scala/Play idiomatic way?
Obviously I don't expect someone to write the code for me but I would really appreciate any pointers as to the direction I should take that makes the most sense. Since I cannot be too specific on the use case I'll try to give an overview of what I'm trying to do below:
Interfacing with data acquisition hardware to send out control voltages and read back "streams" of data.
I should be able to work the hardware side out, but is there a specific structure that would be best for the returned stream? I don't necessarily know ahead of time how much data I'll be reading so a stream that can be buffered and chunked would probably be appropriate.
Scan through the stream to find peaks and measure their height and trigger an event.
Peaks are usually about 20 samples wide or so but that depends on sample rate so I don't want to hard-code anything like that. I assume a sliding window would be necessary so peaks don't get "cut off" on the edge of a buffer. As a peak arrives I need to record and act on it. I think reactive streams and so on may be appropriate but I'm not sure. I will be making live graphs etc with the data so however it is done I need a way to send an event immediately on a successful detection.
The streams can be quite long and are at high sample-rates (minimum of 250ksamples per second) so I'd prefer not to have to buffer the entire stream to memory. The only information that needs to be permanent is the peak voltage data. I will need a way to visualise the raw stream for calibration purposes but I imagine that should be pretty simple.
The full application is much more complex and I'll need to do some initial filtering of noise and drift but I believe I should be able to work that out once I know what kind of implementation I should build on.
I've tried to look into Play's Iteratees and such but they are a little hard to follow. If they are an appropriate fit then I'm happy to work on learning them but since I'm not sure if that is the best way to approach the problem I'd love to know where I should look.
Reactive frameworks and the like certainly look interesting and I can see how I could really easily build the rest of the application around them but I'm just not sure how best to implement a streaming peak detection function on top of them beyond something simple like triggering when a value is over a threshold (as mentioned previously a "peak" can be quite wide and the signal is noisy).
Any advice would be greatly appreciated!
This is not a solution to this question but I'm writing this as an answer because of space/formatting limitations in the comments section.
Since you are exploring options I would suggest the following:
Assuming you have a large enough buffer to keep a window of data in memory (W = t × w), you can calculate the peak for the buffer using your existing algorithm. Next you can collect the next few samples of data in a delta buffer (d), a much smaller window. The delta buffer is the size of your increment. Assuming this is time-series data, you can easily create the new sliding window by removing the first delta (d × t) values from the buffer W and adding the d values to the buffer. This is how Spark Streaming implements the reduceByWindow function on a DStream. Iteratees can also help here.
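Here is a rough sketch of that buffering scheme (shown in plain Python for brevity; the window sizes and threshold are made up, and the same structure maps directly onto an Iteratee or a reactive-stream stage in Scala):

```python
# Sketch: keep a window W in memory, slide it forward by a small delta buffer
# d, and emit an event when the maximum inside the window exceeds a threshold.
# Window sizes and the threshold are placeholder values.
from collections import deque


def detect_peaks(samples, window=1024, delta=128, threshold=3.0):
    """Yield (index, value) for window maxima above the threshold, skipping repeats."""
    buf = deque(maxlen=window)   # W: the sliding window
    pending = []                 # d: the delta buffer (the slide increment)
    last_idx = None
    for i, value in enumerate(samples):
        pending.append((i, value))
        if len(pending) < delta:
            continue
        buf.extend(pending)      # slide the window forward by d samples
        pending.clear()
        if len(buf) == buf.maxlen:
            idx, peak = max(buf, key=lambda p: p[1])
            if peak > threshold and idx != last_idx:
                last_idx = idx
                yield idx, peak


if __name__ == "__main__":
    import math
    import random
    # Synthetic noisy signal with broad peaks, just to exercise the detector.
    signal = [4 * math.sin(i / 50) + random.gauss(0, 0.3) for i in range(10_000)]
    for idx, peak in detect_peaks(signal):
        print(f"peak {peak:.2f} at sample {idx}")
```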
If your system is distributed then you can use stream processing systems (Storm, Spark-streaming) to get better latency and throughput at the cost of distributing the system.
If you are really resource-constrained and can live with approximate results that are bounded, I would suggest you look at implementing a combination of probabilistic data structures such as count-min sketch, HyperLogLog, and Bloom filters.
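To make the count-min sketch concrete, here is a minimal, untuned sketch of the idea in Python (the width/depth values and the example keys are arbitrary):

```python
# Minimal count-min sketch: approximate counts in fixed memory, with a
# one-sided (over-)estimation error. Width and depth below are arbitrary.
import hashlib


class CountMinSketch:
    def __init__(self, width=2048, depth=4):
        self.width = width
        self.depth = depth
        self.table = [[0] * width for _ in range(depth)]

    def _buckets(self, item):
        # One independent-ish hash per row, derived by salting blake2b.
        for row in range(self.depth):
            digest = hashlib.blake2b(item.encode(), salt=bytes([row])).digest()
            yield row, int.from_bytes(digest[:8], "big") % self.width

    def add(self, item, count=1):
        for row, col in self._buckets(item):
            self.table[row][col] += count

    def estimate(self, item):
        # The true count is <= the estimate; collisions only inflate it.
        return min(self.table[row][col] for row, col in self._buckets(item))


cms = CountMinSketch()
for peak_bucket in ["2.1V", "2.1V", "3.7V"]:
    cms.add(peak_bucket)
print(cms.estimate("2.1V"))   # prints 2 (or slightly more under collisions)
```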

Calculating route length

I have a map with about 80 annotations. I would like to do 3 things.
1) From my current location, I would like to know the actual route distance to that position, not the linear distance.
2) I want to be able to show a list of all the annotations, but for every annotation (having lon/lat) I would like to know the actual route distance from my position to that position.
3) I would like to know the closest annotation to my position using route distance, not linear distance.
I think the answer to all these three points will be the same. But please keep in mind that I don't want to create a route, I just want to know the distance to the annotation.
I hope someone can help me.
Best regards,
Paul Peelen
From what I understand of your post, I believe you seek the Haversine formula. Luckily for you, there are a number of Objective-C implementations, though writing your own is trivial once the formula's in front of you.
I originally deleted this because I didn't notice that you didn't want linear distance at first, but I'm bringing it back in case you decide that an approximation is good enough at that particular point of the user interaction.
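For reference, the formula itself is only a few lines; here is a sketch in Python (the Objective-C translation is direct, and the example coordinates are arbitrary):

```python
# Great-circle ("as the crow flies") distance between two lat/lon points
# using the haversine formula. Note this is linear distance, not route distance.
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in metres


def haversine(lat1, lon1, lat2, lon2):
    phi1, phi2 = radians(lat1), radians(lat2)
    dphi = radians(lat2 - lat1)
    dlmb = radians(lon2 - lon1)
    a = sin(dphi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Example: Stockholm city centre to Arlanda airport, roughly 37 km.
print(haversine(59.3293, 18.0686, 59.6498, 17.9238))
```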
I think, as pointed out before, your query would be extremely heavy for the Google Maps API if you perform exactly what you are describing. Do you need all that information at once? Maybe it would be good enough to first query just some of the distances, based on some heuristic or on the user's needs.
To obtain the distances, you could use a Google Maps GDirections object, as pointed out here (at the bottom of the page there's a "Routes and Steps" section, with an advanced example).
"The GDirections object also supports multi-point directions, which can be constructed using the GDirections.loadFromWaypoints() method. This method takes an array of textual input addresses or textual lat/lon points. Each separate waypoint is computed as a separate route and returned in a separate GRoute object, each of which contains a series of GStep objects."
Using the Google Maps API on the iPhone shouldn't be too difficult, and I think your question doesn't cover that, but if you need a basic example, you could look at this question and scroll to the answer.
Good Luck!
Calculating route distance to about 80 locations is certain to be computationally intensive on Google's part and I can't imagine that you would be able to make those requests to the Google Maps API, were it possible to do so on a mobile device, without being severely limited by either the phone connection or rate limits on the server.
Unfortunately, calculating route distance rather than geometric distance is a very expensive computation involving a lot of data about the area - data you almost certainly don't have. This means, unfortunately, that this isn't something that Core Location or MapKit can help you with.
What problem are you trying to solve, exactly? There may be other heuristics other than route distance you can use to approximate some sort of distance ranking.