What is the difference between a view and a stream? - scala

In the Scala 2.8 collections framework, what is the difference between view and toStream?

In a view elements are recomputed each time they are accessed. In a stream elements are retained as they are evaluated.
For example:
val doubled = List(1,2,3,4,5,6,7,8,9,10).view.map(_*2)
println(doubled.mkString(" "))
println(doubled.mkString(" "))
will re-evaluate the map for each element twice. Once for the first println, and again for the second. In contrast
val doubled = List(1,2,3,4,5,6,7,8,9,10).toStream.map(_*2)
println(doubled.mkString(" "))
println(doubled.mkString(" "))
will only double the elements once.
A view is like a recipe to create a collection. When you ask for elements of a view it carries out the recipe each time.
A stream is like a guy with a bunch of dry-erase cards. The guy knows how to compute subsequent elements of the collection. You can ask him for the next element of the collection and gives you a card with the element written on it and a string tied from the card to his finger (to help him remember). Also, before he gives you a card he unties the first string from his finger and ties it to the new card.
If you hold onto the first card (i.e. keep a reference to the head of the stream) you might eventually run out of cards (i.e. memory) when you ask for the next element, but if you don't need to go back to the first elements you can cut the string and hand the unneeded cards back to the guy and he can re-use them (they're dry-erase afterall). This is how a stream can represent an infinite sequence without running out of memory.

Geoff's answer covers almost everything, but I want to add that a Stream is a List-like sequence, while every kind of collections (maps, sets, indexed seqs) have views.

Another way to explain this if you know apache spark would be that using stream is like caching the spark dataset, whereas using view is like using an uncached dataset, meaning that every time you call some action on it, it will reevaluate everything in the DAG.

Related

swift array 'contains' function in place optimization

I am wondering if the built-in array structure for the contains function has any optimizations. If it does a linear search of contains every time I run it, it's at best O(n), which turns into O(n^2) because I'll be looping through another set of points to check against, however if it somehow behind the scenes sorts the array the first time 'contains' is run, then every subsequent 'contains' would be O(log(n)).
I have an array that gradually gets larger the more in depth the user gets into the application, and I am using the 'contains' a lot, so I am expecting it to slow down the application the longer the user is using the application.
If the array doesn't have any behind the scenes optimizations, then ought I build my own? (e.g. quicksort, and do an insert(newElement:, at:) every time I add to the array?)
Specifically, I'm using
[CGPoint],
CGPointArrayVariable.contains(newCGPoint) // run 100s - 10000s of times ideally every frame, (but realistically probably every second)
and when I add new CGPoints I'm using
CGPointArrayVariable += newCGPointSet.
So, the question: Am I ok continuing to use the built in .contains function (will it be fast enough?) or should I build my own structure optimizing for the contains, and keeping the array sorted? (maybe an insertion sort would be better to use opposed to a quicksort, if that's the direction recommended)
Doing something like that every frame will be VERY inefficient.
I would suggest re-thinking your design to avoid tracking that amount of information for every frame.
If it's absolutely necessary, building your own Type that uses a Dictionary rather than an Array should be more efficient.
Also, if it works with your use case using a Setmight be the best option.

Features of GStreamer

Does GStreamer have the following functionalities/features, or is it possible to implement them on top of GStreamer:
Time windows: set up the graph such that a sink pad of one element does not just receive the current frame, but also n previous frames and m future frames. Including when seeking to a new position.
No data copies when passing data between elements, instead reusing the same buffer.
Having shared data between mutiple elements on different branches, that changes with time, but is buffered in such a way that all elements get the same value for it for the same frame index.
Q1) Time windows
You need to write your plugin using GstAdapter.
Q2) No data copies when passing data between elements
It's done by default. No data is copied from element to element unless required. It just passes a pointer to an instance of GstBuffer. If an element is like encoder or filter, which needs to work on a buffer to produce new data, a new GstBuffer instance is created with newly generated data in GstMemory, obviously.
Q3) Having shared data between mutiple elements
Not sure exactly what you mean. Is is possible to achieve what you want by using GstMemory share? Take a look at gst_memory_share(), gst_buffer_copy_region(), or gst_adapter_get_buffer().

Database and item orders (general)

I'm right now experimenting with a nodejs based experimental app, where I will be putting in a list of books and it will be posted on a forum automatically every x minutes.
Now my question is about order of these things posted.
I use mongodb (not sure if this changes the question or not) and I just add a new entry for every item to be posted. Normally, things are posted in the exact order I add them.
However, for the web interface of this experimental thing, I made a re-ordering interaction where I can simply drag and drop elements to reorder them.
My question is: how can I reflect this change to the database?
Or more in general terms, how can I order stuff in general, in databases?
For instance if I drag the 1000th item to 1st order, everything below needs to be edited (in db) between 1 and 1000 the entries. This does not seem like a valid and proper solution to me.
Any enlightenment is appreciated.
An elegant way might be lexicographic sorting. Introduce a String attribute for each item. Make the initial length of the values large enough to accomodate the estimated number of items. E.g., if you expect 1000 items, let the keys be baa, bab, bac, ... bba, bbb, bbc, ...
Then, when an item is moved from where it is to another place between two items, assign a value to the sorting attribute of the moved item that is somewhere equidistant (lexicographically) to those items. So to move an item between dei and dej, give it the value deim. To move an item between fadd and fado, give it the value fadi.
Keys starting with a were initially not used to leave space for elements that get dragged before the first one. Never use the key a, as it will be impossible to move an element before this one.
Of course, the characters used may vary according to the sort order provided by the database.
This solution should work fine as long as elements are not reordered extremely frequently. In a worst case scenario, this may lead to longer and longer attribute values. But if the movements are somewhat equally distributed, the length of values should stay reasonable.

Scala which data structure is most efficient for my intended operations?

I am running a dynamic programming function where I carry a list of strings throughout the process.
Over time, I append new strings to the end of this list, and occasionally I may remove the last element. Right now I am currently using a mutable ListBuffer, doing += for the appends and .trimEnd(1) for the removals.
Once my dynamic programming procedure is done, I need to efficiently be able to access each element of that list/sequence/etc, and in order (the first item I inserted will be first accessed, whereas last item I inserted will be the last accessed).
I've also tried ArrayBuffers but they both seem too slow. I am trying to speed this process up and I am wondering if I am using a data structure that has O(n) operations when there may be something that has O(1) time operations for what I need.
A simple singly linked list will provide O(1) for the append/discard portions of what you describe. Worst case, if you need to reverse the list at the end before processing, that would be an O(n) operation paid once.
Note that if you go down this road, during the accumulation phase "first" and "last" will be reversed (you will prepend items and drop the first item by getting the tail of the list).

Requesting a clear, picturesque explanation of Reactive Extensions (RX)?

For a long time now I am trying to wrap my head around RX. And, to be true, I am never sure if I got it - or not.
Today, I found an explanation on http://reactive-extensions.github.com/RxJS/ which - in my opinion - is horrible. It says:
RxJS is to events as promises are to async.
Great. This is a sentence so full of complexity that if you do not have the slightest idea of what RX is about, after that sentence you are quite as dumb as before.
And this is basically my problem: All the explanations in the usual places you find about RX make (at least me) feel dumb. They explain RX as a highly sophisticated concept with lots of highly complicated words and terms and whatsoever, and I am never quite sure what it is about.
So my question is: How would you explain RX to someone who is five years old? I'd like a clear, picturesque explanation of what it is, what it is good for, and what its main concepts are?
So, LINQ (in JavaScript, these are high-level array methods like map, filter, reduce, etc - if you're not a C# dev, just replace that whenever I mention 'LINQ') gives you a bunch of tools that you can apply to Sequences ("Lists" in a crude sense), in order to filter and transform an input into an output (aka "A list that's actually interesting to me"). But what is a list?
What is a List?
A List, is some elements, in a particular order. I can take any list and transform it into a better list with LINQ.
(Not necessarily sorted order, but an order).
An Event is a List
But what about an Event? Let's subscribe to an event:
OnKeyUp += (o,e) => Console.WriteLine(e.Key)
>>> 'H'
>>> 'e'
>>> 'l'
>>> 'l'
>>> 'o'
Hm. That looks like some things, in a particular order. It now suddenly dawns upon you, a list and an event are the same thing!
If Lists and Events are the Same....
...then why can't I transform and filter input events into more interesting events. That's what Rx is. It's taking everything you know about dealing with sequences, including all of the LINQ operators like Select and Where and Aggregate, and applies them to events.
Easy peasy.
A Callback is a Sequence Too
Isn't a Callback just basically an Event that only happens once? Isn't it basically just like a List with one item? Turns out it is, and one of the interesting things about Rx is that it lets us treat Events and Callbacks (and things like Geolocation requests) with the same language (i.e. we can combine the two, or wait for ether one or the other, etc etc).
Along with Paul's excellent answer I'd like to add the concept of pulling vs pushing data.
Pipeline
Lets take the example of some code that generates a series of numbers, and outputs the result. If you think of this as a stream on one end you have a producer that is creating new numbers for you, and on the other end you have a consumer that is doing something with those numbers.
Pull - Primes List
Lets say the producer is generating a list of prime numbers. Normally you would have some function that yields a list of numbers, and every time it returned it would push the next value it has calculated through the pipe to the consumer, which would output that number to the screen.
Prime Generator ---> Console.WriteLine
In this scenario it is easy to see that the producer is doing most of the work, and the consumer would be sitting around waiting for the producer to send the next value. The consumer is pulling on the pipeline, waiting for the producer to return the next value.
Push - Progress percent events from a fast process (Reactive)
Ok, let's say you have a function that is processing 1,000,000 items. Each item takes milliseconds to process, and then the function yields out a percentage value of how far it has gotten. So lots of progress values, very fast.
At the other end of the pipeline you have a progress bar. Now if the progress bar was to handle every update the UI would block trying to keep up with the stream of values.
1-Million-Items-Processor ---> Progress Bar
In this scenario the data is being pushed through the pipeline by the producer and then the consumer is blocking because too much data is being pushed for it to handle.
Reactive allows you to put in delays, windows, or to sample the pipeline depending on how you wish to consume the data. In this case I would sample the data every second before updating the progress bar.
Lists vs Events
So lists and events are kinda the same. The difference is whether the data is pulled or pushed through the system. With lists the data is pulled. With events the data is pushed.