Reduce array efficiently in CoffeeScript

I have an array of objects in a variable, and I want to reduce it so that the objects are grouped by a particular property. This is my code:
array = tracks.reduce (x, y, i) ->
  x[y.album] = []
  x
, {}

albums = tracks.reduce (x, y, i) ->
  array[y.album].push {'name': y.name, 'mp3': y.mp3}
  array
, {}
console.log(albums)
It outputs what I want, but I'd like to know if there is a better way to write this, without having to do the first loop just to create the empty arrays for the groups.
Thanks.

Yes, you can use the or= or ?= operator to assign array[y.album] only if it hasn't been initialized, so you only need one loop. BTW, I think it's a bit confusing that the array variable is an object. Another way of coding this, using a CoffeeScript loop instead of reduce, is:
albums = {}
for {album, name, mp3} in tracks
  (albums[album] or= []).push {name, mp3}
Notice that I'm using destructuring to get the track properties all at once.
Or, if you want to use a reduce:
albums = tracks.reduce (albums, {album, name, mp3}) ->
  (albums[album] or= []).push {name, mp3}
  albums
, {}
But I think the CS-loop version reads a bit better :)
Bonus track (pun intended): if you happen to have Underscore.js, I'd strongly recommend using groupBy, which does exactly this kind of grouping job:
albums = _.groupBy tracks, (track) -> track.album
Notice that the tracks for each album name in albums will be "complete" tracks (not just the name and mp3 properties).
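If you do want just the name and mp3 properties in the groups, you could post-process groupBy's result. A minimal sketch, assuming your Underscore version has _.mapObject (it was only added in 1.8):

# Group by album, then strip each track down to {name, mp3}
albums = _.mapObject _.groupBy(tracks, 'album'), (group) ->
  {name, mp3} for {name, mp3} in group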
Update: a comment about performance. When asked to do something "efficiently", I interpret it as doing it in the most direct and clean way possible (I'm thinking of the programmer's efficiency when reading the code); but many will unequivocally equate efficiency with performance.
About performance: all three solutions are O(n) in complexity, n being the number of tracks, so none of them is dramatically worse than the others.
It seems raw for loops run faster on modern JS engines than their higher-order counterparts, forEach, reduce, etc. (which is quite saddening, IMO). So the first version should run faster than the second one.
In the case of the Underscore version, I won't make any prediction: Underscore is known for using higher-order functions heavily instead of raw for loops, but at the same time that version does not create a new object for each track.
In any case, you should always profile your code before changing your solution to one that might be more performant but is less readable. If you notice that that particular loop is a bottleneck, and you have a good set of data to benchmark it, jsPerf can be really useful :)

Related

Get a value from RichPipe

I have a RichPipe with 3 fields: name: String, time: Long and value: Int. I need to get the value for a specific name, time pair. How can I do it? I can't figure it out from the Scalding documentation, which is very cryptic, and I can't find any examples that do this.
Well, a RichPipe is not a key-value store; that's why there is no documentation on using it as one :) A RichPipe should be thought of as a pipe - you can't get at data in the middle without going in at one end and traversing the pipe until you find the element you're looking for. Furthermore, this is a little painful in Scalding because you have to write your results to disk (it's built on top of Hadoop) and then read them back from disk in order to use them in your application. So the code will be something like:
myPipe.filter[(String, Long)](('name, 'time)) { _ == (specificName, specificTime) }
  .write(Tsv("tmp/location"))
Then you'll need some higher level code to run the job and read the data back into memory to get at the result. Rather than write out all the code to do this (it's pretty straightforward), why don't you give some more context about what your use case is and what you are trying to do - maybe you can solve your problem under the Map-Reduce programming model.
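As a rough sketch of that "read it back" step, assuming the job ran in local mode and wrote a single part file (the part-file name and the 3-column Tsv layout are assumptions here):

import scala.io.Source

// Read the job's Tsv output back and pull out the Int value
// from the first (and only) matching row.
val value: Int = Source.fromFile("tmp/location/part-00000")
  .getLines()
  .map(_.split("\t"))
  .collectFirst { case Array(_, _, v) => v.toInt }
  .getOrElse(sys.error("no row matched the (name, time) pair"))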
Alternatively, use Spark: you'll have the same problem of having to traverse a distributed dataset, but you don't have the faff of writing to disk and reading back again. Furthermore, you can use a custom partitioner in Spark, which could give near key-value-store-like behaviour. But anyway, naively, the code would be:
val theValueYouWant =
  myRDD.filter {
    case (`specificName`, `specificTime`, _) => true
    case _ => false
  }
  .toArray.head._3

List of Tuple<int,string> in PowerShell

I want to create a List<Tuple<int,string>> in PowerShell, but
New-Object System.Collections.Generic.List[System.Collections.Generic.Tuple[int,string]]
does not work. What am I missing?
Lee's answer is the correct way to create a List of Tuples (although you can make the statement much shorter by omitting the System namespace). However, the better questions to ask while programming in PowerShell are:
Why should I return a strongly typed object?
Do I really want to output a list?
The first one has its pros and cons. Strongly typed objects are useful to return if they have methods or events that will be useful for the next step in the pipeline. If, on the other hand, you just want to return a bunch of items with a name and an int, I'd use something like:
[PSCustomObject]@{string="string";int=1}
This will create a property bag (what most devs know as a Tuple) containing the data you need, with more descriptive names than a .NET tuple will give you. It's also pretty fast. If, on the other hand, the data is meant for an API, then by all means create the strongly typed object it expects.
The second question is a little harder to understand, but has a much clearer answer. In many cases, you'll want to feed the output of one function into another function as input, and for several reasons a strongly typed list is not your best friend there.

Strongly typed lists do not always cleanly convert into arrays (this is especially true for generics), and, as arguments to a function, they severely limit the kinds of data you can pass in. They also produce somewhat misleading and harder-to-use output (especially when piping in objects and producing multiple results), since the whole list is displayed as a single output item instead of each item on its own.

Most annoyingly, strongly typed lists behave differently from arrays in PowerShell when you "over-index" (i.e. ask for item 10000 in a list of 5 items): arrays will quietly return null, while lists will barf loudly (see the quick demo below).

More practically, accumulating items into a list and then outputting the list will "hold" the pipeline until all items are in. This may be what you want, but in most cases it's nice to see output coming out of a function as it runs. Finally, lists add to the memory overhead of the function, since you have to accumulate the whole set of objects before anything is returned.
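A quick demo of that over-indexing difference (just a sketch; the variable names are illustrative):

# Arrays quietly return $null when over-indexed
$array = 1, 2, 3
$null -eq $array[10]    # True - no error is raised

# Generic lists throw instead
$list = [System.Collections.Generic.List[int]](1, 2, 3)
$list[10]               # Throws ArgumentOutOfRangeException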
What I generally do is simply emit multiple objects. That is, I avoid using the return keyword and I take advantage of PowerShell's ability to return objects that are not captured into a variable. If I assign the result into a variable, the items will be accumulated within an arraylist and returned to you as an array. This quick little demonstration function shows you how.
function Get-RandomData {
    param($count = 10)
    foreach ($n in 1..$count) {
        [PSCustomObject]@{Name="Number$n";Number=Get-Random}
    }
}
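Calling it shows the streaming behaviour: each object hits the pipeline as soon as it is created, so downstream cmdlets start working immediately:

Get-RandomData -count 3 | ForEach-Object { $_.Name }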
It's worth noting that specialized collections are still quite useful. I very often use Queues and Stacks when the need arises. However, I very rarely find myself using generics or lists unless I am working with a part of .NET that specifically requires generics or lists. This is pretty personally ironic, since I was the person who tested support for generics in PowerShell V2. It's absolutely required when you want to work with a piece of .NET that can only take a list of tuples. It's slightly to severely counterproductive in all other cases.
You can create it with:
New-Object 'Collections.Generic.List[Tuple[int,string]]'
The problem is that the Tuple types are in the System namespace, not System.Collections.Generic.
This is what worked for me. I am also providing sample code to insert into the list:
$myList = New-Object System.Collections.ArrayList

# add range
$myList.AddRange((
    [Tuple]::Create(1,"string 1"),
    [Tuple]::Create(2,"string 2"),
    [Tuple]::Create(3,"string 3")
));

# add single item
$myList.Add([Tuple]::Create(4,"string 4"))

# create variable and add to list
$myTuple = [Tuple]::Create(5,"string 5")
$myList.Add($myTuple)

Write-Host $myList
Reference:
Using and Understanding Tuples in PowerShell
You can create a tuple with up to 7 elements like this:
$tuple = [tuple]::Create(1,2,3,4,5,6,7)
and you can get the value of an element by naming its item (starting with item1):
$tuple.item1
1
If you have 8 or more elements then use another tuple for the 8th element:
$tuple = [tuple]::Create(1,2,3,4,5,6,7,[tuple]::create(8,9))
Internally, the 8th element is called "rest". You can get the values like this:
$tuple.rest.item1.item1
8
If you need to specify a type for each element then do it in front of each value:
$tuple = [tuple]::create([string]"a", [int]1, [byte]255)
Finally, adding a tuple to a list works like this:
$list = New-Object 'Collections.ArrayList'
$tuple = [tuple]::create([string]"a", [int]1, [byte]255)
$list.add($tuple)
There is no need to specify the tuple-details for creating the list.
Keep in mind that you cannot change a value of a tuple later, and the default sort method sorts ascending by item1, then item2, etc. (or you need a custom IComparer), but using tuples is super fast (way faster than working with large lists of PSObject or PSCustomObject, and also faster than Import-Csv)!
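As a small sketch of that default ordering (tuples compare by item1 first, then item2, and so on):

$tuples = [Tuple]::Create(2,"b"), [Tuple]::Create(1,"z"), [Tuple]::Create(2,"a")
$tuples | Sort-Object    # Orders as (1,"z"), (2,"a"), (2,"b")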

Complexity of List.reverse?

In Scala, there is a reverse method for lists. What is the complexity of this method? Is it better to simply use the original list and always remember that the list is the reverse of what we expect, or to explicitly use reverse before operating on it?
EDIT: What I am really interested in is to get the last two elements of the original list (or the first two of the reversed list).
So I would do something like:
val myList = origList.reverse
val a = myList(0)
val b = myList(1)
This is not in a loop, just a one-time thing in my library... but if someone else uses the library and puts it in a loop, it is not under my control.
Looking at the source, it's O(n) as you might reasonably expect:
override def reverse: List[A] = {
  var result: List[A] = Nil
  var these = this
  while (!these.isEmpty) {
    result = these.head :: result
    these = these.tail
  }
  result
}
If in your code you're able to iterate through the list in reverse order at the same cost of iterating in forward order, then it would be more efficient to do this rather than reversing the List.
In fact, if your alternative operation which involves using the original list works in less than O(n) time, then there's a real argument for going with that. Making an algorithm asymptotically faster will make a huge difference if you ever rely on it more (especially if used inside other loops, as oxbow_lakes points out below).
On the whole though I'd expect that anything where you're reversing a list means that you care about the relative ordering of a non-trivial number of elements, and so whatever you're doing is inherently O(n) anyway. (This might not be true for other data structures such as a binary tree; but lists are linear, and in the extreme case even reverse . head can't be done in O(1) time with a singly-linked list.)
So if you're choosing between two O(n) options - for the vast majority of applications, shaving a few nanoseconds off the iteration time isn't going to really gain you anything. Hence it would be "best" to make your code as readable as possible - which means calling reverse and then iterating, if that's closest to your intention.
(And if your app is too slow, and profiling shows that this list manipulation is a hotspot, then you can think about how to make it more efficient. Which by that point may well involve a different option to both of your current candidates, given the extra context you'll have at that point.)
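For the specific case in the edit, a single O(n) pass can grab the last two elements without allocating the reversed list. A minimal sketch (lastTwo is a hypothetical helper here, not a library method):

// Returns (second-to-last, last); throws if the list has fewer than two elements.
@annotation.tailrec
def lastTwo[A](xs: List[A]): (A, A) = xs match {
  case a :: b :: Nil => (a, b)
  case _ :: rest     => lastTwo(rest)
  case _             => throw new NoSuchElementException("need at least two elements")
}

// lastTwo(origList) == (origList.reverse(1), origList.reverse(0))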

In javascript, what are the trade-offs for defining a function inline versus passing it as a reference?

So, let's say I have a large set of elements to which I want to attach event listeners. E.g. a table where I want each row to turn red when clicked.
So my question is which of these is the fastest, and which uses the least memory. I understand that it's (usually) a tradeoff, so I would like to know my best options for each.
Using the table example, let's say there's a list of all the row elements, "rowList":
Option 1:
for (var r in rowList) {
  rowList[r].onclick = function(){ this.style.backgroundColor = "red"; };
}
My gut feeling is that this is the fastest, since there is one less pointer call, but the most memory intensive, since each row will have its own copy of the function, which might get serious if the onclick function is large.
Option 2:
function turnRed() {
  this.style.backgroundColor = "red";
}
for (var r in rowList) {
  rowList[r].onclick = turnRed;
}
I'm guessing this is going to be only a teensy bit slower than the one above (oh no, one more pointer dereference!) but a lot less memory intensive, since the browser only needs to keep track of one copy of the function.
Option 3:
var turnRed = function() {
  this.style.backgroundColor = "red";
};
for (var r in rowList) {
  rowList[r].onclick = turnRed;
}
I assume this is the same as option 2, but I just wanted to throw it out there. For those wondering what the difference between this and option 2 is: JavaScript differences defining a function
Bonus Section: jQuery
Same question with:
$('tr').click(function(){this.style.backgroundColor = "red"});
Versus:
function turnRed(){this.style.backgroundColor = "red"};
$('tr').click(turnRed);
And:
var turnRed = function(){this.style.backgroundColor = "red"};
$('tr').click(turnRed);
Here's your answer:
http://jsperf.com/function-assignment
Option 2 is way faster and uses less memory. The reason is that Option 1 creates a new function object for every iteration of the loop.
In terms of memory usage, your Option 1 is creating a distinct function closure for each row in your array. This approach will therefore use more memory than Option 2 and Option 3, which only create a single function and then pass around a reference to it.
For this same reason I would also expect Option 1 to be the slowest of the three. Of course, the difference in terms of real-world performance and memory usage will probably be quite small, but if you want the most efficient one then pick either Option 2 or Option 3 (they are both pretty much the same, the only real difference between the two is the scope at which turnRed is visible).
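You can see the "one function object per row" point directly by comparing function identities; a quick sketch (not benchmark code):

// Option 1 style: one new function object per iteration
var handlers = [];
for (var i = 0; i < 3; i++) {
  handlers.push(function(){ this.style.backgroundColor = "red"; });
}
console.log(handlers[0] === handlers[1]); // false - distinct objects

// Option 2/3 style: every row would share the single turnRed reference,
// so only one function object exists no matter how many rows there are.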
As for jQuery, all three options will have the same memory usage and performance characteristics. In every case you are creating and passing a single function reference to jQuery, whether you define it inline or not.
And one important note that is not brought up in your question is that using lots of inline functions can quickly turn your code into an unreadable mess and make it more difficult to maintain. It's not a big deal here since you only have a single line of code in your function, but as a general rule if your function contains more than 2-3 lines it is a good idea to avoid defining it inline. Instead define it as in Option 2 or Option 3 and then pass around a reference to it.

Scala: read and save all elements of an Iterable

I have an Iterable[T] that is really a stream of unknown length, and want to read it all and save it into something that is still an instance of Iterable. I really do have to read it and save it; I can't do it in a lazy way. The original Iterable can have a few thousand elements, at least. What's the most efficient/best/canonical way? Should I use an ArrayBuffer, a List, a Vector?
Suppose xs is my Iterable. I can think of doing these possibilities:
xs.toArray.toIterable // Ugh?
xs.toList // Fast?
xs.copyToBuffer(anArrayBuffer)
Vector(xs: _*) // There's no toVector, sadly. Is this construct as efficient?
EDIT: I see from the questions that I should be more specific. Here's a strawman example:
def f(xs: Iterable[SomeType]) { // xs might be a stream, though I can't be sure
  val allOfXS = <xs all read in at once>
  g(allOfXS)
  h(allOfXS) // Both g() and h() take an Iterable[SomeType]
}
This is easy. A few thousand elements is nothing, so it hardly matters unless it's a really tight loop. So the flippant answer is: use whatever you feel is most elegant.
But, okay, let's suppose that this is actually in some tight loop, and you can predict or have benchmarked your code enough to know that this is performance-limiting.
Your best performance for an immutable solution will likely be a Vector, used like so:
Vector() ++ xs
In my hands, this can copy a 10k iterable about 4k-5k times per second. List is about half the speed.
If you're willing to try a mutable solution under the hood, xs.toArray.toIterable usually takes the cake with about 10k copies per second. ArrayBuffer is about the same speed as List.
If you actually know the size of the target (i.e. size is O(1) or you know it from somewhere else), you can shave off another 20-30% of the execution speed by allocating just the right size and writing a while loop.
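A sketch of that pre-sized while-loop approach; copyIterable is a hypothetical helper, and it assumes xs.size is cheap (calling size on a Stream forces it, but the memoized elements are then reused by the iterator):

import scala.reflect.ClassTag

def copyIterable[T: ClassTag](xs: Iterable[T]): Array[T] = {
  val arr = new Array[T](xs.size) // single allocation of exactly the right size
  val it  = xs.iterator
  var i   = 0
  while (it.hasNext) {
    arr(i) = it.next()
    i += 1
  }
  arr
}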
If it's actually primitives, you can gain a factor of 10 by writing your own specialized Iterable-like-thing that acts on arrays and converts to regular collections via the underlying array.
Bottom line: for a great blend of power, speed, and flexibility, use Vector() ++ xs in most situations. xs.toIndexedSeq defaults to the same thing, with the benefit that if it's already a Vector it will take no time at all (and chains nicely without using parens), and the drawback that you are relying upon a convention, not a specification of behavior (and it takes 1-3 more characters to type).
How about Stream.force?
Forces evaluation of the whole stream and returns it.
This is hard. An Iterable's methods are defined in terms of its iterator, but that gets overridden by subtraits. For instance, IndexedSeq methods are usually defined in terms of apply.
There is the question of why you want to copy the Iterable, but I suppose you might be guarding against the possibility of it being mutable. If you do not want to copy it, then you need to rephrase your question.
If you are going to copy it, and you want to be sure all elements are copied in a strict manner, you could use .toList. That will not copy a List, but a List does not need to be copied. For anything else, it will produce a new copy.
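A quick way to see that last point in the REPL:

val l = List(1, 2, 3)
l.toList eq l                  // true: toList on a List returns it as-is, no copy

Stream.from(1).take(3).toList  // List(1, 2, 3): forces the stream and copies into a new List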