Bit fields in Scala - scala

How to simulate bit fields in Scala? The bit fields are used to access some bits of one type (like this in C link). I know it's possible to write with bit operators, but I think there a better way if not consider the performance.
Thanks for every hint that might give.

If you just want single bits, then collection.BitSet will work for you.
If you want a proper bit field class, then you're out of luck for two reasons. First, because Scala doesn't have one. Second, because even if it did, the space savings would probably not be very impressive since the overhead of the enclosing object would probably be large compared to your bits.
There are a couple of ways out of this with some work: a custom-defined class that wraps an integer and lets you operate on parts of it as bit fields; when you go to store the integer, though, you just have it saved as a primtive int. Or you could create an array of bit field structs (of arbitrary length) that are implemented as an array of integers. But there's nothing like that built in; you'll have to roll your own.

Sadly not... The shift operators and bitwise boolean operators are pretty much all you've got.

There's also this repo,
word-aligned compressed variant of the Java bitset class. It uses a 64-bit run-length encoding (RLE) compression scheme.
http://code.google.com/p/javaewah/

Related

What's the point of using "map()" for two elements in perl?

I've seen code where there are just two rather static elements to be mapped such as time intervals with start and end dates, yet map() is being used rather than explicit code for mapping, e.g.
{ map { ... } qw(start end) } # vs.
{ start => ..., end => ... }
Which way is preferrable, and why?
The map form may be less concise but looks more functional (as in functional programming), so I guess that's why it may be preferred over explicit code and is perhaps more DRY.
However, it looks less legible to me because there is more logic going on behind, and mapping should also be less efficient because it invokes calls a and consists of more atomic operations.
EDIT
There is a conflicting goal in programming: KISS (keep it { pick 2 from: small, simple, stupid }). Using map slightly complicates code.
Assuming you're not just setting both items to the same constant or something similarly trivial, I would expect the map version to be more concise.
IMO, the main point in favor of the map version is that you know the same process will be used to produce both values. Not only for the sake of DRY, but also because it eliminates any concern that one might have a subtle change which the other doesn't.
As for the performance concern... If your use case is sufficiently performance-sensitive for any potential difference to matter, then you shouldn't be using Perl in the first place. Switching to well-written C (not C#, not C++, not Objective C - just plain C) will have a far greater performance impact than micro-optimizing whether you assign two values individually vs. using a loop to set them. But the odds of your use case being that sensitive are approximately zero anyhow.
There is a principle of coding known as DRY. Don't Repeat Yourself.
It asserts that:
Every piece of knowledge must have a single, unambiguous, authoritative representation within a system.
And that can be interpreted as condensing duplicate typing with (things like) map/for.
I use idioms like the one you've quoted when I'm trying to expand some text - for example:
my #defs = map { "DEF:$_=$source_file:$_:MAX" } qw ( read write );
This generates me some DEF lines for rrdtool.
I'm doing it this way, because for some cases, I've got considerably longer lists of 'things I want to define' and want to be consistent. (Sometimes I have say, 10 similar lines that differ only by a single word).
But also because:
my #defs = ( "DEF:read=$source_file:read:MAX",
"DEF:write=$source_file:write:MAX" );
There's not much in it for two elements, and I'd suggest it's as much a matter of style as anything. However, if you've got more than that, it quickly becomes very beneficial because you can change the single line - say you've got a different file location? Want to swap MAX for AVERAGE?
It's also quite shockingly easy to go 'punctuation blind' when looking at a long sequence of similar statements, where someone's typo-ed and added a , where it should be . or similar.
And ... you probably don't lose a great deal in terms of readability. But will acknowledge that's something of a style point, because whilst map is pretty amazing, it can make for some rather hard to read code if you're not careful.
Also to specifically address:
mapping should also be less efficient because it invokes calls a and consists of more atomic operations.
A wise man once said:
premature optimization is the root of all evil
Don't think about the efficiency of a statement - look at the legibility/readability. Compilers are pretty clever. Most "obvious" optimisations, they already make for you. Processors are also pretty fast. Your limiting factor in most code isn't the amount of CPU cycles you need, it's IO throughput and memory footprint. So don't worry about it - write clear code.
And if there's a performance critical demand on your code, you should be using a code profiler to look at where you gain the most efficiency for your effort at refactoring. You may end up with less clear code in doing so (sometimes) but that's a more clear tradeoff.

What is a multi-pass Sequence in Swift?

I am looking at this language from the Swift GeneratorType documentation and I'm having a hard time understanding it:
Any code that uses multiple generators (or for...in loops) over a single sequence should have static knowledge that the specific sequence is multi-pass, either because its concrete type is known or because it is constrained to CollectionType. Also, the generators must be obtained by distinct calls to the sequence's generate() method, rather than by copying.
What does it mean for a sequence to be "multi-pass"? This language seems quite important, but I can't find a good explanation for it. I understand, for example, the concept of a "multi-pass compiler", but I'm unsure if the concepts are similar or related...
Also, I have searched SO for other posts that answer this question. I have found this one, which makes the following statement in the C++ context:
The difference between algorithms that copy their iterators and those that do not is that the former are termed "multipass" algorithms, and require their iterator type to satisfy ForwardIterator, while the latter are single-pass and only require InputIterator.
But the meaning of that isn't entirely clear to me either, and I'm not sure if the concept is the same in Swift.
Any insight from those wiser than me would be much appreciated.
A "multi-pass" sequence is one that can be iterated over multiple times via a for...in loop or by using any number of generators (constructed via generate())
The text explains you would know a sequence is multi-pass because you
know its type (perhaps a class you designed) or
know it conforms to CollectionType. (for example, sets and arrays)

Should I prefer hashes or hashrefs in Perl?

I'm still learning perl.
For me it feels more "natural" to references to hashes rather than directly access them, because it is easier to pass references to a sub, (one variable can be passed instead of a list). Generally I prefer this approach to that where one directly accesses %hashes".
The question is, where (in what situations) is better to use plain %hashes, so
$hash{key} = $value;
instead of
$href->{key} = $value
Is here any speed, or any other "thing" what prefers to use %hashes and not $hashrefs? Or it is matter only of pure personal taste and TIMTOWTDI? Some examples, when is better to use %hash?
I think this kind of question is very legitimate: Programming languages such as Perl or C++ have come a long way and accumulated a lot of historical baggage, but people typically learn them from ahistorical, synchronous exposés. Hence they keep wondering why TIMTOWDI and WTF all these choices and what is better and what should be preferred?
So, before version 5, Perl didn't have references. It only had value types. References are an add-on to Perl 4, enabling lots more stuff to be written. Value types had to be retained, of course, to keep backward compatibility; and also, for simplicity's sake, because frequently you don't need the indirection that references are.
To answer your question:
Don't waste time thinking about the speed of Perl hash lists. They're fast. They're memory access. Accessing a database or the filesystem or the net, that is where your program will typically spend time.
In theory, a dereference operation should take a tiny bit of time, so tiny it shouldn't matter.
If you're curious, then benchmark. Don't draw too many conclusions from differences you might see. Things could look different on another release.
So there is no speed reason to favour references over value types or vice versa.
Is there any other reason? I'd say it's a question of style and taste. Personally, I prefer the syntax without -> accessors.
If you can use a plain hashes, to describe your data, you use a plain hash. However, when your data structure gets a bit more complex, you will need to use references.
Imagine a program where I'm storing information about inventory items, and how many I have in stock. A simple hash works quite well:
$item{XP232} = 324;
$item{BV348} = 145;
$item{ZZ310} = 485;
If all you're doing is creating quick programs that can read a file and store simple information for a report, there's no need to use references at all.
However, when things get more complex, you need references. For example, my program isn't just tracking my stock, I'm tracking all aspects of my inventory. Inventory items also have names, the company that creates them, etc. In this case, I'll want to have my hashes not pointing to a single data point (the number of items I have in stock), but a reference to a hash:
$item{XP232}->{DESCRIPTION} = "Blue Widget";
$item{XP232}->{IN_STOCK} = 324;
$item{XP232}->{MANUFACTURER} = "The Great American Widget Company";
$item{BV348}->{DESCRIPTION} = "A Small Purple Whatzit";
$item{BV348}->{IN_STOCK} = 145;
$item{BV348}->{MANUFACTURER} = "Acme Whatzit Company";
You can do all sorts of wacky things to do something like this (like have separate hashes for each field or put all fields in a single value separated by colons), but it's simply easier to use references to store these more complex structures.
For me the main reason to use $hashrefs to %hashes is the ability to give them meaningful names (a related idea would be name the references to an anonymous hash) which can help you separate data structures from program logic and make things easier to read and maintain.
If you end up with multiple levels of references (refs to refs?!) you start to loose this clean and readable advantage, though. As well, for short programs or modules, or at earlier stages of development where you are retesting things as you go, directly accessing the %hash can make things easier for simple debugging (print statements and the like) and avoiding accidental "action at a distance" issues so you can focus on "iterating" through your design, using references where appropriate.
In general though I think this is a great question because TIMTOWDI and TIMTOCWDI where C = "correct". Thanks for asking it and thanks for the answers.

Does Scala offer functionality similar to Pretty Print in Python?

Does Scala offer functionality similar to Pretty Print pprint in Python?
No, it doesn't. Except for XML, that is -- there's a pretty printer for that, which generates interpreter-readable data.
In fact, it doesn't even have a way to print interpreter-readable data, mainly because of how strings are represented when converted to string. For instance, List("abc").toString is List(abc).
Add to that, there's no facility at all that will break them based on width, or ident nested collections.
That said, it is doable, within the same limits as pprint.

Scala: What is the right way to build HashMap variant without linked lists?

How mouch Scala standard library can be reused to create variant of HashMap that does not handle collisions at all?
In HashMap implementation in Scala I can see that traits HashEntry, DefaultEntry and LinkedEntry are related, but I'm not sure whether I have any control over them.
You could do this by extending HashMap (read the source code of HashMap to see what needs to be modified); basically you'd override put and += to not call findEntry, and you'd override addEntry (from HashTable) to simply compute the hash code and drop the entry in place. Then it wouldn't handle collsions at all.
But this isn't a wise thing to do since the HashEntry structure is specifically designed to handle collisions--the next pointer becomes entirely superfluous at that point. So if you are doing this for performance reasons, it's a bad choice; you have overhead because you wrap everything in an Entry. If you want no collision checking, you're better off just storing the (key,value) tuples in a flat array, or using separate key and value arrays.
Keep in mind that you will now suffer from collisions in hash value, not just in key. And, normally, HashMap starts small and then expands, so you will initially destructively collide things which would have survived had it not started small. You could override initialSize also if you knew how much you would add so that you'd never need to resize.
But, basically, if you want to write a special-purpose high-speed unsafe hash map, you're better off writing it from scratch or using some other library. If you modify the generic library version, you'll get all the unsafety without all of the speed. If it's worth fiddling with, it's worth redoing entirely. (For example, you should implement filters and such that map f: (Key,Value) => Boolean instead of mapping the (K,V) tuple--that way you don't have to wrap and unwrap tuples.)
I guess it depends what you mean by "does not handle collisions at all". Would a thin layer over a MultiMap be sufficient for your needs?