Related
I have a large scala code base. (https://opensource.ncsa.illinois.edu/confluence/display/DFDL/Daffodil%3A+Open+Source+DFDL)
It's like 70K lines of scala code. We are on scala 2.11.7
Development is getting difficult because compilation - the edit-compile-test-debug cycle is too long for small changes.
Incremental recompile times can be a minute or more, and that's without optimization turned on and with only a handful of edits to a few files. Sometimes a very small change triggers a huge recompilation.
So my question: What can I do by way of organizing the code, that will improve compilation time?
E.g., decomposing code into smaller files? Will this help?
E.g., more smaller libraries?
E.g., avoiding use of implicits? (we have very few)
E.g., avoiding use of traits? (we have tons)
E.g., avoiding lots of imports? (we have tons - package boundaries are pretty chaotic at this point)
Or is there really nothing much I can do about this?
I feel like this very long compilation is somehow due to some immense amount of recompiling due to dependencies, and I am thinking of how to reduce false dependencies....but that's just a theory
I'm hoping someone else can shed some light on something we might do which would improve compilation speed for incremental changes.
Here are the phases of the scala compiler, along with slightly edited
versions of their comments from the source code. Note that this
compiler is unusual in being heavily weighted towards type checking
and to transformations that are more like desugarings. Other compilers
include a lot of code for: optimization, register allocation, and
translation to IR.
Some top-level points:
There is a lot of tree rewriting. Each phase tends to read in a tree
from the previous phase and transform it to a new tree. Symbols, to
contrast, remain meaningful throughout the life of the compiler. So
trees hold pointers to symbols, and not vice versa. Instead of
rewriting symbols, new information gets attached to them as the phases
progress.
Here is the list of phases from Global:
analyzer.namerFactory: SubComponent,
analyzer.typerFactory: SubComponent,
superAccessors, // add super accessors
pickler, // serializes symbol tables
refchecks, // perform reference and override checking, translate nested objects
liftcode, // generate reified trees
uncurry, // uncurry, translate function values to anonymous classes
tailCalls, // replace tail calls by jumps
explicitOuter, // replace C.this by explicit outer pointers, eliminate pattern matching
erasure, // erase generic types to Java 1.4 types, add interfaces for traits
lambdaLift, // move nested functions to top level
constructors, // move field definitions into constructors
flatten, // get rid of inner classes
mixer, // do mixin composition
cleanup, // some platform-specific cleanups
genicode, // generate portable intermediate code
inliner, // optimization: do inlining
inlineExceptionHandlers, // optimization: inline exception handlers
closureElimination, // optimization: get rid of uncalled closures
deadCode, // optimization: get rid of dead code
if (forMSIL) genMSIL else genJVM, // generate .class files
Some background on the Scala compiler, and some workarounds:
The Scala compiler thus has to do a lot more work than the Java compiler. In particular, a few things make the Scala compiler drastically slower, including:
Implicit resolution. Implicit resolution (i.e. scalac trying to find an implicit value whenever one is required) bubbles up through every enclosing scope, and this search time can be massive (particularly if the same implicit search is triggered many times and the value is declared in some library all the way down your dependency chain). Compile time gets even worse when you take into account implicit trait resolution and type classes, which are used heavily by libraries such as scalaz and shapeless.
Heavy use of anonymous classes (i.e. lambdas, blocks, anonymous functions) also adds to compile time, as do macros, obviously.
A very nice writeup by Martin Odersky
Further, the Java and Scala compilers convert source code into JVM bytecode and do very little optimization. On most modern JVMs, once the program bytecode is run, it is converted into machine code for the architecture on which it is being run. This is called just-in-time compilation. The level of code optimization is, however, low with just-in-time compilation, since it has to be fast. To avoid recompiling, the so-called HotSpot compiler only optimizes the parts of the code which are executed frequently.
A program might have different performance each time it is run. Executing the same piece of code (e.g. a method) multiple times in the same JVM instance might give very different performance results depending on whether the particular code was optimized in between the runs. Additionally, measuring the execution time of some piece of code may include the time during which the JIT compiler itself was performing the optimization, thus giving inconsistent results.
One common cause of performance deterioration is also the boxing and unboxing that happens implicitly when passing a primitive type as an argument to a generic method, along with frequent GC. There are several approaches to avoiding these effects during measurement: run on the server version of the HotSpot JVM, which does more aggressive optimizations, and profile with VisualVM, a visual tool that integrates several command-line JDK tools with lightweight profiling capabilities. However, Scala's abstractions are very complex and unfortunately VisualVM does not yet understand them well. Watch out as well for predicate-heavy code - for example, a parsing mechanism that converts predicates to first-order logic and leans on exists and forall, Scala collection methods that take predicates and may traverse the entire sequence each time - since it can take a long time to process.
Making the modules cohesive and less interdependent is also a viable solution. Mind that intermediate code generation is sometimes machine dependent, and different architectures give varied results.
An alternative: Typesafe has released Zinc, which separates the fast incremental compiler from sbt and lets Maven and other build tools use it. Using Zinc with the scala-maven-plugin makes compiling a lot faster.
A simple problem: Given a list of integers, remove the greatest one. Ordering is not necessary.
Below is one version of the solution (an average one, I guess).
def removeMaxCool(xs: List[Int]) = {
  val maxIndex = xs.indexOf(xs.max)
  xs.take(maxIndex) ::: xs.drop(maxIndex + 1)
}
It's Scala idiomatic, concise, and uses a few nice list functions. It's also very inefficient. It traverses the list at least 3 or 4 times.
Now consider this Java-like solution. It's also what a reasonable Java developer (or Scala novice) would write.
import scala.collection.mutable.ArrayBuffer

def removeMaxFast(xs: List[Int]) = {
  var res = ArrayBuffer[Int]()
  var max = xs.head
  var first = true
  for (x <- xs) {
    if (first) {
      first = false
    } else {
      if (x > max) {
        res.append(max)
        max = x
      } else {
        res.append(x)
      }
    }
  }
  res.toList
}
Totally non-Scala idiomatic, non-functional, non-concise, but it's very efficient. It traverses the list only once!
So trade-offs need to be weighed too, and sometimes you may have to write code like a Java developer if nothing else works.
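For what it's worth, there is often a middle ground. Here is a sketch of my own (not from the original answer) that stays purely functional but makes only two passes - one fold plus one reverse:

def removeMaxFold(xs: List[Int]): List[Int] = xs match {
  case Nil => Nil
  case head :: tail =>
    // Accumulate the running max and everything else seen so far (reversed).
    val (_, rest) = tail.foldLeft((head, List.empty[Int])) {
      case ((max, acc), x) => if (x > max) (x, max :: acc) else (max, x :: acc)
    }
    rest.reverse
}

No mutable state, and the list is never traversed more than twice.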
Some ideas that might help - depends on your case and style of development:
Use incremental compilation ~compile in SBT or provided by your IDE.
Use sbt-revolver and maybe JRebel to reload your app faster. Better suited for web apps.
Use TDD - rather than running and debugging the whole app write tests and only run those.
Break your project down into libraries/JARs. Use them as dependencies via your build tool: SBT/Maven/etc. Or a variation of this next...
Break your project into subprojects (SBT). Compile separately what's needed, or the root project if you need everything. Incremental compilation is still available. (A build.sbt sketch follows this list.)
Break your project down to microservices.
Wait for Dotty to solve your problem to some degree.
If everything fails don't use advanced Scala features that make compilation slower: implicits, metaprogramming, etc.
Don't forget to check that you are allocating enough memory and CPU for your Scala compiler. I haven't tried it, but maybe you can use RAM disk instead of HDD for your sources and compile artifacts (easy on Linux).
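For the subprojects idea above, here is a minimal build.sbt sketch (the project names core and runtime are made up, not from the question); sbt recompiles only the subprojects whose sources or upstream dependencies changed:

lazy val core = (project in file("core"))
  .settings(scalaVersion := "2.11.7")

lazy val runtime = (project in file("runtime"))
  .dependsOn(core)                      // editing core recompiles core and runtime
  .settings(scalaVersion := "2.11.7")

lazy val root = (project in file("."))
  .aggregate(core, runtime)             // "compile" on root builds everything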
You are touching on one of the main problems of object-oriented design (over-engineering). In my opinion you have to flatten your class/object/trait hierarchy and reduce the dependencies between classes. Break packages into different jar files and use them as mini-libraries which are "frozen" while you concentrate on new code.
Check some videos also from Brian Will, who makes a case against OO over-engineering
i.e https://www.youtube.com/watch?v=IRTfhkiAqPw (you can take the good points)
I don't agree with him 100% but it makes a good case against over-engineering.
Hope that helps.
You can try to use the Fast Scala Compiler.
Aside from minor code improvements (e.g. @tailrec annotations), and depending on how brave you feel, you could also play around with Dotty, which boasts faster compile times among other things.
I need to have a very, very long list of pairs (X, Y) in Scala. So big it will not fit in memory (but fits nicely on a disk).
All update operations are cons (head appends).
All read accesses start in the head, and orderly traverses the list until it finds a pre-determined pair.
A cache would be great, since most read accesses will keep the same data over and over.
So, this is basically a "disk-persisted-lazy-cacheable-List" ™
Any ideas on how to get one before I start to roll out my own?
Addendum: yes.. mongodb, or any other non-embeddable resource, is overkill. If you are interested in a specific use-case for this, see the class Timeline here. Basically, I wish to have a very, very big timeline (millions of pairs spanning months), although my matches only need to touch the last few hours.
The easiest way to do something like this is to extend Traversable. You only have to define foreach, and you have full control over the traversal, so you can do things like open and close the file.
You can also extend Iterable, which requires defining iterator and, of course, returning some sort of Iterator. In this case, you'd probably create an Iterator for the disk data, but it's going to be much harder to control things like open files.
Here's one example of a Traversable such as I described, written by Josh Suereth:
class FileLinesTraversable(file: java.io.File) extends Traversable[String] {
  override def foreach[U](f: String => U): Unit = {
    val in = new java.io.BufferedReader(new java.io.FileReader(file))
    try {
      def loop(): Unit = in.readLine match {
        case null => ()
        case line => f(line); loop()
      }
      loop()
    } finally {
      in.close()
    }
  }
}
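For contrast, here is a rough Iterable-based sketch of my own, which shows why controlling the file handle is harder: nothing closes the reader if the caller stops iterating early.

class FileLinesIterable(file: java.io.File) extends Iterable[String] {
  override def iterator: Iterator[String] = {
    val in = new java.io.BufferedReader(new java.io.FileReader(file))
    // The caller drives the iteration, so there is no reliable place to close `in`.
    Iterator.continually(in.readLine()).takeWhile(_ != null)
  }
}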
You write:
mongodb, or any other non-embeddable resource, is an overkill
Do you know that there are embeddable database engines, including some really small ones? If you do, I'm not sure what in your exact requirements rules them out.
You sure that Hibernate + an embeddable DB (say SQLite) would not be enough?
Alternatively, BerkeleyDB Java Edition, HSQLDB, or other embedded databases could be an option.
If you do not perform queries on the objects themselves (and it really sounds like you do not), maybe serialization would be simpler than object-relational mapping for complex objects, but I've never tried, and I don't know which would be faster. But serialization is probably the only way to be completely generic in the type, assuming that your framework of choice offers a suitable interface to write [T <: Serializable]. If not, you could write [T: MySerializable] after creating your own "type-class" MySerializable[T] (like, for instance, Ordering[T] in the Scala standard library).
However, you don't want to use standard Java serialization for this task. "Anything serializable" sounds like a bad requirement because it suggests the use of Java serialization, but I guess you can relax that to "anything serializable with my framework of choice". Java serialization is extremely inefficient in time and space; instead of a compact record per object, it gives you back a file complete with special headers. I would suggest using a different serialization framework - have a look here for a comparison.
Additional reasons not to go on the road of a custom implementation
In addition, it sounds like you would be reading the file essentially backward, and that's a quite bad access pattern, performance-wise, on non-SSD disks: after reading a sector, it takes an almost complete disk rotation to access the previous one.
Moreover, as Chris Shain pointed out in the comment above, you'd need to use a page-based solution, and you'd need to cope with variable-sized objects.
If you don't want to step up to one of the embeddable DBs, how about a stack in memory mapped files?
A stack seems to meet your desired access characteristics. (Push a bunch of data, and iterate over the most recently pushed data frequently)
You can use Java's MappedByteBuffer directly from Scala. You get to address the file as if it were memory, without actually loading the whole file into memory.
You'd get some caching for free from the OS this way, since the mapped file would function like virtual memory. Recently written/accessed pages would stay in the OS's file cache until the OS saw fit to flush them (or you flushed them manually) back to disk.
You could build your stack from either end of the file if you're worried about sequential read performance, but if you're usually reading data you just wrote, I wouldn't expect that to be a problem since it will still be in memory. (Though if you're reading data that you've written over hours/days across pages then it might be a problem.)
A file addressed in this way is limited in size to 2GB even on a 64 bit JVM, but you can use multiple files to overcome this limitation.
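A minimal sketch of the idea (the file name, the 1 GB mapping size, and the fixed 8-byte pair encoding are assumptions of mine, not from the question):

import java.io.RandomAccessFile
import java.nio.channels.FileChannel

val raf     = new RandomAccessFile("timeline.dat", "rw")
val channel = raf.getChannel
// Map up to 1 GB; the file is grown to that size and the OS pages it in and out.
val buf = channel.map(FileChannel.MapMode.READ_WRITE, 0, 1L << 30)

// Push one (Int, Int) pair onto the "stack" (8 bytes per pair).
def push(x: Int, y: Int): Unit = { buf.putInt(x); buf.putInt(y) }

// Read back the n most recently pushed pairs, newest first.
def readLast(n: Int): Seq[(Int, Int)] = {
  val end = buf.position()
  (1 to n).flatMap { i =>
    val off = end - i * 8
    if (off < 0) None else Some((buf.getInt(off), buf.getInt(off + 4)))
  }
}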
These Java libraries may contain what you need. They aim to store entries more efficiently than standard Java collections.
github.com/OpenHFT/Chronicle-Queue
github.com/OpenHFT/Chronicle-Map
What are the most commonly held misconceptions about the Scala language, and what counter-examples exist to these?
UPDATE
I was thinking more about various claims I've seen, such as "Scala is dynamically typed" and "Scala is a scripting language".
I accept that "Scala is [Simple/Complex]" might be considered a myth, but it's also a viewpoint that's very dependent on context. My personal belief is that it's the very same features that can make Scala appear either simple or complex depending on who's using them. Ultimately, the language just offers abstractions, and it's the way that these are used that shapes perceptions.
Not only that, but it has a certain tendency to inflame arguments, and I've not yet seen anyone change a strongly-held viewpoint on the topic...
Myth: That Scala’s “Option” and Haskell’s “Maybe” types won’t save you from null. :-)
Debunked: Why Scala's "Option" and Haskell's "Maybe" types will save you from null by James Iry.
Myth: Scala supports operator overloading.
Actually, Scala just has very flexible method naming rules and infix syntax for method invocation, with special rules for determining method precedence when the infix syntax is used with 'operators'. This subtle distinction has critical implications for the utility and potential for abuse of this language feature compared to true operator overloading (a la C++), as explained more thoroughly in James Iry's answer to this question.
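A small sketch of what that means in practice (Vec is a made-up type): a symbolic method name plus infix invocation looks like operator overloading, but + here is just an ordinary method.

case class Vec(x: Double, y: Double) {
  def +(that: Vec): Vec = Vec(x + that.x, y + that.y)  // just a method named "+"
}

val v = Vec(1, 2) + Vec(3, 4)  // infix syntax for Vec(1, 2).+(Vec(3, 4))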
Myth: methods and functions are the same thing.
In fact, a function is a value (an instance of one of the FunctionN classes), while a method is not. Jim McBeath explains the differences in greater detail. The most important practical distinctions are:
Only methods can have type parameters
Only methods can take implicit arguments
Only methods can have named and default parameters
When referring to a method, an underscore is often necessary to distinguish method invocation from partial function application (e.g. str.length evaluates to a number, while str.length _ evaluates to a zero-argument function).
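A quick sketch of that last point:

val str = "hello"
val n = str.length    // Int: the method is invoked
val f = str.length _  // () => Int: the underscore turns the method into a function value
f()                   // 5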
I disagree with the argument that Scala is hard because you can use very advanced features to do hard stuff with it. The scalability of Scala means that you can write DSL abstractions and high-level APIs in Scala itself that otherwise would need a language extension. So to be fair you need to compare Scala libraries to other languages compilers. People don't say that C# is hard because (I assume, don't have first hand knowledge on this) the C# compiler is pretty impenetrable. For Scala it's all out in the open. But we need to get to a point where we make clear that most people don't need to write code on this level, nor should they do it.
I think a common misconception amongst many scala developers, those at EPFL (and yourself, Kevin) is that "scala is a simple language". The argument usually goes something like this:
scala has few keywords
scala reuses the same few constructs (e.g. PartialFunction syntax is used as the body of a catch block)
scala has a few simple rules which allow you to create library code (which may appear as if the language has special keywords/constructs). I'm thinking here of implicits; methods containing colons; allowed identifier symbols; the equivalence of X(a, b) and a X b with extractors. And so on
scala's declaration-site variance means that the type system just gets out of your way. No more wildcards and ? super T
My personal opinion is that this argument is completely and utterly bogus. Scala's type system taken together with implicits allows one to write frankly impenetrable code for the average developer. Any suggestion otherwise is just preposterous, regardless of what the above "metrics" might lead you to think. (Note here that those who I've seen scoffing at the non-complexity of Java on Twitter and elsewhere happen to be uber-clever types who, it sometimes seems, had a grasp of monads, functors and arrows before they were out of short pants).
The obvious arguments against this are (of course):
you don't have to write code like this
you don't have to pander to the average developer
Of these, it seems to me that only #2 is valid. Whether or not you write code quite as complex as scalaz, I think it's just silly to use the language (and continue to use it) with no real understanding of the type system. How else can one get the best out of the language?
There is a myth that Scala is difficult because Scala is a complex language.
This is false--by a variety of metrics, Scala is no more complex than Java. (Size of grammar, lines of code or number of classes or number of methods in the standard API, etc..)
But it is undeniably the case that Scala code can be ferociously difficult to understand. How can this be, if Scala is not a complex language?
The answer is that Scala is a powerful language. Unlike Java, which has many special constructs (like enums) that accomplish one particular thing--and requires you to learn specialized syntax that applies just to that one thing, Scala has a variety of very general constructs. By mixing and matching these constructs, one can express very complex ideas with very little code. And, unsurprisingly, if someone comes along who has not had the same complex idea and tries to figure out what you're doing with this very compact code, they may find it daunting--more daunting, even, than if they saw a couple of pages of code to do the same thing, since then at least they'd realize how much conceptual stuff there was to understand!
There is also an issue of whether things are more complex than they really need to be. For example, some of the type gymnastics present in the collections library make the collections a joy to use but perplexing to implement or extend. The goals here are not particularly complicated (e.g. subclasses should return their own types), but the methods required (higher-kinded types, implicit builders, etc.) are complex. (So complex, in fact, that Java just gives up and doesn't try, rather than doing it "properly" as in Scala. Also, in principle, there is hope that this will improve in the future, since the method can evolve to more closely match the goal.) In other cases, the goals are complex; list.filter(_<5).sorted.grouped(10).flatMap(_.tail.headOption) is a bit of a mess, but if you really want to take all numbers less than 5, and then take every 2nd number out of 10 in the remaining list, well, that's just a somewhat complicated idea, and the code pretty much says what it does if you know the basic collections operations.
Summary: Scala is not complex, but it allows you to compactly express complex ideas. Compact expression of complex ideas can be daunting.
There is a myth that Scala is non-deployable, whereas a wide range of third-party Java libraries can be deployed without a second thought.
To the extent that this myth exists, I suspect it exists among people who are not accustomed to separating a virtual machine and API from a language and compiler. If java == javac == Java API in your mind, you might get a little nervous if someone suggests using scalac instead of javac, because you see how nicely your JVM runs.
Scala ends up as JVM bytecode, plus its own custom library. There's no reason to be any more worried about deploying Scala on a small scale, or as part of some other large project, than there is about deploying any other library that may or may not stay compatible with whichever JVM you prefer. Granted, the Scala development team is not backed by quite as much force as the Google collections or Apache Commons, but it's got at least as much weight behind it as things like the Java Advanced Imaging project.
Myth:
def foo() = "something"
and
def bar = "something"
are the same.
It is not; you can call foo(), but bar() tries to call the apply method of StringLike with no arguments (results in an error).
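A sketch of the difference at the call site:

def foo() = "something"
def bar = "something"

foo()    // fine
foo      // also fine: the empty parameter list may be dropped at the call site
bar      // fine
// bar() // does not compile: bar is already a String, so bar() means
//       // bar.apply(), and there is no zero-argument apply on String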
Some common misconceptions related to Actors library:
Actors handle incoming messages in parallel, in multiple threads / against a thread pool (in fact, handling messages in multiple threads is contrary to the actors concept and may lead to race conditions - all messages are handled sequentially in one thread; thread-based actors use one thread both for mailbox processing and execution, while event-based actors may share one VM thread for execution, using a multi-threaded executor to schedule mailbox processing)
Uncaught exceptions don't change actor's behavior/state (in fact, all uncaught exceptions terminate the actor)
Myth: You can replace a fold with a reduce when computing something like a sum from zero.
This is a common mistake/misconception among new users of Scala, particularly those without prior functional programming experience. The following expressions are not equivalent:
seq.foldLeft(0)(_+_)
seq.reduceLeft(_+_)
The two expressions differ in how they handle the empty sequence: the fold produces a valid result (0), while the reduce throws an exception.
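A quick sketch of that difference:

val nums = List.empty[Int]
nums.foldLeft(0)(_ + _)    // 0
// nums.reduceLeft(_ + _)  // throws UnsupportedOperationException on the empty list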
Myth: Pattern matching doesn't fit well with the OO paradigm.
Debunked here by Martin Odersky himself. (Also see this paper - Matching Objects with Patterns - by Odersky et al.)
Myth: this.type refers to the same type represented by this.getClass.
As an example of this misconception, one might assume that in the following code the type of v.me is B:
trait A { val me: this.type = this }
class B extends A
val v = new B
In reality, this.type refers to the type whose only instance is this. In general, x.type is the singleton type whose only instance is x. So in the example above, the type of v.me is v.type. The following session demonstrates the principle:
scala> val s = "a string"
s: java.lang.String = a string
scala> var v: s.type = s
v: s.type = a string
scala> v = "another string"
<console>:7: error: type mismatch;
found : java.lang.String("another string")
required: s.type
v = "another string"
Scala has type inference and refinement types (structural types), whereas Java does not.
The myth is busted by James Iry.
Myth: that Scala is highly scalable, without qualifying what forms of scalability.
Scala may indeed be highly scalable in terms of the ability to express higher-level denotational semantics, and this makes it a very good language for experimentation and even for scaling production at the project-level scale of top-down coordinated compositionality.
However, every referentially opaque language (i.e. one that allows mutable data structures) is imperative (and not declarative) and will not scale to WAN bottom-up, uncoordinated compositionality and security. In other words, imperative languages are compositional (and security) spaghetti w.r.t. uncoordinated development of modules. I realize such uncoordinated development is perhaps currently considered by most to be a "pipe dream" and thus perhaps not a high priority. And this is not to disparage the benefit to compositionality (i.e. eliminating corner cases) that higher-level semantic unification can provide, e.g. a category theory model for the standard library.
There will possibly be significant cognitive dissonance for many readers, especially since there are popular misconceptions about imperative vs. declarative (i.e. mutable vs. immutable), and eager vs. lazy; e.g. the monadic semantic is never inherently imperative, yet it is often claimed to be. Yes, in Haskell the IO monad is imperative, but its being imperative has nothing to do with its being a monad.
I explained this in more detail in the "Copute Tutorial" and "Purity" sections, which is either at the home page or temporarily at this link.
My point is, I am very grateful Scala exists, but I want to clarify what Scala scales and what it does not. I need Scala for what it does well, i.e. for me it is the ideal platform to prototype a new declarative language, but Scala itself is not exclusively declarative, and afaik referential transparency can't be enforced by the Scala compiler, other than by remembering to use val everywhere.
I think my point applies to the complexity debate about Scala. I have found (so far and mostly conceptually, since so far limited in actual experience with my new language) that removing mutability and loops, while retaining diamond multiple inheritance subtyping (which Haskell doesn't have), radically simplifies the language. For example, the Unit fiction disappears, and afaics, a slew of other issues and constructs become unnecessary, e.g. non-category theory standard library, for comprehensions, etc..
I'm going to be giving a short (30-40 mins) lunch-time talk on Scala to technical staff at my company. I'd like some suggestions for what would be the most appropriate content. Most people attending will have experience in Java and/or C# (plus various other languages).
What are the key things to cover? I'd like to give a brief introduction to the Scala syntax so that people don't feel lost when looking at code examples. I'll also cover some of the history behind the language and its designers. What would help people get the most out of the talk?
People are almost certainly coming to the talk to get an answer to the question, "Why should I use Scala?" Anything you can provide to help them answer that will be valuable.
Keep the discussion of the history and the personalities behind Scala to a minimum.
A whirlwind tour of the syntax is useful, but keep it short.
Spend a good chunk of the talk demonstrating examples and comparisons to Java. Show cases where Scala shines. You should literally be running and executing code so that people get a real, hands-on feel for how things work.
Make sure to cover weaknesses, too! Provide an objective and balanced overview.
I gave a similar talk - mostly to those with a Java background. I felt that taking a piece of real Java (about 30 lines) and iteratively adding Scala features worked pretty well. The 30 lines of Java eventually ended up as 6 (six!) lines of Scala. The point being (of course) that 6 lines are more readable and maintainable than 30.
I converted the scala to line-by-line Java equivalent and then introduced:
Type inference
Option
Closures
Pattern-matching (on lists)
Type aliases
Tail recursion
I found that this segment took quite a long time because the audience were very interested in the minutiae of scala's syntax (especially around function-expressions). Before undertaking the pattern-matching bit, I had a slide explaining the various things you could use in a match.
Tough. One has to balance the new and the familiar. For instance:
Talk about traits, how they differ from interfaces and multiple inheritance. Note that most methods in all of Scala collections can actually be found on the trait Traversable, which has a single abstract method: foreach.
Speak of functions and partial functions, show map/filter/foreach, and how they make use of functions.
Talk about pattern matching -- show how unapply is used to enable representation independence, while at the same time case classes make the common case easy.
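For that last point, a small sketch (Shape, Circle, Rect and Square are made-up names): case classes get unapply for free, while a hand-written extractor hides the representation.

sealed trait Shape
case class Circle(radius: Double) extends Shape
case class Rect(w: Double, h: Double) extends Shape

// Hand-written extractor: matches squares without exposing how Rect is stored.
object Square {
  def unapply(s: Shape): Option[Double] = s match {
    case Rect(w, h) if w == h => Some(w)
    case _                    => None
  }
}

def describe(s: Shape): String = s match {
  case Circle(r)  => s"circle of radius $r"
  case Square(w)  => s"square with side $w"
  case Rect(w, h) => s"rectangle $w x $h"
}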
Above all AVOID any topic that might be difficult to understand quickly, or you may waste time on them. For example of great topics I wouldn't talk about: self types, variance, for-comprehensions.
Pick more topics than you have time for. Let the audience steer the conversation towards the topics they are more interested in. If anyone starts to bog down on a topic too much, say you'll be pleased to explain it in more detail later, and ask if they would mind if you moved on to another topic. On the other hand, if everyone seems to be picking up on one thing in particular, stay with it. Otherwise, it might feel like you want to hide something.
I gave a presentation on re-writing Java classes in Scala. It has lots of examples of Java -> Scala and (hopefully) makes the gains obvious. Feel free to borrow any content you want... presentation took 1hr 10minutes so you might want to cut some stuff out.
Presentation: http://www.colinhowe.co.uk/downloads/rewriting-java-in-scala.ppt
Video: http://skillsmatter.com/podcast/java-jee/re-writing-java-classes-in-scala-and-making-your-code-lovely
You could do worse than running through Jonas Bonér's presentation, Pragmatic Real-World Scala. Perhaps skip some advanced topics in there on different applications of traits and self-type annotations.
Possible Duplicate:
C# 'var' keyword versus explicitly defined variables
EDIT:
For those who are still viewing this, I've completely changed my opinion on var. I think it was largely due to the responses to this topic that I did. I'm an avid 'var' user now, and I think its proponents' comments below were absolutely correct in pretty much all cases. I think the thing I like most about var is it REALLY DOES reduce repetition (conforms to DRY), and makes your code considerably cleaner. It supports refactoring (when you need to change the return type of something, you have less code cleanup to deal with, and NO, NOT everyone has a fancy refactoring tool!), and anecdotally, people don't really seem to have a problem not knowing the specific type of a variable up front (it's easy enough to "discover" the capabilities of a type on demand, which is generally a necessity anyway, even if you DO know the name of a type.)
So here's a big applause for the 'var' keyword!!
This is a relatively simple question...more of a poll really. I am a HUGE fan of C#, and have used it for over 8 years, since before .NET was first released. I am a fan of all of the improvements made to the language, including lambda expressions, extension methods, LINQ, and anonymous types. However, there is one feature from C# 3.0 that I feel has been SORELY misused....the 'var' keyword.
Since the release of C# 3.0, on blogs, forums, and yes, even Stack Overflow, I have seen var replace pretty much every variable that has been written! To me, this is a grave misuse of the feature, and leads to very arbitrary code that can have many obfuscated bugs due to the lack of clarity about what type a variable actually is.
There is only a single truly valid use for 'var' (in my opinion at least). What is that valid use, you ask? The only valid use is when you are incapable of knowing the type, and the only instance where that can happen:
When accessing an anonymous type
Anonymous types have no compile-time identity, so var is the only option. It's the only reason why var was added...to support anonymous types.
So... what's your opinion? Given the prolific use of var on blogs and forums, and it being suggested/enforced by tools like ReSharper, etc., many up-and-coming developers will see it as a completely valid thing.
Do you think var should be used so prolifically?
Do you think var should ever be used for anything other than an anonymous type?
Is it acceptable to use in code posted to blogs to maintain brevity...terseness? (Not sure about the answer this one myself...perhaps with a disclaimer)
Should we, as a community, encourage better use of strongly typed variables to improve code clarity, or allow C# to become more vague and less descriptive?
I would like to know the community's opinions. I see var used a lot, but I have very little idea why, and perhaps there is a good reason (i.e. brevity/terseness).
var is a splendid idea to help implement a key principle of good programming: DRY, i.e., Don't Repeat Yourself.
VeryComplicatedType x = new VeryComplicatedType();
is bad coding, because it repeats VeryComplicatedType, and the effects are all negative: more verbose and boilerplatey code, less readability, silly "makework" for both the reader and the writer of the code. Because of all this, I count var as a very useful enhancement in C# 3 compared to Java and previous versions of C#.
Of course it can be mildly misused, by using as the RHS an expression whose type is not clear and obvious (e.g., a call to a method whose declaration may be far away) -- such misuse may decrease readability (by forcing the reader to hunt for the method's declaration or ponder deeply about some other subtle expression's type) instead of increasing it. But if you stick to using var to avoid repetition, you'll be in its sweet spot, and no misuse.
I think it should be used in those situations where the type is clearly specified elsewhere in the same statement:
Dictionary<string, List<int>> myHashMap = new Dictionary<string, List<int>>();
is a pain to read. This could be replaced by the following with no loss of clarity:
var myHashMap = new Dictionary<string, List<int>>();
Pop quiz!
What type is this:
var Foo = new string[]{"abc","123","yoda"};
How about this:
var Bar = {"abc","123","yoda"};
It takes me roughly no longer to determine what types those are than it would with the explicitly redundant specification of the type. As a programmer I have no issues with letting a compiler figure out things that are obvious to me. You may disagree.
Cheers.
Never say never. I'm pretty sure there are a bunch of questions where people have expounded their views on var, but here's mine once more.
var is a tool; use it where it's appropriate, and don't use it when it's not. You're right that the only required use of var is when addressing anonymous types, in which case you have no type name to use. Personally, I'd say any other use has to be considered in terms of readability and laziness; specifically, when avoiding use of a cumbersome type name.
var i = 5;
(Laziness)
var list = new List<Customer>();
(Convenience)
var customers = GetCustomers();
(Questionable; I'd consider it acceptable if and only if GetCustomers() returns an IEnumerable)
Read up on Haskell. It's a statically typed language in which you rarely have to state the type of anything. So it uses the same approach as var, as the standard "idiomatic" coding style.
If the compiler can figure something out for you, why write the same thing twice?
A colleague of mine was at first very opposed to var, just as you are, but has now started using it habitually. He was worried it would make programs less self-documenting, but in practice that's caused more by overly long methods.
var MyCustomers = from c in Customers
                  where c.City == "Madrid"
                  select new { c.Company, c.Mail };
If I need only Company and Mail from the Customers collection, it makes no sense to define a new named type with just the members I need.
If you feel that giving the same information twice reduces errors (the designers of many web forms that insist you type in your email address twice seem to agree), then you'll probably hate var. If you write a lot of code that uses complicated type specifications then it's a godsend.
EDIT: To expand on this a bit (in case it sounds like I'm not in favour of var):
In the UK (at least at the time I went), it was standard practice to make Computer Science students learn how to program in Standard ML. Like other functional languages it has a type system that puts languages in the C++/Java mould to shame.
Anyway, what I noticed at the time (and heard similar remarks from other students) was that it was a nightmare to get your SML programs to compile because the compiler was so incredibly picky about types, but once they did compile, they almost always ran without error.
This aspect of SML (and other functional languages) seems to be one the questioner sees as a 'good thing' - i.e. that anything that helps the compiler catch more errors at compile time is good.
Now here's the thing with SML: it uses type inference exclusively when assigning. So I don't think type inference can be inherently bad.
I agree with others that var eliminates redundancy. I have decided to use var where it eliminates redundancy as much as possible. I think consistency is important. Choose a style and stick with it through a project.
As Earwicker indicated, there are some functional languages, Haskell being one and F# being another, where such type inference is used much more pervasively -- the C# analogy would be declaring the return types and parameter types of methods as "var", and then having the compiler infer the static type for you. Static and explicit typing are two orthogonal concerns.
In fact, is it even correct to say that use of "var" is dynamic typing? From what I understood, that's what the new "dynamic" keyword in C# 4.0 is for. "var" is for static type inference. Correct me if I am wrong.
I must admit when I first saw the var keyword pop up I was very skeptical.
However, it is definitely an easy way to shorten the lines of a new declaration, and I use it all the time for that.
However, when I change the type of an underlying method and accept the return type using var, I do get the occasional run-time error. Most are still picked up by the compiler.
The second issue I run into is when I am not sure which method to use (and I am simply looking through the auto-complete). If I choose the wrong one and expect it to be type FOO and it is type BAR, then it takes a while to figure that out.
If I had literally specified the variable type in both cases it would have saved a bit of frustration.
Overall the benefits exceed the problems.
I have to dissent with the view that var reduces redundancy in any meaningful way. In the cases that have been put forward here, type inference can and should come out of the IDE, where it can be applied much more liberally with no loss of readability.