Related
I found many options recently, and interesting in their comparisons primarely by maturity and stability.
Crunch - https://github.com/cloudera/crunch
Scrunch - https://github.com/cloudera/crunch/tree/master/scrunch
Cascading - http://www.cascading.org/
Scalding https://github.com/twitter/scalding
FlumeJava
Scoobi - https://github.com/NICTA/scoobi/
As I'm a developer of Scoobi, don't expect an unbiased answer.
First of all, FlumeJava is an internal google project that provides a (awesomely productive) abstraction ontop of MapReduce (not hadoop though). They released a paper about it, which is what projects like Scoobi and Crunch are based on.
If your only criteria is the maturity -- I guess Cascading is your best bet.
However, if you're looking for the (imho superior) FlumeJava style abstraction, you'll want to pick between (S)crunch and Scoobi.
The biggest difference, superficial as it may be is that crunch is written in Java, with Scala bindings (Scrunch). And Scoobi is written in Scala with Java bindings (scoobij). They're both really solid choices, and you won't go wrong which ever you choose. I'm sure there's quite a similar story with Crunch, but Scoobi is being used in real projects and is under continual development. We're pretty very active in fixing bugs and implementing features.
Anyway, they're both great projects with great people behind them and were both released within days of each other. They provide the same abstraction (with similiar api), so switching between the two won't be an issue in the slightest. My recommendation is to give them both a try, and see what works for you. There' no lock in in either project, so you don't need to commit :)
And if you have any feedback for either project, please be sure to provide it :)
I'm a big Scoobi fan myself and I've used it in production. I like the way it allows you to write type-safe Hadoop programs in a very idiomatic Scala way. If that is not necessarily your thing and you like the Cascading model but are scared off by the huge amount of boilerplate code you'd have to write, Twitter has recently open sourced its own Scala abstraction layer on top of Cascading called Scalding.
Announcement: https://dev.twitter.com/blog/scalding
GitHub: https://github.com/twitter/scalding
I guess it's all a matter of taste at this point since feature-wise most of the frameworks are very close to one another.
Scalding also has the advantage of significant open source projects built atop it, such as Matrix API and Algebird.
Here are some examples:
http://sujitpal.blogspot.com/2012/08/scalding-for-impatient.html
Cascalog was released almost two years before Scalding, and arguably has more advanced features for building robust workflows:
https://github.com/nathanmarz/cascalog/wiki
Despite having read "Programming in Scala" several times, I still often finds important Scala constructs that were not explained in the book, like
#uncheckedVariance
#specialized
and other strange constructs like
new { ... } // No class name!
and so on.
I find this rather frustrating, considering that the book was written by the Scala "inventor" himself, and others.
I tried to read the language specification, but it's made for academics, rather than practicing programmers. It made my head spin.
Is there a website for "Everything "Programming in Scala" Didn't Tell You" ?
There was the daily-scala Blog, but it died over a year ago.
Currently, we're working on a central documentation site for scala-lang.org. We're hoping that this solves a lot of the documentation issues that new users face. More details on this effort can be found at http://heather.miller.am/blog/2011/07/improving-scala-documentation/, but in summary...
Believe it or not, there are a lot of documents that the Scala team has produced but which simply aren't in HTML or are otherwise difficult to find. Such as Martin's new Collections API, his document on Arrays, or Adriaan's on Type Constructor Inference.
One goal of such a site is to collect all of this documentation in one place, in a searchable, organized, and easy-to-navigate format.
Another goal is to collect excellent community documentation out there, and to put it in the same place as well. For that, we are actively looking for quality (article/overview-like) material with maintainers. Examples include the Scala Style Guide, and Daniel Spiewak's Scala for Java Refugees.
Yet another goal is to make it easy for contributors to participate- so the site is built from RST source, which will live in a documentation-only github repo at https://github.com/scala/scala-docs.
So, in short, something better is on it's way, and contributors are very welcome to participate.
EDIT: http://docs.scala-lang.org is now live.
Several documents considered to be rather detailed or even obscure are already available. This includes all "Scala Improvement Proposals" (the proposals produced when new language features are suggested, and which are usually very detailed, and written by the implementers themselves). Also available is the entire glossary from Programming in Scala, Scala cheatsheets, amongst many other documents. The bottom-line of the site is to be community-focused and contribution-friendly-- so, free, and totally open. Suggested topics to cover are also welcome.
Take a look at scalaz and typelevel librairies (shapeless, spire, etc.), they rely on many advanced features of Scala.
*scalaz was for a time part of typelevel, but it is no more the case.
Josh Sureth's book goes a little beyond the usual. It's not as far as I'd like but I'm not his core audience - still, there's a lot of good stuff in there.
http://www.manning.com/suereth
Scala IRC: irc://irc.freenode.net/scala
Scala forum: http://scala-forum.org/
Blogs: Just look at http://planetscala.com/
Programming Scala (Wampler, Payne): http://ofps.oreilly.com/titles/9780596155957/
Programming in Scala (Odersky, Venners, Spoon) - good but Scala 2.8: http://www.artima.com/pins1ed/
The new documentation page is online:
http://docs.scala-lang.org/
I've kept a library of advanced Scala resources, primarily talks and blog posts. It's updated pretty regularly as I find new, interesting content.
Happy to add new links to it if anyone has recommendations.
Try to read SBT Source: https://github.com/harrah/xsbt/wiki
Its a good exercise. Also check out the book 'scala in depth' : http://www.manning.com/suereth/ by
Joshua D. Suereth
I believe there are a lot of good answer here. But as a sharing of experience. I have been coding Scala for 2 year (not my full time job), and been progressively better at it. My project is 97% Scala, and I have been able to do most of it with:
Programming Scala
The scala-user list
Stackoverflow
This cover most of the need for the "user" side of Scala, meaning all you need to create working application. However if you want to write some more complex code, or create powerful typed libraries you definitely need more.
If you want to go beyond the basics and are prepared to delve deeply into type system, and libraries, then the alternatives I use:
Use the community, scala enthusiast are really nice. I have worked with folks form Specs, Scalaz and Lift.
IRC is really good and some of the core contributors to some of the big library frequently show up.
Jump to source code, but don't try to understand everything. Scala type system can be daunting, however you normally don't need to understand 100% of it to use it.
If you really need to get into the nitty gritty details, hit the language specs, development list, and get to know the key people.
However you can really be very effective in Scala without needing to understand every single bit of the language.
I'm net-java developer with some small projects implemented. I'm going to start a new project which is portal with many typical features (posts, comments, messaging, users, catalog, news, galleries, etc).
I believe the best solution would be use any mature CMS (joomla, drupal...) and customize it where needed. The problem is that I'm not familiar with PHP (CMS written on PHP has far better set of features, plugins, community, information I believe) I'm not planning to learn PHP, I want to improve my java-net skills.
So the question for me is:
write all by myself, improving my programming skills and risking to finish my project in relatively long period
on the other hand
I could spend some time learning tools and languages, which I think while, I don't need in the future and more likely finish my project in some shorter time
what would you advise?
Learning another language will not hurt you, and as most of the differences are in syntax and supporting libraries, you would be surprised at how quickly you can pick up a new language.
Your choice should be on what language is best for the task, not simply the one you know.
So, my suggestion is - learn PHP and go with a mature CMS.
A LOT of effort goes into developing a CMS, so writing your own will likely take some time. Put together a project plan and work out how long it would take you to develop something from scratch, then do some research on existing CMS packages and how they fit your needs.
I'm a .NET developer but have used Joomla in the past - it's actually quite easy to put together a website even if you're not too familiar with PHP.
Better yet, find a CMS package in your preferred language - they oughta be some out there.
i.e.
http://java-source.net/open-source/content-managment-systems
Learning new tools is seldom a waste of time. Especially not when it comes to such well known and world wide spread languages such as php. I would say it's best using the tools most suited for the project you're up to, rather than reinventing the wheel.
You should take a good look at your requirements. If you're sure you can get them all from a CMS, that makes most sense. Take a good look at the compatibility and reliability of all components.
Otherwise, you might be better served by a .net or java CMS.
Writing your own CMS without having extensive experience in the current available ones is not going to lead to a good result, except for you learning some programming skills.
If you don't have a "due date" for your project write it by yourself.
Or take a look at http://www.opencms.org/en/ ;)
There are a lot of opensource CMS written in Java ;)
Even if you don't write in PHP again, the benefit of knowing another programming language is going to give you some valuable perspective on net-java.
The task of learning a new language is going to be an asset. Learning to learn something. Identifying what you need to know and how to find the answers is a transferable skill.
Your job will be to complete your project in the most efficient manner with the highest quality output practical. Use the tool that is going to best help you achieve this. The language it's written in should be largely irrelevant.
Now i know that this one is actually not a very technical question but one that has been bothering me for some time. Actually we are using a lot of C++ and PHP at our company and some of our developers are really hoping for a new and modern language to come by to help us getting more productive. I have been talking about what scala can do and the other coders seem to gain some interest in the language. The tough job is, how do you convince your boss to consider scala as a language for the company. I saw the presentation "Sneaking Scala into your company", but it deals with the situation that you are using Java at your company which we don't.
How do you fight of the usual "that is just esoteric stuff" and "we can already do that in $LANGUAGE" arguments. I was planing to give a talk about Scala, and since I don't have much time I need ideas how to get people interested in the language rather then setting of reactions like "currying? we can already do something like this with boost::bind".
How did you guys do it?
Regards,
raichoo
EDIT: Gave my talk yesterday, people were very excited. My company is going to give it a try! Thanks for all your suggestions.
If you don't already have killer arguments, what are you basing your reasoning on that Scala will make your company more productive?
Don't like something then hunt for reasons to use it at work. Let the reasons speak for themselves..
"A hammer looking for nails"
Using it to do some stuff around the side, as datamigrations, testing and similar things will make sure the necessary experience is built and can give it some exposure.
ScalaTest is really nice to help with acceptance/integration testing. (Yes, I know it is nice for unit testing, but I do not see that immediately happening with C++/PHP target code, and it would probably be unwise).
Proof of Concept and other Prototypes are great for 2 reasons
1) It showcases the capabilities
2) You are certain they will be thrown away if you have to reimplement them in C++/PHP
Now a bad time to introduce Scala would be when you REALLY need it : hopes will be high, it will not immediately work as intended, hopes are dashed and everybody will blame Scala. As a result it will be burnt for a long time in the organisation.
Sooner or later some suit will think it was his idea to introduce Scala and use it on a formal project. If that project is moderately successful, then it is sold.
These kind of changes are complicated people issues, and the harder you push, the harder you will face push-back. On the other hand the persistent mind can move mountains.
Redo some of your work related code in Scala and compare KLOC, code structure and performence, if it looks and works better, show it to your peers and your managers.
In other words:
Talk is cheap. Show me the code.
-- Torvalds, Linus (2000-08-25)
In case of our company (and I assume, many companies share the same scenario), move to Scala (from Java) was initiated by tech people, who 1. wanted to work more productive writing code (living in the 21st century utilize modern approaches), 2. have less troubles building concurrent applications (Actors concept promoted by Scala is a way simpler than Java thread-based concurrency) 2.1 have a simpler way of building scalable staged event driven architectures.
In our company, transition to Scala was more or less simple, because Scala was literlly sold to business people as a library to Java :) -> from their POV, we're still using the same platform (JVM), application servers, etc., but developers are having more fun from their work, and therefore, are more inspired and work more efficiently.
Maybe you could pitch Scala by showing off the suite of tools that is used for development? For example, if you are not already using Eclipse in your company, show your execs a demo of what a modern IDE can do for your productivity.
There is a book called "Fearless Change" (Linda Rising) that describes a pattern language for "powerless leaders" (I LOVE that role title!). SE-radio had a really motivating interview with the author: http://www.se-radio.net/podcast/2009-06/episode-139-fearless-change-linda-rising. Listen up on that interview to collect a few non-technical strategies that can help you in this struggle!
I haven't used Scala yet for any real business code, but I know people who have.
One group used it to write a tool to analyze log files. So they didn't use it for mission-critical business code, but for a non-critical tool to support the project.
Another person I know is an architect and he just went and wrote some Scala code on his own for some production code without telling his manager. After the code was deployed successfully he did tell it. One of the things he mentioned is that because Scala runs on the JVM, the people who support the application don't even notice - to them, Scala is just another library that's included with the application (they were already used to the JVM). Ofcourse this approach is risky and not everybody will be in the position or be willing to do this.
You could start small - use it as your personal preferred scripting language for small things that you need yourself. Tell your fellow developers about it and make them enthusiasts too. If they also start using it then you can step it up to make some side code for your project (such as for example that log analyser tool).
This isn't a really easy task. I would concentrate on the fact that you will be able to produce code and therefore products faster and with a higher quality. That's always the two reasons, business wants to hear from you and will listen to.
Maybe you can show an example of 1-2 very small projects you did in your company with C++/PHP and compare the effort, quality etc. with a similar/the same implemenation in Scala? This would be very impressive and should also convince people who are not on the coding side.
There was a very good talk at Scala Days 2010 by David Copeland:
Sneaking Scala into your organisation
The executive summary: Testing. You can use Scala for testing without affecting release code.
we are considering Scala for a new Project within our company. We have some Junior Programmers with only PHP knowledge, and we are in doubt that they can handle Scala. What are your opinions? Some say: "Scala is a complicated beast!", some say: "It's easy once you got it." Maybe someone has real-world experience?
"My coworkers will not understand Scala" is simultaneously overstating its difficulty and insulting your coworkers.
Scala is not that difficult. It's just another programming language. Any trouble that junior programmers have with Scala is going to be more or less the same trouble they would have with any other language.
Your coworkers are smart. Of course, I don't know them, but it's a pretty safe bet unless your company is the kind of organisation that hires stupid people, in which case, you have bigger problems.
That said, at my company we have some core products developed in Scala, and we don't find that people have any more trouble with it than Java. The code is generally more clear and concise, easier to generalise and reuse, etc.
I guess Scala could be used as a "beginners" language. Even though there are tricky ideas behind it, you dont have to use/explain them in the beginning. If you explain pure OO with Scala, I would say it is straight forward and easily understandable. As Scala reduces a lot of code overhead from other languages it might even be easier to learn concepts with Scala than with Java/C++.
A major drawback I see with Scala as a beginner language is the lack of documentation. Don't get this wrong, the official Scala doc is very good and also the few books that are available are quite useful to get the details of the language, but those have not been written for beginners. For example in Java you find hundreds of books titled something like "Learning OO with Java" you wont find that for Scala which may be a show stopper.
As Hannes mentioned, only do new language introductions within research projects and not productive or even flagship projects. If you have some juniors, that makes the situation even better, take some internal tool, you always wanted and needed and let them create it during a research project. This is also a nice opportunity to experiment with different development-processes. And your juniors most probably like to be challenged and will deliver a good prototype and a very well proofed opinion if Scala can be used as a beginner language.
I believe that most people moving to Scala are experienced and enthusiastic coders. I'd suggest that you get in-house experience with a Scala project with your senior programmers first before forming a strategy for mentoring your junior coders. I'd also suggest that you only involve people who are eager to join in.
I would advocate it. But with the proviso that you have clear guidelines on what language features are acceptable for your team. For example, coding primarily in an imperative style (which is familiar for Java-trained people), or perhaps limiting the employment of recursion or closures.
Also plan for seniors to mentor the juniors. This may take the form of any combination of: pair programming, code reviews, info sessions, regular discussion forums, etc.
The opportunity that scala presents for vastly improved coding on the JVM is too great to pass up. When your seniors get into it I would not be surprised if they find renewed passion for development. When your juniors get into it they will be learning best practice JVM development from the start.
If you choose to go this route, perhaps they'll find easier to use the Scalate framework than a more traditional one like Lift, since it allows mixing HTML with Scala, much in the same way as PHP works.
Scala is a very 'normal' programming language. Any programmer should be able to learn this language. The people that have difficulties learning Scala mostly are experienced with imperative languages and are surprised by the functional concepts. So unexperienced programmers may learn it even faster. In my opinion should be no problem, to assign it to juniors. From a management point of view, I would assign a junior and a senior developer as a team (or more of both, depending on the size of the project).
I think it depends whats more importent for you. If you want to learn as possible about OO Programming and the standard stuff its a bad idea.
But what you really give them is a opportunity to learn something really cool and unique. Witch can be good motivation. Scala has many cool stuff in it. If you can handle Scala you can handle a lot of otherstuff as well.
Talk to the Programmers (all of them) and tell them why you wanne to use scala. Ask if the have to motivation to make and learn something not everybody can do and go the extra mile?
If the are go with it!
My initial thought would be that Scala will be too heavy for them but then I guess because Scala is an OO/Imperitave/Functional hybrid, one could introduce them to the OO/Imperative part of scala until their comfortable, but then again they will probably have bad PHP habits in Scala since scala authors mostly prefer the functional style over the imperative one.
So, it could work, but I would do it for a research project, and definately not for a flagship one.
Edit: Perhaps this should be said also: It seems that functional/OO hybrids like Scala is becoming more popular especially because of how functional languages handles parallel processing as opposed to how we know it in languages like in Java. The amount of cores found in a chip is increasing rapidly, so this is important. However, mentioning PHP, it seems that you are developing web server scripts where threading is less important. PHP doesn't even have threads.
This raises another point. Do you want to develop Scala Web applications i.e. Lift. If so then you have a doubled up learning curve which should also be considered.
Imagine that you would have picked Java and asked whether they could handle Java. If your answer is they could, then they can probably handle Scala.
Scala is only marginally more difficult due to:
No great IDE support. The support ranges from poor to good. Not necessarily an issue for a PHP programmer.
Documentation not as rich as Java
Both Java and Scala have new challenges for a PHP programmer (JVM, new libraries, compiled language, statically typed).
I don't think Scala is a complicated beast, but you do need to understand some of the syntactic sugar and design principles, which would be true of learning Java as well.
Yes, if...
Strategic decision has been made to go with Scala
Company can handle the hit (financial and time) that will come from the steep learning curve.
No, if...
No senior Java, C# or C++ programmers can be put on the project too
Can't find a Scala programmer to act as a lead
Programmers don't have the patience to learn Scala or deal with a language where Jars (libraries) are scattered all over the place, rather than in one or two neat packages like PHP.
*Note: if the junior programmers were C++, C#, or Java Software Engineers rather than PHP, then my answer would be different like, Go for it!
I would not recommend it. My experience of Scala is only from homebrew projects, but I would imagine the currently lousy IDE support, quite frequent API changes and a very flexible syntax (that allows one to hang himself and everybody else participating in the project) would cause a lot of problems in a bigger, more official project.
Give them IntelliJ and throw 'em in the deep end.
Here is a blog post I recently stumbled upon:
http://james-iry.blogspot.com/2008/07/java-is-too-academic.html
It shows that even Java can be too academic to be understood by programmers which have no experience in functional programming. On the other hand, Scala allows to write code the "imperative way", so you can avoid all the FP stuff if you do not understand it. In my opinion, Scala is much more concise than Java, so I guess a "junior programmer" should be able to handle it.