I have been using CDH and HDP for a while (both in the pseudo-distributed mode) on a VM as well as installing natively on Ubuntu. Although my question is probably relevant to all Projects within the Apache Hadoop Ecosystem, let me ask this specifically in the context of Avro.
What is the best way to go about figuring out what the different packages and the classes within the packages do. I usually end up referring to the Javadoc for the project (Avro in this case) but the overviews for packages and classes end up being awfully inadequate.
For e.g. Take two of the Avro packages: org.apache.avro.specific and org.apache.avro.generic These are used for creating Specific and Generic Readers and Writers (respectively) but I'm not a 100% sure what these are for. I have used the Specific Package for in cases when I have used Avro Code Generation and the Generic ones when I don't want to use code generation. However, I am not sure if that is the only reason for using one vs. the other.
Another example: The Encoder\Decoder Classes are used for low-level SerDe, the DatumReader\DatumWrite for a "medium-level" Serde while most application layer interactions with Avro will probably use Generic\Specific Readers\Writers. Without having struggled through the pain of using these classes, how is a user to know what to use for what?
Is there a better way to get a good overview of each package (clearly the javadoc is not well documented) and the classes within the package?
PS: I have similar questions for essentially all other Hadoop Projects (Hive, HBASE etc.) - the Javadocs seem to be grossly inadequate overall. I just wonder what other developers end up doing to figure these out.
Any inputs would be great.
I download the source code and skim through it to get the idea what it does. If there is javadoc, I read that too. I tend to concentrate on the interfaces that I need and move on from there, that way I put everything into context and it makes it easier to figure out the usage. I use the call hierarchy and the type hierarchy views a lot.
These are very general guidelines, and ultimately it is the time you spend with the project that will make you understand it.
Hadoop ecosystem is quickly growing and changes are introduced on monthly bases. that's why javadoc is not so good. Another reason is that hadoop software tends to lean towards the infrastructure and not towards the end user. People developing tools will spend time learning the APIs and internals while everybody else is kinda supposed to be blissfully ignorant of all those, and just use some high level domain specific language for the tool.
I found many options recently, and interesting in their comparisons primarely by maturity and stability.
Crunch - https://github.com/cloudera/crunch
Scrunch - https://github.com/cloudera/crunch/tree/master/scrunch
Cascading - http://www.cascading.org/
Scalding https://github.com/twitter/scalding
FlumeJava
Scoobi - https://github.com/NICTA/scoobi/
As I'm a developer of Scoobi, don't expect an unbiased answer.
First of all, FlumeJava is an internal google project that provides a (awesomely productive) abstraction ontop of MapReduce (not hadoop though). They released a paper about it, which is what projects like Scoobi and Crunch are based on.
If your only criteria is the maturity -- I guess Cascading is your best bet.
However, if you're looking for the (imho superior) FlumeJava style abstraction, you'll want to pick between (S)crunch and Scoobi.
The biggest difference, superficial as it may be is that crunch is written in Java, with Scala bindings (Scrunch). And Scoobi is written in Scala with Java bindings (scoobij). They're both really solid choices, and you won't go wrong which ever you choose. I'm sure there's quite a similar story with Crunch, but Scoobi is being used in real projects and is under continual development. We're pretty very active in fixing bugs and implementing features.
Anyway, they're both great projects with great people behind them and were both released within days of each other. They provide the same abstraction (with similiar api), so switching between the two won't be an issue in the slightest. My recommendation is to give them both a try, and see what works for you. There' no lock in in either project, so you don't need to commit :)
And if you have any feedback for either project, please be sure to provide it :)
I'm a big Scoobi fan myself and I've used it in production. I like the way it allows you to write type-safe Hadoop programs in a very idiomatic Scala way. If that is not necessarily your thing and you like the Cascading model but are scared off by the huge amount of boilerplate code you'd have to write, Twitter has recently open sourced its own Scala abstraction layer on top of Cascading called Scalding.
Announcement: https://dev.twitter.com/blog/scalding
GitHub: https://github.com/twitter/scalding
I guess it's all a matter of taste at this point since feature-wise most of the frameworks are very close to one another.
Scalding also has the advantage of significant open source projects built atop it, such as Matrix API and Algebird.
Here are some examples:
http://sujitpal.blogspot.com/2012/08/scalding-for-impatient.html
Cascalog was released almost two years before Scalding, and arguably has more advanced features for building robust workflows:
https://github.com/nathanmarz/cascalog/wiki
I'm learning Scala now. I saw there are 2 test frameworks there, ScalaTest and Specs. My only problem is that I'm not still at ease with the language to decide which is better.
Also I'm used to write tests before code, at the moment I have no clear idea how to do it in functional programming.
Ideally I'd like to learn Scala in a TDD fashion, is there any resource about it?
There is a functional koan which might be something you are looking for.
Dick Wall from the Java Posse started a github project:
https://github.com/relevance/functional-koans/tree/scala
You need maven to start it via mvn package.
There is another Koan:
http://www.scalakoans.org/
Thanks to #MikeHoss!
So, test frameworks. There are other questions about that, though I'd like to point out that there's also ScalaCheck. ScalaCheck is not as fully features as Specs and ScalaTest, but, on the other hand, both Specs and ScalaTest can integrate with it.
Personally, I'd rather go with ScalaCheck, which is likely very different from the unit testing frameworks you are used to. This difference can be good in keeping you from stating tests in an object oriented manner.
Now, to the main concern of your question: is there a TDD-like Scala tutorial? I don't know of any, though the answer about functional koans appear to approach what you want.
ScalaTest is the more richly-featured and flexible of the two frameworks.
Having said that... I'm currently favouring Specs, they seem to be doing a far better job of keeping up with the latest Scala releases and the IntelliJ integration also seems to work better.
Specs also has the advantage, for you, of having a smaller API to learn.
Closed. This question does not meet Stack Overflow guidelines. It is not currently accepting answers.
Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource are off-topic for Stack Overflow as they tend to attract opinionated answers and spam. Instead, describe the problem and what has been done so far to solve it.
Closed 8 years ago.
Improve this question
I've just started learning Scala, and the first thing I'm going to implement is a tiny web application. I've been using Erlang for the last year to implement server-side software, but I've never wrote web applications before. It will be a great experience.
Are there web-frameworks for Scala except for Lift?
Don't get me wrong, Lift looks awesome. I just want to know how many frameworks there are so that I can then choose between them. It's always a good to have a choice, but I the only thing I found was Lift.
I'm very interested in Scala, but I have not used it yet, so with that caveat, the frameworks I am aware of that are not mentioned in HRJ's answer (Lift, Sweet, Slinky) are:
Scalatra, previously Step (on GitHub)
Play 2 (on GitHub)
Pinky
I wrote a blog post about this.
To summarise, some of the options are:
Lift
Sweet
Slinky
I finally found that none were suitable for me, and developed my own little "framework". (It is not open-source yet).
I like Lift ;-)
Play is my second choice for Scala-friendly web frameworks.
Wicket is my third choice.
Following is a dump of frameworks. It doesn't mean I actually used them:
Coeus. A traditional MVC web framework for Scala.
Unfiltered. A toolkit for servicing HTTP requests in Scala.
Uniscala Granite.
Gardel
Mondo
Amore. A Scala port of the Ruby web framework Sinatra
Scales XML. Flexible approach to XML handling and a simplified way of interacting with XML.
Belt. A Rack-like interface for web applications built on top of Scalaz-HTTP
Frank. Web application DSL built on top of Scalaz/Belt
MixedBits. A framework for the Scala progamming language to help build web sites
Circumflex. Unites several self-contained open source projects for application development using the Scala programming language.
Scala Webmachine. Port of Basho's webmachine in Scala, a REST-based system for building web applications
Bowler. A RESTful, multi-channel ready Scala web framework
Try Play Framework, which also support Scala.
One very interesting web framework with commercial deployment is Scalatra, inspired by Ruby's Sinatra. Here's an InfoQ article about it.
I find Unfiltered very interesting https://github.com/unfiltered/unfiltered.
It's mentioned in IttayD's list.
Here is a presentation about it http://unfiltered.lessis.me/#0
and the video http://code.technically.us/post/942531598/doug-tangren-presents-the-unfiltered-toolkit-for
Also here there is an article with more info http://code.technically.us/post/998251172/holding-the-parameter
It must be noted that there is also a considerable interest in Wicket and Scala. Wicket fits Scala suprisingly well. If you want to take advantage of the very mature Wicket project and its ecosystem (extensions) plus the concise syntax and productivity advantage of Scala, this one may be for you!
See also:
Some prosa
Presentation
Some experience with Wicket and Scala
Announcments with reference to the project for the glue code to bind Scala closures to models
Play is pretty sweet.
It is now production ready. It incorporates: a cool template framework,automatic reloading of source files upon safe, a composable action system, akka awesomeness, etc.
Its part of the Typesafe Stack.
Having used it for two projects, I can say that it works pretty smoothly and it should be something to consider next time you are looking to learn new web frameworks.
I tend to use JAX-RS using Jersey (you can write nice resource beans in Scala, Java or Groovy) to write RESTul web applications. Then I use Scalate for the rendering the views using one of the various template languages (JADE, Scaml, Ssp (Scala Server Pages), Mustache, etc.).
There's a new web framework, called Scala Web Pages. From the site:
Target Audience
The Scala Pages web framework is likely to appeal to web programmers who come from a Java background and want to program web applications in Scala. The emphasis is on OOP rather than functional programming.
Characteristics And Features
Adheres to model-view-controller paradigm
Text-based template engine
Simple syntax: $variable and <?scp-instruction?>
Encoding/content detection, able to handle international text encodings
Snippets instead of custom tags
URL Rewriting
Prikrutil, I think we're on the same boat. I also come to Scala from Erlang. I like Nitrogen a lot so I decided to created a Scala web framework inspired by it.
Take a look at Xitrum. Its doc is quite extensive. From README:
Xitrum is an async and clustered Scala web framework and web server on top of Netty and Hazelcast:
It fills the gap between Scalatra and Lift: more powerful than Scalatra and easier to use than Lift. You can easily create both RESTful APIs and postbacks. Xitrum is controller-first like Scalatra, not view-first like Lift.
Annotation is used for URL routes, in the spirit of JAX-RS. You don't have to declare all routes in a single place.
Typesafe, in the spirit of Scala.
Async, in the spirit of Netty.
Sessions can be stored in cookies or clustered Hazelcast.
jQuery Validation is integrated for browser side and server side validation.
i18n using GNU gettext, which means unlike most other solutions, both singular and plural forms are supported.
Conditional GET using ETag.
Hazelcast also gives:
In-process and clustered cache, you don't need separate cache servers.
In-process and clustered Comet, you can scale Comet to multiple web servers.
Follow the tutorial for a quick start.
There's also Pinky, which used to be on bitbucket but got transfered to github.
By the way, github is a great place to search for Scala projects, as there's a lot being put there.
I'd like to add my own efforts to this list. You can find out more information here:
brzy framework
It's in early development and I'm still working on it aggressively. It includes features like:
A focus on simplicity and extensibility.
Integrated build tool.
Modular design; some initial modules includes support for scalate, email, jms, jpa, squeryl, cassandra, cron services and more.
Simple RESTful controllers and actions.
Any and all feedback is much appreciated.
UPDATE: 2011-09-078, I just posted a major update to version 0.9.1. There's more info at http://brzy.org which includes a screencast.
Both Sweet and Slinky seem to be unmaintanted for about a year. Sweet Maven repo sweetsoftwaredesign.com is dead so there's even no way to download dependencies.
Note: Spiffy is outdated.
<plug>
Spiffy:
is written in Scala
uses the fantastic Akka library and actors to scale
uses servlet API 3.0 for asynchronous request handling
is modular (replacing components is straight forward)
uses DSLs to cut down on code where you don't want it
supports Scalate and Freemarker for templating
Spiffy is a web framework using Scala, Akka (a Scala actor implementation), and the Java Servlet 3.0 API. It makes use of the the async interface and aims to provide a massively parallel and scalable environment for web applications. Spiffy's various components are all based on the idea that they need to be independent minimalistic modules that do small amounts of work very quickly and hand off the request to the next component in the pipeline. After the last component is done processing the request it signals the servlet container by "completing" the request and sending it back to the client.
https://github.com/mardambey/spiffy
</plug>
You could also try Context. It was designed to be a Java-framework but I have successfully used it with Scala also without difficulties. It is a component based framework and has similar properties as Lift or Tapestry.
I have stumbled upon your question a few weeks back, but since then also learned about Circumflex. This is a nice, minimal framework that is therefore easy to learn, and it has pretty good documentation available as well.
Beside it's minimal-ness, it also claims to work well with other libraries and lets you use your own implementation of things when you need it.
As a academic project of 6 months in college me and my 3 friends are going to implement "Distributed Caching" in scala language.
Being new to both of these concepts and this being our first project I would be really happy if you guys could provide some direction.
I am currently learning scala.
Please let me know which particular features of language to be learned for this particular project.
Any online resources for learning distributed caching.
thanks in advance
You could have a look at Terracotta and especially at its uses in implementing Distributed Caching. You could have a look at the source code of the open source edition of Terracotta. Also, you could even consider Terracotta as your framework for building the distributed cache. I don't have any personal experience in using Terracotta with Scala, but it has been done.
Features of the language... Try starting with the Programming in Scala book. It's a very good resource. If you want to do any concurrency you will have to be proficient in using Actors. I would recommend having a look over all the features of Scala. Each one has its uses and you will need to know at least a bit of them to recognise situations in which to use their power. :)
-- Flaviu Cipcigan
You might want to look at the project Velocity page.
In MSDN also there is an article about distributed caching in general.
I'm not sure, but I think the Akka project might is already doing what you're looking for (and a whole lot more). Perhaps you can take inspiration from that.