What is the best way to use MongoDB, Redis and RabbitMQ through Karate DSL?

Is there a recommended way for me to use MongoDB, Redis and RabbitMQ through Karate DSL, or do I have to write my own Java code for all of them?

You have to write your own Java code; refer to https://github.com/intuit/karate#calling-java - and there is also a JDBC example as a reference: dogs.feature
The reason we don't support all databases is that it would unnecessarily add complexity and a learning curve to Karate, needlessly burdening the 90% of users who don't need to call a database (or who are too lazy to write the glue code to do so ;).
Please note that the code to get data from the database is something you need to write only once, and I suggest that you get someone to help you with this. Once you have it in place, you can re-use it in all the tests you create.
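For illustration, here is a minimal sketch of what that one-time glue code could look like, written against the official MongoDB Java driver (mongodb-driver-sync). The class and method names are invented for the example; it is shown in Scala, but any JVM language (plain Java included) works the same way, since Karate only needs a class it can load via Java.type.

```scala
import com.mongodb.client.MongoClients
import com.mongodb.client.model.Filters
import org.bson.Document

// Hypothetical glue class: Karate can load it with Java.type and call its methods.
class MongoUtils(uri: String, dbName: String) {
  private val client = MongoClients.create(uri)
  private val db     = client.getDatabase(dbName)

  // Returns the first document where field == value as a JSON string, or null if none matches.
  def findOne(collection: String, field: String, value: String): String = {
    val doc: Document = db.getCollection(collection).find(Filters.eq(field, value)).first()
    if (doc == null) null else doc.toJson()
  }

  def close(): Unit = client.close()
}
```

In a feature file this would then be wired in via Java.type('MongoUtils'), following the same pattern as the JDBC dogs.feature example.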
If you find this troublesome, please stop using Karate and switch to alternatives like: https://github.com/JakimLi/pandaria or https://github.com/zheng-wang/irontest - all the best :)

Related

Writing Custom Extensions in Druid

I am new to Druid.
Problem Statement
We currently push raw event data to Druid. I have a requirement to apply certain calculations to the data (say, certain statistical techniques) which are not supported by Druid or the extensions it provides out of the box.
There are two questions I have -
What would be the better way to achieve this? (Should I have some external script that reads data from Druid, performs the calculations, and puts the results back into Druid?)
Can I take the route of writing custom extensions for Druid? I could not find any good documentation on how to go about writing/testing Druid extensions.
These links do not provide any in-depth information:
http://druid.io/docs/latest/development/modules.html
https://github.com/apache/incubator-druid (Druid repo that has some core and community contrib extensions)
Appreciate any help on this. Thank you.
You can achieve this both ways; it's up to you how comfortable you are writing an extension yourself and then maintaining it. That is certainly more time-consuming than the other approach.
If you read data from Druid, perform your calculation, and write the data back to Druid, you will end up writing to a separate table. If you are not storage-bound on your Druid cluster, you can certainly take this path, and it is less time-consuming.
Yes, this is the recommended way to perform any custom computation on the data, and you can certainly write a simple extension easily. Here's an example GitHub repo that helps with writing a custom Druid extension: https://github.com/implydata/druid-example-extension
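To make the first option more concrete, here is a rough sketch of the "read from Druid, compute externally" half of that path, assuming the broker's SQL endpoint (/druid/v2/sql, port 8082 by default); the host, datasource and query are placeholders, and re-ingesting the derived results would go through a normal batch or streaming ingestion task, which is omitted here.

```scala
import java.net.URI
import java.net.http.{HttpClient, HttpRequest, HttpResponse}

object DruidReadExample {
  def main(args: Array[String]): Unit = {
    // Placeholder broker URL and datasource; adjust for your cluster.
    val brokerSql = "http://localhost:8082/druid/v2/sql"
    val body      = """{"query": "SELECT __time, metric FROM my_datasource LIMIT 100"}"""

    val client  = HttpClient.newHttpClient()
    val request = HttpRequest.newBuilder(URI.create(brokerSql))
      .header("Content-Type", "application/json")
      .POST(HttpRequest.BodyPublishers.ofString(body))
      .build()

    // The response body is a JSON array of rows; apply the statistical calculation here,
    // then push the derived rows back into Druid with an ingestion spec.
    val response = client.send(request, HttpResponse.BodyHandlers.ofString())
    println(response.body())
  }
}
```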

What is the best way to store and manipulate "list of json" in Redis?

Objective:
The objective is to be able to store a list of JSON documents in Redis and to be able to manipulate them right inside Redis itself.
Problem Statement
The problem is that manipulating a list of JSON has required us to bring all the data into the application before being able to do anything reasonable with it, for example filtering the list of JSON where the JSON strings have a certain value at a certain JSON path, and things like that. This has slowed the application down and defeated the purpose of having Redis.
Possible Solutions That We Are Considering
We were thinking of using RedisJSON to support JSON natively in Redis, but it doesn't seem to meet our requirements. Please correct me if I am wrong. [I am very much new to Redis.]
The other solution we are considering is Lua scripting on the Redis server itself, or using a library like scala-redis to execute Lua scripts on Redis.
Please advise on:
Whether there is a better way to represent such data in Redis?
Whether RedisJSON or similar modules could solve our problem?
Whether we are right about using a library like scala-redis?
Please let me know if this is the right place for this question, or if it needs to be expressed in a better way.
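For what it's worth, the Lua-scripting option mentioned above can keep the filtering entirely server-side. Below is a rough sketch: the script filters a Redis list of JSON strings by a top-level field inside Redis itself; it is sent here through Jedis's eval only because that EVAL signature is compact to show (scala-redis exposes EVAL as well), and the key, field and values are made up.

```scala
import redis.clients.jedis.Jedis
import java.util.Collections
import scala.jdk.CollectionConverters._

object JsonListFilter {
  // The Lua script runs inside Redis, so only the matching JSON strings cross the network.
  private val filterScript =
    """local out = {}
      |local items = redis.call('LRANGE', KEYS[1], 0, -1)
      |for _, item in ipairs(items) do
      |  local obj = cjson.decode(item)
      |  if obj[ARGV[1]] == ARGV[2] then
      |    table.insert(out, item)
      |  end
      |end
      |return out""".stripMargin

  def main(args: Array[String]): Unit = {
    val jedis = new Jedis("localhost", 6379) // placeholder host/port
    jedis.rpush("events",
      """{"type":"click","user":"a1"}""",
      """{"type":"view","user":"b2"}""")

    // Keep only the list entries whose "type" field equals "click".
    val matches = jedis.eval(filterScript,
      Collections.singletonList("events"),
      List("type", "click").asJava)

    println(matches)
    jedis.close()
  }
}
```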

Searching PPDB paraphrases

There is a well-known lexical resource of paraphrases, PPDB.
It comes in several sizes, ranging from the highest precision to the highest recall. The biggest paraphrase set, XXXL, contains ~5 GB of data.
I want to use PPDB for my research, and I wonder what the best engine is for searching such a big resource. I haven't tried it yet, but I think using it as-is from the file is not a good idea.
I was thinking about exporting all the data to MongoDB, but I am not sure if this is the best solution.
If you have some ideas, please share them.
Thank you.
You need to consider the following aspects:
1. For your use-case you will need a schemaless database
2. Transactions not required
3. Fast queries/searching
4. Easy to setup and deploy
5. Ability to handle large volumes of data
All the above aspects point to adopting MongoDB.
You will have some teething troubles exporting the data to MongoDB, but it is definitely worth the effort. Your data model can be as follows: {key: [value1, value2, ...]} for each document.
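A rough sketch of that document model using the MongoDB Java driver (the database, collection and field names are invented, and an index on the key field is added to keep lookups fast for a set of this size):

```scala
import com.mongodb.client.MongoClients
import com.mongodb.client.model.{Filters, Indexes}
import scala.jdk.CollectionConverters._
import org.bson.Document

object PpdbStore {
  def main(args: Array[String]): Unit = {
    val client      = MongoClients.create("mongodb://localhost:27017") // placeholder URI
    val paraphrases = client.getDatabase("ppdb").getCollection("paraphrases")

    // One document per source phrase: { key: "car", values: ["automobile", "vehicle"] }
    paraphrases.insertOne(new Document("key", "car")
      .append("values", List("automobile", "vehicle").asJava))

    // An index on "key" keeps lookups fast even for the ~5 GB XXXL set.
    paraphrases.createIndex(Indexes.ascending("key"))

    val doc = paraphrases.find(Filters.eq("key", "car")).first()
    println(doc.toJson())
    client.close()
  }
}
```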

MongoDB for personal non-distributed work

This might be answered here (or elsewhere) before but I keep getting mixed/no views on the internet.
I have never used anything other than SQL-like databases, and then I came across NoSQL DBs (MongoDB, specifically). I tried my hand at it, just for fun, but everywhere the talk is that it is really great when you use it across distributed servers. So I wonder: is it helpful (in a non-trivial way) for small projects run mainly on a single personal computer? Are there real advantages when there is just one server?
Although it would be cool to use MapReduce (and talk about it to peers :D), won't it be overkill when used for small projects run on a single server? Or are there other advantages? I need some clear thoughts. Sorry if I sound naive here.
Optional: Some examples where/how you have used would be great.
Thanks.
IMHO, MongoDB is perfectly valid for single-server/small projects, and it is not a prerequisite that you only use it for "big data" or multi-server projects.
If MongoDB solves a particular requirement, the scale of the project doesn't matter, so don't let that aspect sway you. Using MapReduce may be a bit of overkill/not the best approach if you truly have low-volume data and just want to do some basic aggregations; these could be done using the group operator (which currently has some limitations with regard to how much data it can return).
So I guess what I'm saying in general is: use the right tool for the job. There's nothing wrong with using MongoDB on small projects/a single PC. If an RDBMS like SQL Server provides a better fit for your project, then use that. If a NoSQL technology like MongoDB fits, then use that.
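To illustrate the "basic aggregations" point, here is a minimal sketch using the Java driver's aggregation pipeline (which has since largely superseded the old group command); the database, collection and field names are invented, and everything runs against a single local server.

```scala
import com.mongodb.client.MongoClients
import com.mongodb.client.model.{Accumulators, Aggregates}
import java.util.Arrays

object SmallProjectAggregation {
  def main(args: Array[String]): Unit = {
    val client = MongoClients.create("mongodb://localhost:27017")   // a single local server
    val orders = client.getDatabase("shop").getCollection("orders") // placeholder names

    // Count documents per "status" value - no MapReduce needed for this.
    val counts = orders.aggregate(Arrays.asList(
      Aggregates.group("$status", Accumulators.sum("count", 1))))

    val cursor = counts.iterator()
    while (cursor.hasNext) println(cursor.next().toJson())
    client.close()
  }
}
```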
+1 on AdaTheDev - but there are 3 more things to note here:
Durability: from version 1.8 onwards, MongoDB has single-server durability when started with --journal, so it is now more applicable to single-server scenarios.
Choosing a NoSQL DB over, say, an RDBMS shouldn't be decided by the single- or multi-server setting, but by how you model the database. See for example 1 and 2 - it's easy to store comment-like structures in MongoDB.
MapReduce: again, it depends on the data modelling and the operation/calculation that needs to occur. Depending on the way you model your data you may or may not need to use MapReduce.

MapReduce implementation in Scala

I'd like to find a good and robust MapReduce framework that can be utilized from Scala.
To add to the answer on Hadoop: there are at least two Scala wrappers that make working with Hadoop more palatable.
Scala Map Reduce (SMR): http://scala-blogs.org/2008/09/scalable-language-and-scalable.html
SHadoop: http://jonhnny-weslley.blogspot.com/2008/05/shadoop.html
Update, 5 Oct 2011:
There is also the Scoobi framework, which has awesome expressiveness.
http://hadoop.apache.org/ is language agnostic.
Personally, I've become a big fan of Spark:
http://spark-project.org/
You have the ability to do in-memory cluster computing, significantly reducing the overhead you would experience from disk-intensive MapReduce operations.
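As a taste of the API, here is the classic word count; the input path and app name are placeholders, and "local[*]" keeps it on a single machine (point it at a cluster master to scale out).

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SparkWordCount {
  def main(args: Array[String]): Unit = {
    // "local[*]" uses all cores of one machine; swap in a cluster master URL to scale out.
    val conf = new SparkConf().setAppName("word-count").setMaster("local[*]")
    val sc   = new SparkContext(conf)

    val counts = sc.textFile("input.txt")   // placeholder input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)                   // the "reduce" stage, kept in memory

    counts.collect().foreach { case (word, n) => println(s"$word\t$n") }
    sc.stop()
  }
}
```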
You may be interested in scouchdb, a Scala interface to using CouchDB.
Another idea is to use GridGain. ScalaDudes have an example of using GridGain with Scala. And here is another example.
A while back, I ran into exactly this problem and ended up writing a little infrastructure to make it easy to use Hadoop from Scala. I used it on my own for a while, but I finally got around to putting it on the web. It's named (very originally) ScalaHadoop.
For a Scala API on top of Hadoop, check out Scoobi; it is still in heavy development but shows a lot of promise. There is also some effort to implement distributed collections on top of Hadoop in the Scala incubator, but that effort is not usable yet.
There is also a new Scala wrapper for Cascading from Twitter, called Scalding.
After looking very briefly over the documentation for Scalding, it seems that while it makes the integration with Cascading smoother, it still does not solve what I see as the main problem with Cascading: type safety. Every operation in Cascading operates on Cascading's tuples (basically a list of field values, with or without a separate schema), which means that type errors, i.e. joining a key as a String against a key as a Long, lead to run-time failures.
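For reference, a minimal Scalding job in the fields-based API has roughly this shape (the canonical word count from Scalding's documentation); the 'line and 'word symbols are untyped field names, which is exactly where the type-safety concern above comes from, and the typed API (TypedPipe) added in later versions is the usual answer to it.

```scala
import com.twitter.scalding._

// Fields-based Scalding job: 'line and 'word are runtime field names, not types,
// so a mismatch (e.g. joining a String key against a Long key) only surfaces at run time.
class WordCountJob(args: Args) extends Job(args) {
  TextLine(args("input"))
    .flatMap('line -> 'word) { line: String => line.split("""\s+""") }
    .groupBy('word) { _.size }
    .write(Tsv(args("output")))
}
```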
To further jshen's point:
Hadoop streaming simply uses Unix streams: your code (in any language) just has to be able to read from stdin and write tab-delimited records to stdout. Implement a mapper and, if needed, a reducer (and, if relevant, configure that as the combiner).
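To make that concrete, a streaming mapper is nothing more than a program that reads lines from stdin and prints tab-separated key/value pairs; a minimal word-count mapper in Scala could look like this (the matching reducer reads those pairs back from stdin the same way).

```scala
// A Hadoop Streaming mapper: read lines on stdin, emit "word<TAB>1" per token on stdout.
// Hadoop groups the emitted keys and feeds them to the reducer, which also just reads stdin.
object StreamingMapper {
  def main(args: Array[String]): Unit = {
    scala.io.Source.stdin.getLines()
      .flatMap(_.split("\\s+"))
      .filter(_.nonEmpty)
      .foreach(word => println(s"$word\t1"))
  }
}
```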
I've added a MapReduce implementation using Hadoop, with a few test cases, on GitHub: https://github.com/sauravsahu02/MapReduceUsingScala.
Hope that helps. Note that the application is already tested.