MongoDB embedded in Java

What I got from the documentation is that MongoDB runs as a separate process, possibly on another machine, and that I can communicate with it using the MongoDB client driver for Java to perform normal operations.
But can I use MongoDB as an embedded database in my Java application? I mean not as a separate process on another machine, and not as a separate process on the same machine; it should be part of the Java application itself.
Can you please help me out?

No, that's not possible. MongoDB is a native C++ application that uses memory-mapped files, opens sockets, etc. It won't run in a JVM.
MongoDB was made for web-scale applications, big data, failover clusters (replica sets), and auto-sharding, none of which really makes sense for an embedded database. It is also quite aggressive in terms of memory usage, which is undesirable for embedded applications.
--EDIT after zero323's comment--
You might want to take a look at db4o, an object database for Java that was made for embedding.
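For example, storing and retrieving objects with db4o's embedded API looks roughly like this (a minimal sketch; the Person class and file name are made up for illustration):

    import com.db4o.Db4oEmbedded;
    import com.db4o.ObjectContainer;
    import com.db4o.ObjectSet;

    public class Db4oExample {
        public static void main(String[] args) {
            // Opens (or creates) a database file inside the running JVM - no separate process.
            ObjectContainer db = Db4oEmbedded.openFile(Db4oEmbedded.newConfiguration(), "app-data.db4o");
            try {
                db.store(new Person("Alice", 30)); // persist a plain Java object
                // Query by example: default values (null / 0) act as wildcards.
                ObjectSet<Person> result = db.queryByExample(new Person("Alice", 0));
                while (result.hasNext()) {
                    System.out.println(result.next());
                }
            } finally {
                db.close();
            }
        }
    }

    // Hypothetical domain class used above.
    class Person {
        private final String name;
        private final int age;
        Person(String name, int age) { this.name = name; this.age = age; }
        public String toString() { return name + " (" + age + ")"; }
    }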
Also, when embedding a database, the license can bite you and force you to release your code under the same license; in the case of MongoDB, that is the AGPL.

Related

Polyglot persistence for startup app

I'm in the process of developing my next app, and I'm really interested in using polyglot persistence. I like the idea of being able to query different data structures for different services. Essentially, I want to keep MongoDB, Neo4j/Titan, SQL, and maybe Cassandra/HBase in sync.
Currently, I'm wrapping everything in a try/catch block and rolling them all back if one fails, but this is taxing my write times. I've also looked into AMQP systems like Kafka or ZeroMQ, but these seem more big-data centric, whereas my app is still small and I want to keep it efficient.
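Here is roughly the pattern I'm using (a simplified sketch; the store wrappers and Document type are placeholders for the real driver calls):

    // Simplified sketch of "write everywhere, compensate on failure".
    // mongoStore, graphStore and sqlStore are hypothetical wrappers around the real drivers.
    public void saveEverywhere(Document doc) {
        boolean mongoDone = false, graphDone = false;
        try {
            mongoStore.insert(doc);
            mongoDone = true;
            graphStore.createNode(doc);
            graphDone = true;
            sqlStore.insert(doc);
        } catch (Exception e) {
            // Best-effort compensating deletes, since there is no cross-store transaction.
            if (graphDone) { graphStore.deleteNode(doc.getId()); }
            if (mongoDone) { mongoStore.delete(doc.getId()); }
            throw new RuntimeException("Polyglot write failed and was rolled back", e);
        }
    }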
Has anyone had experience with this? Is a MQ a good idea for a small app or am I prematurely optimizing?
Thanks
I know quite a bit about ZeroMQ, but not a lot about the database servers you mention.
First, you're a bit confused about ZeroMQ. Although it is derived from experience with AMQP, it uses the ZMTP wire protocol. That was custom designed during ZeroMQ development [but other applications do now use it].
ZeroMQ is a small and very fast MQ library that is symmetrical for all nodes; it is very good for small apps. The problem here is that you need something on the other systems that talks ZMTP, whether it's ZeroMQ or a bridge. If you intend making plugins or the like for the other systems then fine.
I presume, though, that you are using JMS to talk to the other systems without intending to develop add-ons for them, in which case you're probably stuck with JMS. Kafka is a new one that I haven't caught up with, but RabbitMQ is a good, fast, and small broker, FWIW. There are a great many broker comparisons out there for you to find. Many are dodgy in the sense that one small tweak of a setting can greatly affect performance, so they are not necessarily comparing apples with apples. If you want to compare broker performance in your environment, there isn't much of a shortcut to doing it yourself.
One thing that is confusing me is how you expect a broker to help your rollback performance. You'll still need to do the rollback in essentially the same manner, albeit asynchronously via the broker.
I work at CloudBoost.io (https://www.cloudboost.io) and we build a layer that sits on top of databases and gives you the power of polyglot persistence. We integrate MongoDB, ElasticSearch, Redis, Cassandra, and Neo4j and give you one single API where you can query / store your data. We automatically shard your data into various databases based on your query / storage request patterns.
Let me know if this helps. :)
For syncing data from MongoDB to Neo4j there is now the Neo4j Doc Manager project. It works by monitoring MongoDB for operations, converting each document operation into a property graph model, and immediately writing it to Neo4j.

Why MongoDB is used in the MEAN stack

I'm fixing the usability/documentation for the MEAN stack, starting with MEAN.JS. Can someone give me the salient reasons why the authors of the MEAN stack use MongoDB as the database? There are other databases to choose from, but MongoDB was chosen for some reason.
I realize there are questions already covering databases, but I'm wondering specifically why it was used in the MEAN stack scenario.
I think the primary reason is that MongoDB uses the same language, JavaScript (ECMAScript), for its methods and functions API, rather than a separate language like SQL. That makes MongoDB a good NoSQL option, and it works efficiently as the database for the rest of the stack.
As others have pointed out, there are many other reasons, like the fact that it is the most popular NoSQL database at this point. It has a decent shell and you can write JavaScript in it. It is open source and well documented.
It is also really easy to set up and scales fairly well, although not as well as some other NoSQL databases.
It also uses BSON, which is similar to JSON, which is similar to a JavaScript object, so this particular database is just plain easy to learn and easy to use with the rest of a JavaScript stack.
There's some pretty good reasons here: http://blog.mongodb.org/post/49262866911/the-mean-stack-mongodb-expressjs-angularjs-and
A glimpse into the four key components: all four components of the MEAN stack are popular in the app development space, and together they offer a platform for a smooth development workflow.
MongoDB is the database component. Data storage and management are essential for any web application, and MongoDB is a popular NoSQL document database that serves this purpose in MEAN application development.

Testing: Best embedded database to use

I've been using Derby (embedded) for testing my applications. It was working fine, but recently it started giving me problems like this one: Derby Sequence Loop in junit test.
The same test runs fine with other databases (e.g. H2).
My tech stack is Java 1.6, Spring, Hibernate.
I've seen comparisons of similar databases (H2, HSQLDB, Derby, PostgreSQL and MySQL), but the comparisons don't mention problems like the one I faced.
So, which one is the preferred one for testing purposes?
The best test database is the same version of the same type you're running in production, on the same host OS and architecture.
The question really tends to be "what requirements do you have in testing but not production that make you consider a different DB"?
For example, you might want an embedded database you can easily launch and destroy under unit testing control, without having to specify paths to external binaries, launch and stop a database server, etc. In exchange for that you accept a compromise that your tests are not as good a match for the real system you'll be running on.
When I was building software on Java app servers I ran my tests against PostgreSQL, just like the production machine. It's not like it's hard to initdb and pg_ctl start a DB; you can even do it from a test wrapper if you really feel the need. I think people are a bit too focused on fully in-process embedded DBs and don't properly consider the costs of that.
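If you want to go that route, a test wrapper can be as simple as shelling out to the PostgreSQL tools (a rough sketch; the data directory path is a placeholder and error handling is minimal):

    import java.io.File;

    public class PostgresTestWrapper {
        private final String dataDir = "/tmp/pg-test-data"; // placeholder path

        public void start() throws Exception {
            // One-time cluster initialisation, skipped if the directory already exists.
            if (!new File(dataDir).exists()) {
                run("initdb", "-D", dataDir);
            }
            // -w waits until the server is ready to accept connections.
            run("pg_ctl", "-D", dataDir, "-l", dataDir + "/server.log", "-w", "start");
        }

        public void stop() throws Exception {
            run("pg_ctl", "-D", dataDir, "-m", "fast", "stop");
        }

        private void run(String... cmd) throws Exception {
            int exit = new ProcessBuilder(cmd).inheritIO().start().waitFor();
            if (exit != 0) {
                throw new IllegalStateException("Command failed: " + cmd[0]);
            }
        }
    }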
I understand some people think that it's better to always use the same database. But there are a few disadvantages:
Unit tests might be very slow as each database call goes over TCP/IP.
Running the tests might be more complicated as you need to install and run the same database as you use for production on your local machine.
The database you use for production might need a lot of memory, which you might not want to use on the developer machine.
There is a risk you inadvertently use features that are only available on one database, which will make switching to another database very hard (if you ever consider doing that in the future).
Using a different database for unit testing also has disadvantages, of course. If you use database-specific features, there is a risk that some tests cannot be run or that the tests don't behave exactly the same way. Because of that, I suggest running the nightly builds against the same database as used for production.
Especially if you are using a database abstraction layer such as Hibernate, switching the database should be very easy. In that case it makes a lot of sense to consider an in-memory database such as H2, HSQLDB, or Derby, especially if you plan to write many (or longer-running) tests.
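For example, pointing Hibernate at an in-memory H2 database for tests only takes a few properties (a minimal sketch; register whatever entity classes your tests actually need):

    import org.hibernate.SessionFactory;
    import org.hibernate.cfg.Configuration;

    public class TestSessionFactory {
        public static SessionFactory create() {
            Configuration cfg = new Configuration()
                .setProperty("hibernate.connection.driver_class", "org.h2.Driver")
                // DB_CLOSE_DELAY=-1 keeps the in-memory database alive for the whole JVM,
                // not just the lifetime of a single connection.
                .setProperty("hibernate.connection.url", "jdbc:h2:mem:testdb;DB_CLOSE_DELAY=-1")
                .setProperty("hibernate.connection.username", "sa")
                .setProperty("hibernate.connection.password", "")
                .setProperty("hibernate.dialect", "org.hibernate.dialect.H2Dialect")
                // Create the schema on startup and drop it when the factory closes.
                .setProperty("hibernate.hbm2ddl.auto", "create-drop");
            // cfg.addAnnotatedClass(YourEntity.class); // register the entities under test
            return cfg.buildSessionFactory();
        }
    }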

Best suited NoSQL database for Content Recommender

I am currently working on a project that includes migrating a content recommender from MySQL to a NoSQL database for performance reasons. Our team has been evaluating some alternatives like MongoDB, CouchDB, HBase, and Cassandra. The idea is to choose a database that is capable of running on a single server or in a cluster.
So far we have discarded HBase due to its dependency on a distributed environment: even though the idea is to scale horizontally, we need to run the DB on a single server for a little while in production. MongoDB was also discarded because it does not support the map/reduce features we need.
We still have 2 alternatives left and no solid background on which to decide. Any guidance or help is appreciated.
NOTE: I do not intend to start a religion-like discussion based on unfounded arguments. This is a strictly technical question to be discussed in the context of the problem.
Graph databases are usually considered best suited for recommendation engines, since a lot of recommendation algorithms are actually graph based. I recommend looking into Neo4j: it can handle billions of nodes/edges on a single machine, and it supports a so-called high-availability mode, which is a master-slave setup with automatic master selection.
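As a sketch of what a graph-based recommendation can look like, here is a collaborative-filtering-style Cypher query run through the Neo4j Java driver (the User/Content labels, the LIKED relationship, and the connection details are assumptions for illustration, not a schema from the question):

    import org.neo4j.driver.AuthTokens;
    import org.neo4j.driver.Driver;
    import org.neo4j.driver.GraphDatabase;
    import org.neo4j.driver.Record;
    import org.neo4j.driver.Result;
    import org.neo4j.driver.Session;
    import static org.neo4j.driver.Values.parameters;

    public class ContentRecommender {
        public static void main(String[] args) {
            try (Driver driver = GraphDatabase.driver("bolt://localhost:7687",
                                                      AuthTokens.basic("neo4j", "password"));
                 Session session = driver.session()) {
                // "Users who liked what you liked also liked..." style recommendation.
                Result result = session.run(
                    "MATCH (u:User {id: $userId})-[:LIKED]->(:Content)<-[:LIKED]-(:User)" +
                    "-[:LIKED]->(rec:Content) " +
                    "WHERE NOT (u)-[:LIKED]->(rec) " +
                    "RETURN rec.title AS title, count(*) AS score " +
                    "ORDER BY score DESC LIMIT 10",
                    parameters("userId", 42));
                while (result.hasNext()) {
                    Record row = result.next();
                    System.out.println(row.get("title").asString() + " (" + row.get("score").asLong() + ")");
                }
            }
        }
    }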

Where have you used gSOAP?

Can you give examples how you used gSOAP and how well it was integrated in your existing architecture? Have you found development bottlenecks with gSOAP?
We used gSOAP for a bunch of ARM clients to communicate with an AXIS Web Service server. Pros of gSOAP:
very powerful, supports nearly all Web Service constructs
easy to use; its abstraction of WS calls into functions hides all Web Service complexity from the programmer
elegant interfaces in both C and C++
However, we ran into several development bottlenecks:
when using custom datatypes like maps or sets, it takes quite some hacking to get the gSOAP compiler to handle them (marshalling/unmarshalling). This is especially bad with dynamic data structures.
debugging is hard because of its intrinsically complex networking, parsing, and memory-allocation code. Do everything possible to stick with static memory allocation.
the mailing list is alive, but developers are not very active on it. Simple questions can get answered quickly, but the toughest problems often go unanswered
forget about optimization. Linking in gSOAP eats about 1MB of memory at runtime (-Os). Runtime performance is fine on our 32MB linux-based ARM board, but there really is little to do about optimization if you need it.
We used gSOAP in a C++-based web server about 4 years back. Overall it worked fine. The only major issue was that the interface was in C and procedural (I understand it is difficult to design a good non-procedural interface). There may be a lot of repeated code while implementing the interface, for which you might have to use macros (we didn't explore the templates option too far then).
We are using gSOAP to deploy a web service onto an embedded Linux device running an ARM MX processor.
We are using gSOAP to consume a WCF-based web service from an application deployed on a Linux device running on an ARM processor. The experience is good to a large extent.
We used gSOAP in a web server on an ARM9 400 MHz device.
The gSOAP daemon connected to a database daemon through the ZeroMQ library, which runs on the same device.
It supports more than 1000 basic requests which do not require a connection to the database.
Disabling support for the multi-referenced SOAP option via the WITH_NOIDREF define helped decrease serialization time by about a factor of four on big requests with a large number of serialization nodes.