Stored procedure in Neo4j - nosql

I wanted to know if there is any Neo4j equivalent of a stored procedure?
When I researched this, I came across events, but I found them more like triggers and not stored procedures.

There are basically two techniques to extend a Neo4j server:
Server plugins enrich the existing REST endpoints and
unmanaged extensions allow you to create new REST endpoints.
Both techniques require you to write code in Java (or another JVM language), package it as a jar file, and deploy it to the Neo4j server.

Stored procedures are available as capabilities callable from the Cypher language since version 3.0.
A first reference can be found here:
https://dzone.com/articles/neo4j-30-stored-procedures
A remarkable example, showing how a graph can be processed at scale through a procedure to achieve network clustering and community detection, is here:
http://www.markhneedham.com/blog/2016/02/28/neo4j-a-procedure-for-the-slm-clustering-algorithm/
EDIT
As Neo4j 3.0 was released in April 2016, the stored procedures became an official, Apache 2.0 licensed repository (APOC):
https://neo4j.com/labs/apoc/
Available procedures range from data manipulation/import to spatial functions and complex graph algorithms (e.g. PageRank, Dijkstra, community detection, betweenness centrality, closeness centrality, etc.).
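To give a flavour of what a user-defined procedure looks like, here is a minimal sketch along the lines of the Neo4j 3.x Java procedure API. The procedure name example.countNodes and the result class are made up for illustration; the class would still need to be packaged as a jar and deployed to the server's plugins directory.

    import java.util.stream.Stream;
    import org.neo4j.graphdb.GraphDatabaseService;
    import org.neo4j.graphdb.Label;
    import org.neo4j.procedure.Context;
    import org.neo4j.procedure.Name;
    import org.neo4j.procedure.Procedure;

    public class ExampleProcedures {

        // Injected by the procedure framework at call time.
        @Context
        public GraphDatabaseService db;

        // Callable from Cypher as: CALL example.countNodes('Person')
        @Procedure("example.countNodes")
        public Stream<CountResult> countNodes(@Name("label") String label) {
            long count = db.findNodes(Label.label(label)).stream().count();
            return Stream.of(new CountResult(count));
        }

        // Procedures return streams of records; public fields become result columns.
        public static class CountResult {
            public long count;
            public CountResult(long count) { this.count = count; }
        }
    }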

My answer here does not answer the question directly (Stefan's answer does just fine for that). With that said, if any of you are considering writing server plugins (to get Stored Proc behavior) before your project is actually being used in production (which at the time of this writing is the vast majority of the Neo4j userbase), I strongly recommend not doing so.
Server plugins add architectural complexity to your project. You will require JVM developers to maintain them. Deploying or updating them can be tricky, and the associated source control methodologies are not intuitive. Neo4j doesn't require schema migrations, which makes your job as a developer easier; adding server plugins takes that benefit away. And since it's not a mainstream use case of Neo4j, you'll get little help from the developer community, and improvements and bug fixes related to that functionality will be given lower priority by the Neo4j team.
And all that would be for possibly a slight performance boost, or none at all.
"Stored Procedures" (or using server plugins as such) are an important feature to have in the context of performance tuning, but if your team is still two guys in a garage, don't even think about going down this path.

Related

What are some of the big differences in Java Client versus Go Client when implementing Uber Cadence workflow?

I am working on designing a workflow with the intention of using the Cadence workflow engine and the Java client. It seems like Uber is actively using Go, and thus Go has better documentation, Activity support, and other classes than the Java client. Is this true?
No, it is not really true. The majority of the open source users of Cadence and Temporal are using the Java SDK.
If you go to the java-client channel in the Cadence Slack, the community has more discussion there than in go-client. Even within Uber, the Java client is heavily used by core services like payments.
The Go client happens to have more docs/samples because it started a little earlier. In fact, the docs that are missing for Java can usually be derived from the Go ones. It should also be noted that some documentation exists only in the Java library itself. For example, the documentation on how to write unit tests was put directly into the Javadocs rather than on cadenceworkflow.io, because that is the convention Java developers use to look up documentation.
IMO they are equally important for Cadence. All new features are implemented and rolled out at the same time, so there is no real difference between them.
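As a rough illustration of what the Java client looks like, here is a minimal workflow/activity sketch in the style of the Cadence Java SDK; the GreetingWorkflow/GreetingActivities names and the timeout values are made up for the example, and worker/client registration is omitted.

    import java.time.Duration;
    import com.uber.cadence.activity.ActivityMethod;
    import com.uber.cadence.activity.ActivityOptions;
    import com.uber.cadence.workflow.Workflow;
    import com.uber.cadence.workflow.WorkflowMethod;

    public class GreetingExample {

        public interface GreetingActivities {
            @ActivityMethod
            String composeGreeting(String name);
        }

        public interface GreetingWorkflow {
            @WorkflowMethod(executionStartToCloseTimeoutSeconds = 60)
            String greet(String name);
        }

        public static class GreetingWorkflowImpl implements GreetingWorkflow {
            // Activity stub: calls are dispatched to activity workers via the Cadence service.
            private final GreetingActivities activities = Workflow.newActivityStub(
                    GreetingActivities.class,
                    new ActivityOptions.Builder()
                            .setScheduleToCloseTimeout(Duration.ofSeconds(30))
                            .build());

            @Override
            public String greet(String name) {
                return activities.composeGreeting(name);
            }
        }
    }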

Understanding Hadoop Packages and Classes

I have been using CDH and HDP for a while (both in the pseudo-distributed mode) on a VM as well as installing natively on Ubuntu. Although my question is probably relevant to all Projects within the Apache Hadoop Ecosystem, let me ask this specifically in the context of Avro.
What is the best way to go about figuring out what the different packages, and the classes within them, do? I usually end up referring to the Javadoc for the project (Avro in this case), but the overviews for packages and classes end up being awfully inadequate.
For example, take two of the Avro packages: org.apache.avro.specific and org.apache.avro.generic. These are used for creating specific and generic readers and writers (respectively), but I'm not 100% sure what each is for. I have used the specific package in cases where I use Avro code generation, and the generic one when I don't want to use code generation. However, I am not sure if that is the only reason for using one vs. the other.
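For concreteness, here is roughly how the two styles differ as I understand them (a sketch, assuming a user.avsc schema with a single name field and, for the specific case, a code-generated User class):

    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.file.DataFileReader;
    import org.apache.avro.generic.GenericDatumReader;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.specific.SpecificDatumReader;

    public class AvroReadExample {
        public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(new File("user.avsc"));

            // Generic: no code generation; fields are accessed by name and values are untyped.
            DataFileReader<GenericRecord> generic = new DataFileReader<>(
                    new File("users.avro"), new GenericDatumReader<GenericRecord>(schema));
            for (GenericRecord r : generic) {
                System.out.println(r.get("name"));
            }
            generic.close();

            // Specific: uses the generated User class, so fields come back as typed getters.
            DataFileReader<User> specific = new DataFileReader<>(
                    new File("users.avro"), new SpecificDatumReader<>(User.class));
            for (User u : specific) {
                System.out.println(u.getName());
            }
            specific.close();
        }
    }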
Another example: the Encoder/Decoder classes are used for low-level SerDe, the DatumReader/DatumWriter classes for a "medium-level" SerDe, while most application-layer interactions with Avro will probably use the generic/specific readers/writers. Without having struggled through the pain of using these classes, how is a user to know what to use for what?
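To make that layering concrete, here is a small sketch of how I understand the pieces fit together when serializing a single record to bytes (again assuming the same user.avsc schema):

    import java.io.ByteArrayOutputStream;
    import java.io.File;
    import org.apache.avro.Schema;
    import org.apache.avro.generic.GenericData;
    import org.apache.avro.generic.GenericDatumWriter;
    import org.apache.avro.generic.GenericRecord;
    import org.apache.avro.io.BinaryEncoder;
    import org.apache.avro.io.EncoderFactory;

    public class AvroLayersExample {
        public static void main(String[] args) throws Exception {
            Schema schema = new Schema.Parser().parse(new File("user.avsc"));

            GenericRecord record = new GenericData.Record(schema);
            record.put("name", "Alice");

            // DatumWriter ("medium level") walks the record according to the schema...
            GenericDatumWriter<GenericRecord> writer = new GenericDatumWriter<>(schema);

            // ...and delegates to an Encoder (low level) to produce the actual bytes.
            ByteArrayOutputStream out = new ByteArrayOutputStream();
            BinaryEncoder encoder = EncoderFactory.get().binaryEncoder(out, null);
            writer.write(record, encoder);
            encoder.flush();

            System.out.println("serialized " + out.size() + " bytes");
        }
    }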
Is there a better way to get a good overview of each package (clearly the javadoc is not well documented) and the classes within the package?
PS: I have similar questions for essentially all the other Hadoop projects (Hive, HBase, etc.) - the Javadocs seem grossly inadequate overall. I just wonder what other developers end up doing to figure these things out.
Any inputs would be great.
I download the source code and skim through it to get an idea of what it does. If there is Javadoc, I read that too. I tend to concentrate on the interfaces that I need and move on from there; that way I put everything into context, and it makes it easier to figure out the usage. I use the call hierarchy and the type hierarchy views a lot.
These are very general guidelines, and ultimately it is the time you spend with the project that will make you understand it.
The Hadoop ecosystem is growing quickly and changes are introduced on a monthly basis; that's why the Javadoc is not so good. Another reason is that Hadoop software tends to lean towards infrastructure rather than the end user. People developing tools will spend time learning the APIs and internals, while everybody else is more or less expected to stay blissfully ignorant of them and just use some high-level domain-specific language for the tool.

Why mongoDB is used in the MEAN stack

I'm fixing the usability/documentation for the MEAN stack. I'm starting with Mean.JS. Can someone give me the salient reasons why the authors of the MEAN stack chose MongoDB as the database? There are other databases to choose from, but MongoDB is used for some reason.
I realize there are questions already covering databases, but I'm wondering specifically why it was used in the MEAN stack scenario.
I think the primary reason is that MongoDB uses the same language, JavaScript (ECMAScript), for its methods and function API, rather than a separate language (like SQL). Thus MongoDB is a good NoSQL database option, and it fits naturally as the database for the rest of the stack.
As others have pointed out, there are many other reasons, like that it is the most popular NoSQL database at this point. It has a decent shell and you can write Javascript in it. It is Open Source and well documented.
It is also really easy to set up, and it scales fairly well, although not as well as some other NoSQL databases.
It also uses BSON, which is similar to JSON, which in turn is similar to a JavaScript object. So it is just plain easy to learn and to use this particular database with the rest of a JavaScript stack.
There's some pretty good reasons here: http://blog.mongodb.org/post/49262866911/the-mean-stack-mongodb-expressjs-angularjs-and
A Glimpse Into Four Key Components - How the MEAN Stack Adds New Dimensions to Modern Web Applications
All four components of the MEAN stack are popular in the app development space, and together they offer a platform that enables an effortless development workflow. Briefly, on the database component:
MongoDB - the database layer
For any web app, data storage and management are essential. MongoDB is a popular NoSQL document database that serves this purpose, handling data storage and management for the web application.

NoSQL (e.g. MongoDB) or RDMS (e.g. PostgreSQL) for new Scala project?

I'm developing a brand new project in Scala. It's just an application for a bunch of CRUD operations; however, because of some eccentric requirements, neither Play2 nor Lift fits the bill, so I'm going to develop the application from the ground up. This means that Anorm and ScalaQuery become less obvious choices for database integration, and it leaves me with the question: is it time to try something new?
My past technology stacks mostly included Java and PostgreSQL, and I have experience with both ORM and plain SQL. Are NoSQL database management systems like MongoDB a good replacement for a typical RDBMS, or are they special-case application data stores? Also, how does the choice of database affect the greater Scala system design (if at all)? For example, the fact that you are using a JSON-like interface to talk to the database, and JSON between the web and a REST service, does not mean that much if everything in the middle becomes Scala objects, or does it?
I'm basically asking for someone's experience on moving from relational to object/document type databases, using Scala in particular. I know that good RDBMS integration is promised in the upcoming release of SLICK. So, if a company like TypeSafe decides to make a RDBMS integration part of the TypeSafe stack, then will I be swimming upstream by integrating to MongoDB using Casbah for example?
Apologies if this question appears a bit vague. I do hope that someone with the right insights or experience will be able to help though.
Update:
Apologies for not adding links to SLICK (it being fairly new). Here goes:
Quick overview
Project home
Update 2:
My personal first win for a technology is usually developer productivity - this translates to lightweight and simple: quick to learn, easy to maintain, no magic.
I am currently in a similar situation, and since I have some experience with web development and SQL databases, I took it as an opportunity to work with MongoDB, Casbah (and Scalatra). My experience is still very limited, and the project and the amount of data I am working with are pretty small, but here are a few observations I've made.
For the few sets of data I have, performance does not seem to motivate either SQL or NoSQL. However, performance in the presence of huge amounts of data is often listed as a reason for using NoSQL, e.g., by Wikipedia.
My documents (entries in the database) arise from benchmarking test suites and mainly have a static structure, so I am optimistic that I could store them in a fixed-schema SQL database. However, a few substructures are not static, e.g., new test cases are added, new statistics are tracked, others are removed. This was my main motivation for trying a schema-free NoSQL database. Also, I had the feeling that the document approach of MongoDB makes it much more obvious which data belongs together (i.e., to a document), in contrast to entries in a relational database, where the data would be distributed over various tables and rows, and where a full "document" would need to be reconstructed by joins.
Tools such as Lift-Json or Rogue allow you to work with regular Scala objects in a type-safe way, although the data is regularly (de-)serialised as (from) JSON. However, this naturally works best if the structure of your data is mainly static; otherwise, you are left with using strings to access your data (e.g., when expanding the results of a query using Casbah).
If you are mainly concerned about a coherent representation of data on the server and client side, languages such as Opa or Haxe might be of interest, since they compile to code that can be executed on both sides. See this page for "multitarget" or "tierless" languages.
This got too long for a comment. I was just trying to relate my short experience with Scala (about 6 months now, since about when Play2 came out; it's quickly become my go-to language).
I've enjoyed using Salat/Casbah with MongoDB in my last few projects; most have been in Play2, but the latest was without a webapp framework. It definitely hasn't felt like swimming upstream.
I would say that there are particular use cases for which I wouldn't use mongo, but it works nicely as a general purpose object data store, especially if you expect to query by id or index and don't need transactions (and will need minimal ad-hoc aggregation type stuff).
Expect to require a separate set of servers dedicated to mongodb (or to use a service dedicated to mongodb), but I guess that's normal for most serious database apps.
I've also used Play2/Anorm, which was surprisingly enjoyable to use for some ad-hoc query dashboard-style report pages. I started trying to go the Squeryl route, but Anorm seemed easier to use for one-off aggregation queries. Haven't looked at SLICK, but it sounds interesting.
It's really hard to say without knowing what problems you would like the app to solve.
I've personally found my productivity increased using NoSQL DBs via REST/JSON. Though bear in mind most NoSQL DBs offer REST interfaces which preclude the need for much middleware, Scala or otherwise, unless you intend to write a webapp with a UI.
If this is a learning exercise, I recommend you try multiple things out, as each NoSQL DB has something different to offer to your toolkit; I have personally found CouchDB, Riak, Neo4j, and MongoDB to each have various pluses and drawbacks and to be good for different purposes.
Hope this helps, good luck.

Derby vs PostgreSql Performance Comparison

We are doing research right now on whether to switch our PostgreSQL db to an embedded Derby db. Both would be using GlassFish 3 for our data layer. Anybody have any opinions or knowledge that could help us decide?
Thanks!
edit: we are writing some performance tests ourselves right now. Looking for answers based more on experience / first-hand knowledge.
I know I'm late to post an answer here, but I want to make sure nobody makes the mistake of using Derby over any production-quality database in the future. I apologize in advance for how negative this answer is - I'm trying to capture an entire engineering team's ill feelings in a brief Q&A answer.
Our experience using Derby in many small-ish customer deployments has led us to seriously doubt how useful it is for anything but test environments. Some problems we've had:
Deadlocks caused by lock escalations - this is the biggest one and happens to one customer about once every week or two
Interrupted I/Os cause Derby to fail outright on Solaris (may not be an issue on other platforms) - we had to build a shim to protect it from these failures
Can't handle complicated queries which MySQL/PostgreSQL would handle with ease
Buggy transaction log implementation caused a table corruption which required us to export the database and then re-import it (couldn't just drop the corrupted table), and we still lost the table in the process - thank goodness we had a backup
No LIMIT syntax
Low performance for complicated queries
Low performance for large datasets
Due to the fact that it's embedded, Derby is more of a competitor to SQLite than to PostgreSQL, which is an extremely mature, production-quality database used to store multi-petabyte datasets by some of the largest websites in the world. If you want to be ready for growth and don't want to get caught debugging someone else's database code, I would recommend not using Derby. I don't have any experience with SQLite, but I can't imagine it being much less reliable than Derby has been for us while still being as popular as it is.
In fact, we're in the process of porting to PostgreSQL now.
Derby is still relatively slow in performance, but... wherever your Java application goes, your database server goes, completely platform independent. You don't even need to think about installing a DB server wherever your Java app is being copied to.
I was using MySQL with Java, but having an embedded implementation of your database server sitting right within your Java app gives stunning and unprecedented productivity, freedom and flexibility.
Always having a DB server included, whenever and wherever, on any platform, for me is just heaven!!!
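To illustrate how little setup the embedded mode needs, here is a minimal sketch using plain JDBC; the database name demoDb and the users table are made up, and the only requirement is having derby.jar on the classpath:

    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class DerbyEmbeddedExample {
        public static void main(String[] args) throws Exception {
            // ";create=true" creates the database files on first use - no server to install.
            try (Connection conn = DriverManager.getConnection("jdbc:derby:demoDb;create=true");
                 Statement stmt = conn.createStatement()) {

                stmt.executeUpdate("CREATE TABLE users (id INT PRIMARY KEY, name VARCHAR(50))");
                stmt.executeUpdate("INSERT INTO users VALUES (1, 'Alice')");

                try (ResultSet rs = stmt.executeQuery("SELECT name FROM users WHERE id = 1")) {
                    while (rs.next()) {
                        System.out.println(rs.getString("name"));
                    }
                }
            }
        }
    }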
I have not compared PostgreSQL to Derby directly. However, having used both in different circumstances, I have found Derby to be highly reliable. You will, though, need to pay attention to Derby's configuration to ensure it suits your application's needs.
When looking at the H2 database's performance stats site, it's worth reading the follow-up discussion, which comes out in favour of Derby compared to the H2 conclusions. http://groups.google.com/group/h2-database/browse_thread/thread/55a7558563248148?pli=1
Some stats from the H2 database site here:
http://www.h2database.com/html/performance.html
There are a number of performance test suites that are included as part of the Derby source code distribution itself; they are used by Derby developers to conduct their own performance testing of Derby. So if you need examples of performance tests, or want additional ones, you could consider using those. Look in the subdirectory named java/testing/org/apache/derbyTesting/perf in the Derby source distribution.
I'm not sure what you mean that Derby is embedded. Certainly that is an option, but you can also install it as a separate server.