We are currently researching whether to switch our PostgreSQL database to an embedded Derby database. Either way we would be using GlassFish 3 for our data layer. Does anybody have any opinions or knowledge that could help us decide?
Thanks!
Edit: we are writing some performance tests ourselves right now. We are looking for answers based more on experience and first-hand knowledge.
I know I'm late to post an answer here, but I want to make sure nobody makes the mistake of using Derby over any production-quality database in the future. I apologize in advance for how negative this answer is - I'm trying to capture an entire engineering team's ill feelings in a brief Q&A answer.
Our experience using Derby in many small-ish customer deployments has led us to seriously doubt how useful it is for anything but test environments. Some problems we've had:
Deadlocks caused by lock escalations - this is the biggest one and happens to one customer about once every week or two
Interrupted I/Os cause Derby to fail outright on Solaris (may not be an issue on other platforms) - we had to build a shim to protect it from these failures
Can't handle complicated queries which MySQL/PostgreSQL would handle with ease
Buggy transaction log implementation caused a table corruption which required us to export the database and then re-import it (couldn't just drop the corrupted table), and we still lost the table in the process - thank goodness we had a backup
No LIMIT syntax
Low performance for complicated queries
Low performance for large datasets
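On the LIMIT point: Derby has no LIMIT keyword, but as far as I know releases from 10.5 onward accept the SQL-standard OFFSET / FETCH clause, which PostgreSQL also understands. A minimal sketch of building such a query in Java follows; the class name, method name and the example table are made up for illustration, and the string is only built here, not executed against a database.

```java
// Hypothetical helper: appends the SQL-standard paging clause to a query.
// Assumes the target database accepts "OFFSET n ROWS FETCH NEXT m ROWS ONLY".
final class StandardPaging {

    static String page(String baseQuery, int offset, int pageSize) {
        if (offset < 0 || pageSize <= 0) {
            throw new IllegalArgumentException("offset must be >= 0 and pageSize > 0");
        }
        // The base query should carry its own ORDER BY, otherwise pages are not stable.
        return baseQuery + " OFFSET " + offset + " ROWS FETCH NEXT " + pageSize + " ROWS ONLY";
    }

    public static void main(String[] args) {
        // Third page of ten rows from an illustrative "customers" table.
        System.out.println(page("SELECT id, name FROM customers ORDER BY id", 20, 10));
    }
}
```

This does not help with the other items in the list; it only means pagination does not have to be done by hand in application code.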
Due to the fact that it's embedded, Derby is more of a competitor to SQLite than it is to PostgreSQL, which is an extremely mature production-quality database which is used to store multi-petabyte datasets by some of the largest websites in the world. If you want to be ready for growth and don't want to get caught debugging someone else's database code, I would recommend not using Derby. I don't have any experience with SQLite, but I can't imagine it being much less reliable than Derby has been for us and still being as popular as it is.
In fact, we're in the process of porting to PostgreSQL now.
Derby is still relatively slow, but wherever your Java application goes, your database server goes with it, completely platform independent. You don't even need to think about installing a DB server on the machine your Java app is copied to.
I was using MySQL with Java, but having an embedded database server sitting right inside my Java app gives me stunning, unprecedented productivity, freedom and flexibility.
Always having a DB server included, whenever and wherever, on any platform is just heaven for me!
Have not compared Postgresql to Derby directly. However, having used both in different circumstances, I have found Derby to be highly reliable. However you will need to pay attention to Derby configuration to ensure it suits your application needs.
When looking at the H2 database's stats site, it's worth reading the follow-up discussion, which comes out more in favour of Derby than the H2 conclusions do. http://groups.google.com/group/h2-database/browse_thread/thread/55a7558563248148?pli=1
Some stats from the H2 database site here:
http://www.h2database.com/html/performance.html
There are a number of performance test suites that are included as part of the Derby source code distribution itself; they are used by Derby developers to conduct their own performance testing of Derby. So if you need examples of performance tests, or want additional ones, you could consider using those. Look in the subdirectory named java/testing/org/apache/derbyTesting/perf in the Derby source distribution.
I'm not sure what you mean that Derby is embedded. Certainly that is an option, but you can also install it as a separate server.
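To make the two modes concrete: as far as I know, the main thing that changes on the application side is the JDBC URL. Here is a minimal sketch; the database name "appdb" and the helper class are made up for illustration, and 1527 is Derby's default network server port. Actually opening a connection would additionally need the Derby jars on the classpath, so the sketch only builds the URLs.

```java
// Hypothetical helper that builds the two kinds of Derby JDBC URL.
final class DerbyUrls {

    // Embedded mode: the engine runs inside the application's own JVM.
    static String embedded(String dbName) {
        return "jdbc:derby:" + dbName + ";create=true";
    }

    // Client/server mode: the application talks to a separately started Derby network server.
    static String networkClient(String host, int port, String dbName) {
        return "jdbc:derby://" + host + ":" + port + "/" + dbName + ";create=true";
    }

    public static void main(String[] args) {
        System.out.println(embedded("appdb"));
        System.out.println(networkClient("localhost", 1527, "appdb"));
        // With the Derby jars available, either URL could be passed to
        // java.sql.DriverManager.getConnection(url).
    }
}
```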
I have a performance problem on a server that hosts the Alfresco document manager. From one day to the next, CPU usage has become pinned at 95%, and it is the process that Alfresco launches that consumes it. I have tried changing the garbage collector, changing the JVM and tuning the threads, but I get no improvement.
The Alfresco version is the Community edition.
Has anyone had a similar problem?
This question is very hard to answer with the little information that is provided. CPU problems could be caused by a bad transformation, but it can also be caused by several other factors. Luis Colorado did a nice presentation on performance at DevCon 2019. You can check out his presentation for some tips.
A general piece of advice is to make sure you have Alfresco's Tomcat, SOLR, and the database all running on different machines. Those all have very different performance profiles, so keeping them separate allows you to tune them individually.
I'm currently planning a social-media application - especially the backend.
Basically I have all the social aspects, for which I want to use SQL (PostgreSQL I guess), but I also have geolocations organized in lists (so many-to-one), which will probably make up the biggest amount of data. I know that PostgreSQL has modules for GIS capabilities, and my initial thought was to just use PostgreSQL for everything, for the sake of simplicity and because the performance of geolocation searches should be about the same for both systems, if not in favor of PostgreSQL. I can also use the JSON type in PostgreSQL, so it basically has the most obvious advantages of MongoDB covered.
On the other hand, I'm afraid of scalability problems, as the geolocations are going to be the biggest chunk of data and the tables are probably going to have heaps of rows.
So my thought now is to implement geolocations in MongoDB, with its easy scalability and easy-to-use geolocation search, and embed e.g. comments/likes for a geolocation directly into the document. That would make geolocation reads/searches much easier, but then I would have to combine this data with social data from SQL, e.g. fetch all users that commented on a geolocation, get their profile info from PostgreSQL and 'manually' combine it. Parts of this could be done on the frontend, though, saving me a lot of resources.
I'm not sure how well this idea performs and whether I'm really doing myself a favor there.
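To show what the 'manual' combining would involve, here is a minimal sketch in Java. Everything in it is an assumption for illustration: the Comment class stands in for a comment embedded in a MongoDB geolocation document, and the map stands in for the result of a PostgreSQL query that loads user names by id. No database is actually contacted.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical application-side join between document data and relational data.
final class AppSideJoin {

    // Stand-in for a comment embedded in a geolocation document.
    static final class Comment {
        final long userId;
        final String text;

        Comment(long userId, String text) {
            this.userId = userId;
            this.text = text;
        }
    }

    // Combines each comment with the author's name looked up from the second store.
    static List<String> withAuthorNames(List<Comment> comments, Map<Long, String> namesById) {
        List<String> out = new ArrayList<>();
        for (Comment c : comments) {
            // A user may be missing on the SQL side; the application has to handle that itself.
            String name = namesById.getOrDefault(c.userId, "(unknown user)");
            out.add(name + ": " + c.text);
        }
        return out;
    }

    public static void main(String[] args) {
        List<Comment> comments = new ArrayList<>();
        comments.add(new Comment(1L, "Great view"));
        comments.add(new Comment(2L, "Too crowded"));
        comments.add(new Comment(3L, "Closed on Mondays"));

        Map<Long, String> namesById = new HashMap<>();
        namesById.put(1L, "alice");
        namesById.put(2L, "bob");

        for (String line : withAuthorNames(comments, namesById)) {
            System.out.println(line);
        }
    }
}
```

The point of the sketch is that consistency between the two stores (here, the comment from a user with no profile row) becomes the application's problem rather than the database's.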
tldr: Use PostgreSQL.
Long answer:
You are trying to pre-optimize for a problem you don't even know if you will have. You don't know how many geolocations you will have, what the usage behaviors will be of your users and you probably don't even have any users yet.
I've used MongoDB before and migrated to PostgreSQL. There are many, many features and benefits to using a 'real' database for storing highly structured data. I suggest googling around for 'PostgreSQL vs X' articles, but the overall consensus that I've found is that PGSQL is extremely mature, reliable, performant and supported.
From my personal experience using Mongo then switching to PGSQL, I will never use Mongo again unless PGSQL (or another full-fledged SQL database) is completely falling over and I've spent months fixing it. Even then I'd take a hard look at other NoSQL databases too. PGSQL has so many amazing features and powerful tools that make it a joy to use.
For the seemingly few things you think you need Mongo for, PGSQL can do, and do just as well or better. It has native JSON types with indexes, geo support, full text indexing, etc. PGSQL has been around longer and has more support (useful for debugging, performance tuning, etc).
Regardless of which technologies you are thinking of using, you can't make any sort of informed decision if you don't:
Test with large data sets
and
Know your usage patterns and data volumes
So at this point I'd pick the more mature and powerful tool and set up monitoring for it. Watch the usage and performance of PGSQL and see how it holds up. Research best practices for PGSQL. Get to know it, learn it, dive in deep. When it comes to scaling individual services, each one is somewhat unique and will not fit a simple "Should I use X or Y?" question.
Good luck!
We are planning to migrate from Prevayler (http://prevayler.org/) to db4o (http://www.db4o.com/), so we wanted to know experiences, pros and cons, and best practices to move forward. What do you think about it? Is it a good solution? Or, maybe moving forward with a NoSQL standard solution would be better? (Such as MongoDB or CouchDB). Thanks!
We use db4o as the main database in our production environment (both embedded and client/server), so I am going to share some of my experiences.
Pro:
- very easy for development (you just implement data classes)
- support both embedded/client server under the same interface, which makes it easy to unittest
- decent performance for small projects
Cons:
- db4o is no longer developed, so it's a fairly dead project and you won't get much support for it
- [client/server] every time you change the model you need to redeploy the server (not to mention that you need to host the server app yourself)
- [client/server] performance degrades as more clients connect, so it is not possible to scale
Summary: db4o is very good as an embedded database (mobile app, desktop local db), but when it comes to server applications you run into trouble
Given that I did not receive much feedback, we gave it a try. At first it seemed to be a good option for an embedded database, one that makes deployment much easier. So we rewrote the whole persistence layer, with its unit tests, and it seemed to work fine.
Then we tried it with real data and started to get some weird null pointer exceptions, and we did not know why. We started reading and found this issue: http://www.gamlor.info/wordpress/2009/09/db4o-activation-update-depth/.
We tried to solve it for a few hours, but then decided not to spend more time on it and to find another way. CouchDB, OrientDB and MongoDB are still on our list.
I wanted to know if there is any Neo4j equivalent of a stored procedure?
When I researched this, I came across events, but I found them more like triggers and not stored procedures.
There are basically two techniques to extend a Neo4j server:
Server plugins enrich the existing REST endpoints and
unmanaged extensions allow you to create new REST endpoints
Both techniques require you to write code in Java (or another JVM language), package it as a jar file and deploy it to the Neo4j server.
Stored procedures have been available since version 3.0 as procedures callable from the Cypher language.
A first reference can be found here
https://dzone.com/articles/neo4j-30-stored-procedures
A remarkable example, showing how a large graph can be processed through a procedure to achieve network clustering and community detection, is here:
http://www.markhneedham.com/blog/2016/02/28/neo4j-a-procedure-for-the-slm-clustering-algorithm/
EDIT
Since Neo4j 3.0 was released in April 2016, the stored procedures have been collected in an official, Apache 2.0 licensed repository:
https://neo4j.com/labs/apoc/
Available procedures range from data manipulation/import to spatial and complex graph algorithms (e.g. PageRank, Dijkstra, community detection, betweenness centrality, closeness centrality, etc.)
My answer here does not answer the question directly (Stefan's answer does just fine for that). With that said, if any of you are considering writing server plugins (to get Stored Proc behavior) before your project is actually being used in production (which at the time of this writing is the vast majority of the Neo4j userbase), I strongly recommend not doing so.
Server plugins add architectural complexity to your project. You will need JVM developers to maintain them. Deploying or updating them can be tricky, and the associated source control methodologies are not intuitive. Neo4j doesn't require schema migrations, which makes your job as a developer easier; once you add server plugins you lose that benefit. And since this is not a mainstream use case of Neo4j, you'll get little help from the developer community, and improvements and bug fixes in that area will be given lower priority by the Neo4j team.
And all that would be for possibly a slight performance boost, or none at all.
"Stored Procedures" (or using server plugins as such) are an important feature to have in the context of performance tuning, but if your team is still two guys in a garage, don't even think about going down this path.
I'm developing a brand new project in Scala. It's just an application for a bunch of CRUD operations, however, because of some eccentric requirements, Play2 or Lift does not fit the bill, so I'm going to develop the application from the ground up. This means that Anorm or ScalaQuery becomes less obvious choices for database integration, and leaves me with the question: is it time to try something new?
My past technology stacks mostly included Java and PostgreSQL, and I have experience with both ORM and plain SQL. Are NoSQL database management systems like MongoDB a good replacement for a typical RDBMS, or are they special-case application data stores? Also, how does the choice of database affect the greater Scala system design (if at all)? For example, the fact that you are using a JSON-like interface to talk to the database, and JSON between the web and a REST service, does not mean that much if everything in the middle becomes Scala objects, or does it?
I'm basically asking for someone's experience on moving from relational to object/document type databases, using Scala in particular. I know that good RDBMS integration is promised in the upcoming release of SLICK. So, if a company like TypeSafe decides to make a RDBMS integration part of the TypeSafe stack, then will I be swimming upstream by integrating to MongoDB using Casbah for example?
Apologies if this question appears a bit vague. I do hope that someone with the right insights or experience will be able to help though.
Update:
Apologies for not adding links to SLICK (it being fairly new). Here goes:
Quick overview
Project home
Update 2:
My personal first win for a technology is usually developer productivity - this translates to lightweight and simple: quick to learn, easy to maintain, no magic
I am currently in a similar situation, and since I have some experience with web development and SQL databases, I took it as an opportunity to work with MongoDB, Casbah (and Scalatra). My experience is still very limited, and the project and the amount of data I am working with are pretty small, but here are a few observations I've made.
For the few sets of data I have, performance does not seem to motivate either SQL or NoSQL. However, performance in the presence of huge amounts of data is often listed as a reason for using NoSQL, e.g., by Wikipedia.
My documents (entries in the database) arise from benchmarking test suites and mainly have a static structure, so I am optimistic that I could store them in a fixed-schema SQL database. However, a few substructures are not static, e.g., new test cases are added, new statistics are tracked, others are removed. This was my main motivation for trying a schema-free NoSQL database. I also had the feeling that the document approach of MongoDB makes it much more obvious which data belongs together (i.e., to a document), in contrast to a relational database, where the data would be distributed over various tables and rows, and where a full "document" would need to be reconstructed by joins.
Tools such as Lift-Json or Rogue allow you to work with regular Scala objects in a type-safe way, although the data is regularly (de-)serialised to (from) JSON. However, this naturally works best if the structure of your data is mainly static; otherwise, you are left with using strings to access your data (e.g., for expanding the results of a query using Casbah).
If you are mainly concerned about a coherent representation of data on the server and client side, languages such as Opa or Haxe might be of interest, since they compile to code that can be executed on both sides. See this page for "multitarget" or "tierless" languages.
Got too long for a comment. Was just trying to relate my short experience with Scala (about 6 months now, since about when Play2 came out--it's quickly become my go to language).
I've enjoyed using Salat/Casbah with MongoDB in my last few projects; most have been in Play2, but the latest was without a webapp framework. It definitely hasn't felt like swimming upstream.
I would say that there are particular use cases for which I wouldn't use mongo, but it works nicely as a general purpose object data store, especially if you expect to query by id or index and don't need transactions (and will need minimal ad-hoc aggregation type stuff).
Expect to require a separate set of servers dedicated to mongodb (or to use a service dedicated to mongodb), but I guess that's normal for most serious database apps.
I've also used Play2/Anorm, which was surprisingly enjoyable to use for some ad-hoc query dashboard-style report pages. I started trying to go the Squeryl route, but Anorm seemed easier to use for one-off aggregation queries. Haven't looked at SLICK, but it sounds interesting.
It's really hard to say without knowing what problems you would like the app to solve.
I've personally found my productivity increased using NoSQL DBs via REST/JSON. Though bear in mind most NoSQL DBs offer REST interfaces which preclude the need for much middleware, Scala or otherwise, unless you intend to write a webapp with a UI.
If this is a learning exercise, I recommend you try multiple things out, as each NoSQL DB has something different to offer to your toolkit. I have personally found CouchDB, Riak, Neo4j, and MongoDB to each have various pluses and drawbacks and to be good for different purposes.
Hope this helps, good luck.