Does anybody have any experience using HBase with DataNucleus via JPA? I'm struggling to get DataNucleus to fetch collections from the datastore for me. Any pointers would be much appreciated!
Dependencies:
https://github.com/ravindranathakila/www.ilikeplaces.com/blob/master/pom.xml
persistence.xml:
https://github.com/ravindranathakila/www.ilikeplaces.com/blob/master/src/main/resources/META-INF/persistence.xml
orm.xml (with or without it didn't make a difference, though):
https://github.com/ravindranathakila/www.ilikeplaces.com/blob/master/src/main/resources/META-INF/orm.xml
The code works fine with OpenJPA and an Apache Derby database, so with respect to JPA itself I should say there are no big flaws.
Once again, some help or failure cases would be much appreciated!
If you are using Maven to build against DataNucleus, the best approach is to download the bundled libraries and point to them with system paths (go for the full HBase bundle, and a release version). This will save you a lot of time resolving conflicts and missing JARs.
Next comes enhancing the classes. This can be done as mentioned here. Try the different approaches mentioned there if you keep hitting dependency issues (theoretically you shouldn't, since you are using the downloaded libraries, unless there are conflicts between different versions of the same file or between HBase and ZooKeeper versions).
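For reference, this is roughly the kind of class the enhancer has to process. The entity and field names below are made up, and the mapping is only a sketch of how a collection might be declared for the question's scenario, not a verified HBase configuration.

```java
import java.util.ArrayList;
import java.util.List;
import javax.persistence.CascadeType;
import javax.persistence.Entity;
import javax.persistence.GeneratedValue;
import javax.persistence.Id;
import javax.persistence.OneToMany;

// Hypothetical entity; DataNucleus must bytecode-enhance classes like this
// (for example via its Maven enhancer plugin) before they can be persisted.
@Entity
public class Album {

    @Id
    @GeneratedValue
    private Long id;

    // The kind of collection the question is about fetching; collections are
    // commonly loaded lazily, so access them while the EntityManager is open.
    @OneToMany(cascade = CascadeType.ALL)
    private List<Photo> photos = new ArrayList<Photo>();

    public List<Photo> getPhotos() {
        return photos;
    }
}

// Minimal element entity, also hypothetical.
@Entity
class Photo {

    @Id
    @GeneratedValue
    private Long id;

    private String caption;
}
```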
While this does not solve the above problem, I'm endorsing the question with this answer to help others jump ahead in narrowing down issues pertaining to this.
I have been using CDH and HDP for a while (both in the pseudo-distributed mode) on a VM as well as installing natively on Ubuntu. Although my question is probably relevant to all Projects within the Apache Hadoop Ecosystem, let me ask this specifically in the context of Avro.
What is the best way to go about figuring out what the different packages, and the classes within those packages, do? I usually end up referring to the Javadoc for the project (Avro in this case), but the overviews for packages and classes end up being awfully inadequate.
For example, take two of the Avro packages: org.apache.avro.specific and org.apache.avro.generic. These are used for creating specific and generic readers and writers (respectively), but I'm not 100% sure what they are for. I have used the specific package in cases where I use Avro code generation, and the generic one when I don't want to use code generation. However, I am not sure if that is the only reason for using one vs. the other.
Another example: the Encoder/Decoder classes are used for low-level SerDe, the DatumReader/DatumWriter classes for a "medium-level" SerDe, while most application-layer interactions with Avro will probably use the generic/specific readers/writers. Without having struggled through the pain of using these classes, how is a user to know what to use for what?
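To make the distinction concrete, here is a rough sketch of writing and reading a container file with the generic API, with no code generation involved; the schema and file name are made up. The specific API would look much the same, except that generated classes (e.g. a User class) take the place of GenericRecord.

```java
import java.io.File;
import org.apache.avro.Schema;
import org.apache.avro.file.DataFileReader;
import org.apache.avro.file.DataFileWriter;
import org.apache.avro.generic.GenericData;
import org.apache.avro.generic.GenericDatumReader;
import org.apache.avro.generic.GenericDatumWriter;
import org.apache.avro.generic.GenericRecord;

public class GenericAvroExample {
    public static void main(String[] args) throws Exception {
        // Made-up schema; with the generic API it is parsed at runtime.
        Schema schema = new Schema.Parser().parse(
            "{\"type\":\"record\",\"name\":\"User\",\"fields\":"
          + "[{\"name\":\"name\",\"type\":\"string\"}]}");

        File file = new File("users.avro");

        // Write with the generic API: records are built dynamically against
        // the schema instead of using generated classes.
        GenericRecord user = new GenericData.Record(schema);
        user.put("name", "alice");
        DataFileWriter<GenericRecord> writer =
            new DataFileWriter<GenericRecord>(new GenericDatumWriter<GenericRecord>(schema));
        writer.create(schema, file);
        writer.append(user);
        writer.close();

        // Read it back, again without generated classes; the schema is taken
        // from the container file itself.
        DataFileReader<GenericRecord> reader =
            new DataFileReader<GenericRecord>(file, new GenericDatumReader<GenericRecord>());
        while (reader.hasNext()) {
            System.out.println(reader.next().get("name"));
        }
        reader.close();
    }
}
```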
Is there a better way to get a good overview of each package (clearly the Javadoc does not provide one) and the classes within the package?
PS: I have similar questions for essentially all other Hadoop projects (Hive, HBase, etc.) - the Javadocs seem to be grossly inadequate overall. I just wonder what other developers end up doing to figure these out.
Any inputs would be great.
I download the source code and skim through it to get an idea of what it does. If there is Javadoc, I read that too. I tend to concentrate on the interfaces that I need and move on from there; that way I put everything into context, which makes it easier to figure out the usage. I use the call hierarchy and type hierarchy views a lot.
These are very general guidelines, and ultimately it is the time you spend with the project that will make you understand it.
The Hadoop ecosystem is growing quickly, and changes are introduced on a monthly basis; that's why the Javadoc is not so good. Another reason is that Hadoop software tends to lean towards infrastructure rather than the end user. People developing tools will spend time learning the APIs and internals, while everybody else is more or less supposed to be blissfully ignorant of all that and just use some high-level, domain-specific language for the tool.
I wanted to know if there is any Neo4j equivalent of a stored procedure?
When I researched this, I came across events, but they seemed more like triggers than stored procedures.
There are basically two techniques to extend a Neo4j server:
Server plugins enrich the existing REST endpoints, and
unmanaged extensions allow you to create new REST endpoints.
Both techniques require you to write code in Java (or another JVM language), package a JAR file, and deploy it to the Neo4j server.
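For illustration, an unmanaged extension is essentially a JAX-RS resource that gets the GraphDatabaseService injected; the path, class name and logic below are my own sketch, and the server additionally needs the package registered under its unmanaged-extension / third-party JAX-RS setting in the server configuration.

```java
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.PathParam;
import javax.ws.rs.core.Context;
import javax.ws.rs.core.Response;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Transaction;

// Hypothetical unmanaged extension exposing a new REST endpoint.
@Path("/hello")
public class HelloWorldResource {

    private final GraphDatabaseService database;

    public HelloWorldResource(@Context GraphDatabaseService database) {
        this.database = database;
    }

    @GET
    @Path("/{nodeId}")
    public Response hello(@PathParam("nodeId") long nodeId) {
        try (Transaction tx = database.beginTx()) {
            // Work against the graph goes here.
            tx.success();
        }
        return Response.ok("Hello from node " + nodeId).build();
    }
}
```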
Stored procedures have been available as capabilities callable from the Cypher language since version 3.0.
A first reference can be found here:
https://dzone.com/articles/neo4j-30-stored-procedures
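As a rough sketch of what such a procedure looks like with the 3.x procedure API (the procedure name, label filter and record class below are made up; the compiled JAR is deployed into the server's plugins directory):

```java
import java.util.stream.Stream;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Label;
import org.neo4j.graphdb.Node;
import org.neo4j.procedure.Context;
import org.neo4j.procedure.Name;
import org.neo4j.procedure.Procedure;

// Hypothetical procedure, callable from Cypher as:
//   CALL example.findByLabel('Person')
public class ExampleProcedures {

    // The database is injected into the procedure class by the server.
    @Context
    public GraphDatabaseService db;

    @Procedure("example.findByLabel")
    public Stream<NodeResult> findByLabel(@Name("label") String label) {
        return db.getAllNodes().stream()
                 .filter(node -> node.hasLabel(Label.label(label)))
                 .map(NodeResult::new);
    }

    // Procedures return streams of simple record classes with public fields.
    public static class NodeResult {
        public Node node;

        public NodeResult(Node node) {
            this.node = node;
        }
    }
}
```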
A remarkable example, showing how a graph can be processed at scale through procedures to achieve network clustering and community detection, can be found here:
http://www.markhneedham.com/blog/2016/02/28/neo4j-a-procedure-for-the-slm-clustering-algorithm/
EDIT
Since Neo4j 3.0 was released in April 2016, stored procedures have become available through an official, Apache 2.0 licensed repository (APOC):
https://neo4j.com/labs/apoc/
The available procedures range from data manipulation/import to spatial functions and complex graph algorithms (e.g. PageRank, Dijkstra, community detection, betweenness centrality, closeness centrality, etc.).
My answer here does not answer the question directly (Stefan's answer does just fine for that). With that said, if any of you are considering writing server plugins (to get Stored Proc behavior) before your project is actually being used in production (which at the time of this writing is the vast majority of the Neo4j userbase), I strongly recommend not doing so.
Server plugins add architectural complexity to your project. You will require JVM developers to maintain them. Deploying or updating them can be tricky, and the associated source control methodologies are not intuitive. Neo4j doesn't require schema migrations, which makes your job as a developer easier; adding server plugins takes that benefit away. And since this is not a mainstream use case of Neo4j, you'll get little help from the developer community, and improvements and bug fixes related to that functionality will be given lower priority by the Neo4j team.
And all that would be for possibly a slight performance boost, or none at all.
"Stored Procedures" (or using server plugins as such) are an important feature to have in the context of performance tuning, but if your team is still two guys in a garage, don't even think about going down this path.
I have followed the path of Java EE for quite a while now, used JBoss Seam, and followed its standardization within CDI.
Now, after trying to take the first steps on the hard path of migrating from Seam 2 to 3, I learned that everything had moved to Apache DeltaSpike.
But while Seam was decently documented and equipped with examples, DeltaSpike is not. There are menu items leading to "Documentation", which is very poor, littered with TODOs, and without any visible structure, and to "Examples", which is more or less a joke.
Ever since I tried to step up to Java EE 6, I have felt a bit like I'm standing in the rain - even though it's great that many concepts of Seam 2 went into the standard, I miss many things I had before - and exactly those things should be covered by CDI extensions. Here again, it's great that there is a common effort to channel those extensions into a project like Apache DeltaSpike - but at the moment there is a very high hurdle to getting any benefit from it, even if you're not a beginner in the technology.
So - can anyone lead me to decent resources, documentation and examples how to use and understand the CDI extensions?
As noted in other responses, the DeltaSpike documentation is a little lacking. You could always take a look at the tests and the Javadoc. For examples, I think you'll find the JBoss jdf quickstarts to be the best location currently. Do a search on the right for DeltaSpike and you should see around seven examples.
DeltaSpike is still early in its development. It is only at version 0.3; it might be quite some time before it's production-ready. Until then, you might take a look at MyFaces CODI or Seam, two projects whose development has halted as they are presently being merged into DeltaSpike.
The documentation at the DeltaSpike web site, I agree, is quite insufficient for users; I bet it's just for people that want to test it out or develop for it.
Here's a concrete example. I am a reasonably experienced software person. I would like to use DeltaSpike's BeanProvider.getContextReference to inject an EJB into some code that is not itself an EJB. I included the requisite Maven dependencies and added the BeanProvider.getContextReference code to one of my classes. I'm getting error messages telling me that DeltaSpike is not configured. Two hours with the documentation did not get me any closer to understanding what I need to do simply to turn it on. What seems to be missing is a "How to configure DeltaSpike Core" page.
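For what it's worth, this is roughly the pattern being attempted. The bean and method names below are made up, and I'm assuming the lookup method is BeanProvider.getContextualReference and that the deltaspike-core-api/-impl JARs plus a META-INF/beans.xml are on the classpath of a running CDI container.

```java
import javax.ejb.Stateless;
import org.apache.deltaspike.core.api.provider.BeanProvider;

// Hypothetical EJB to be reached from code the container does not manage.
@Stateless
public class GreetingService {
    public String greet(String name) {
        return "Hello, " + name;
    }
}

// Plain class, not injected by the container, that looks the EJB up via DeltaSpike.
class LegacyHelper {

    String greetViaEjb(String name) {
        // The second argument 'false' means the reference is not optional;
        // DeltaSpike throws if no matching bean is found or it is not configured.
        GreetingService service =
                BeanProvider.getContextualReference(GreetingService.class, false);
        return service.greet(name);
    }
}
```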
I have just found out about the existence of graph databases and Neo4j, and was wondering if there is a way to use it within NetBeans somehow, in the way Apache Derby is used. I'm sorry if this is a very stupid question; I've just begun learning about databases and am far from knowing my way around. I can most definitely look for a way to do this on my own, but if a more experienced person can share a tip about this, I'd be very grateful.
You can use Neo4j either embedded as part of your Java application or as a standalone server, see:
http://docs.neo4j.org/chunked/snapshot/tutorials.html
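As a rough sketch of the embedded variant (the store path is arbitrary, and depending on the Neo4j version the factory method takes a String or a java.io.File):

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.factory.GraphDatabaseFactory;

public class EmbeddedNeo4jExample {
    public static void main(String[] args) {
        // The store directory is arbitrary; the database runs inside this JVM.
        GraphDatabaseService db =
                new GraphDatabaseFactory().newEmbeddedDatabase("target/neo4j-store");

        // All graph operations must happen inside a transaction.
        try (Transaction tx = db.beginTx()) {
            Node node = db.createNode();
            node.setProperty("name", "example");
            tx.success();
        }

        db.shutdown();
    }
}
```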
We are doing research right now on whether to switch our PostgreSQL DB to an embedded Derby DB. Both would be using GlassFish 3 for our data layer. Does anybody have any opinions or knowledge that could help us decide?
Thanks!
Edit: we are writing some performance tests ourselves right now. We are looking for answers based more on experience / first-hand knowledge.
I know I'm late to post an answer here, but I want to make sure nobody makes the mistake of using Derby over any production-quality database in the future. I apologize in advance for how negative this answer is - I'm trying to capture an entire engineering team's ill feelings in a brief Q&A answer.
Our experience using Derby in many small-ish customer deployments has led us to seriously doubt how useful it is for anything but test environments. Some problems we've had:
Deadlocks caused by lock escalations - this is the biggest one and happens to one customer about once every week or two
Interrupted I/Os cause Derby to fail outright on Solaris (may not be an issue on other platforms) - we had to build a shim to protect it from these failures
Can't handle complicated queries which MySQL/PostgreSQL would handle with ease
Buggy transaction log implementation caused a table corruption which required us to export the database and then re-import it (couldn't just drop the corrupted table), and we still lost the table in the process - thank goodness we had a backup
No LIMIT syntax (see the sketch after this list for the SQL-standard alternative)
Low performance for complicated queries
Low performance for large datasets
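On the LIMIT point: as far as I know, more recent Derby releases do accept the SQL-standard OFFSET/FETCH clause as a substitute, so a paging query looks roughly like the JDBC sketch below (the database, table and column names are made up).

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;

public class DerbyPagingExample {
    public static void main(String[] args) throws Exception {
        // Embedded Derby connection; creates the database if it does not exist.
        Connection conn =
                DriverManager.getConnection("jdbc:derby:demo/pagingdb;create=true");

        // Made-up table purely for the paging example (fails on a second run
        // because the table already exists; fine for a first try).
        conn.createStatement().executeUpdate(
                "CREATE TABLE events (id INT PRIMARY KEY, msg VARCHAR(100))");

        // Derby has no LIMIT keyword, but it accepts the SQL-standard
        // OFFSET ... FETCH clause for the same purpose.
        PreparedStatement ps = conn.prepareStatement(
                "SELECT id, msg FROM events ORDER BY id "
              + "OFFSET 10 ROWS FETCH NEXT 20 ROWS ONLY");
        ResultSet rs = ps.executeQuery();
        while (rs.next()) {
            System.out.println(rs.getInt("id") + " " + rs.getString("msg"));
        }
        conn.close();
    }
}
```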
Because it is embedded, Derby is more of a competitor to SQLite than to PostgreSQL, which is an extremely mature, production-quality database used to store multi-petabyte datasets by some of the largest websites in the world. If you want to be ready for growth and don't want to get caught debugging someone else's database code, I would recommend not using Derby. I don't have any experience with SQLite, but I can't imagine it being much less reliable than Derby has been for us while remaining as popular as it is.
In fact, we're in the process of porting to PostgreSQL now.
Derby is still relatively slow in performance, but... wherever your Java application goes, your database server goes, completely platform independent. You don't even need to think about installing a DB server wherever your Java app is copied to.
I was using MySQL with Java, but having an embedded implementation of your database server sitting right within your Java app is just stunning: unprecedented productivity, freedom and flexibility.
Always having a DB server included, whenever and wherever, on any platform, is just heaven for me!
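As a rough illustration of what "embedded" means here (the database name is arbitrary): the engine runs inside the application's JVM, and a JDBC URL is all it takes, with no separate server to install or start.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class EmbeddedDerbyExample {
    public static void main(String[] args) throws Exception {
        // With derby.jar on the classpath the engine runs in-process;
        // ";create=true" creates the database directory on first use.
        Connection conn =
                DriverManager.getConnection("jdbc:derby:myAppDb;create=true");

        Statement st = conn.createStatement();
        // Fails on a second run because the table already exists; fine for a demo.
        st.executeUpdate("CREATE TABLE notes (id INT PRIMARY KEY, body VARCHAR(200))");
        st.executeUpdate("INSERT INTO notes VALUES (1, 'shipped with the app')");

        ResultSet rs = st.executeQuery("SELECT body FROM notes WHERE id = 1");
        while (rs.next()) {
            System.out.println(rs.getString("body"));
        }
        conn.close();
    }
}
```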
I have not compared PostgreSQL to Derby directly. However, having used both in different circumstances, I have found Derby to be highly reliable. You will, however, need to pay attention to Derby configuration to ensure it suits your application's needs.
When looking at the H2 database's performance stats site, it's worth reading the follow-up discussion, which comes out in favour of Derby compared to the H2 conclusions. http://groups.google.com/group/h2-database/browse_thread/thread/55a7558563248148?pli=1
Some stats from the H2 database site here:
http://www.h2database.com/html/performance.html
There are a number of performance test suites included as part of the Derby source code distribution itself; they are used by Derby developers to conduct their own performance testing of Derby. So if you need examples of performance tests, or want additional ones, you could consider using those. Look in the subdirectory named java/testing/org/apache/derbyTesting/perf in the Derby source distribution.
I'm not sure what you mean by Derby being embedded. Certainly that is an option, but you can also run it as a separate network server.
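For completeness, a sketch of the server option: the same database can be started as a Derby Network Server and reached with the client driver (the host, port and database name below are just examples).

```java
import java.io.PrintWriter;
import java.net.InetAddress;
import java.sql.Connection;
import java.sql.DriverManager;
import org.apache.derby.drda.NetworkServerControl;

public class DerbyServerExample {
    public static void main(String[] args) throws Exception {
        // Start the Derby Network Server in-process; it can equally be started
        // with the startNetworkServer script that ships with Derby.
        NetworkServerControl server =
                new NetworkServerControl(InetAddress.getByName("localhost"), 1527);
        server.start(new PrintWriter(System.out));

        // Wait until the server is ready to accept connections.
        for (int i = 0; i < 10; i++) {
            try {
                server.ping();
                break;
            } catch (Exception notYetUp) {
                Thread.sleep(500);
            }
        }

        // Connect over the network using the client driver (derbyclient.jar).
        Connection conn = DriverManager.getConnection(
                "jdbc:derby://localhost:1527/myAppDb;create=true");
        System.out.println("Connected: " + !conn.isClosed());

        conn.close();
        server.shutdown();
    }
}
```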