Need your advice to choose between Pentaho and Spring Batch - spring-batch

I'm developing a Java web application with the Spring Framework. The main idea is to build an ETL (Extract, Transform, Load) process that extracts data from different databases and then loads it into a single database.
I did some research and found that Pentaho has a Java API that lets me do this work, and it is also a BI solution, but its documentation is poor.
I'm used to Spring Batch, and I can't tell the difference between the two, because Spring Batch also lets me read data from databases, process it, and then store it.
Is Spring Batch an ETL solution?
Is Pentaho more powerful, or which one should I choose?
Thanks in advance :)

Related

Does the Spring Batch team offer a built-in way to read Excel files?

I need to pass around Excel files in a Spring Batch job. The only examples I've seen use a repository that my build server does not have access to, so I need something from the Spring project itself. Is there anything like that? Or is my best course of action a home-grown solution?

Spring Batch JPA and schema questions

In the documentation it says "JPA doesn't have a concept similar to the Hibernate StatelessSession so we have to use other features provided by the JPA specification." - what does this mean? Hibernate is one of the JPA implementations, so I'm a bit confused here.
I'm looking for an example that uses the JPA infrastructure we already have (entities/CRUD repositories) to read and write data. Most examples cover file reading and writing, and some cover the JDBC cursor reader. But since we use other Hibernate features such as Envers, we want to use the same JPA approach that we use for our online transactions. We are using Spring Boot/JPA (Hibernate) out of the box, with Oracle and an in-memory H2 database for dev.
In prod we use Oracle, and we have a user that has access to some schemas. How can we tell Spring Batch to use a particular schema when creating its tables? For some time the same application will be used for both batch and online processing, so we don't want to use a second data source and a different user for batch if possible. Isn't this a very basic requirement for everyone?
The Spring Batch documentation is good, and I also liked the Java/XML config toggle.
We use Spring Boot 2.x with Spring Batch.
In documentation it says "JPA doesn't have a concept similar to the Hibernate StatelessSession so we have to use other features provided by the JPA specification." - what does this mean?
The direct equivalent of the Hibernate Session API in JPA is the EntityManager. So this simply means there is no API like StatelessEntityManager in JPA, and we need to find a way to achieve the same functionality with JPA APIs only, which is explained in the same section: After each page is read, the entities become detached and the persistence context is cleared, to allow the entities to be garbage collected once the page is processed.
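To make the paging idea a bit more concrete, here is a minimal plain-Java sketch of a reader that fetches one page at a time and drops its references to the previous page before loading the next one. It is not Spring Batch's reader: `PagingReaderSketch`, `pageQuery` and `overList` are hypothetical names made up for illustration, and replacing the page list only stands in for clearing the persistence context.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical stand-in for a paging reader; NOT Spring Batch's actual class.
class PagingReaderSketch<T> {
    // (offset, limit) -> rows of that page; stands in for a paged JPA query.
    private final BiFunction<Integer, Integer, List<T>> pageQuery;
    private final int pageSize;
    private List<T> currentPage = new ArrayList<>();
    private int indexInPage = 0;
    private int nextOffset = 0;
    private boolean exhausted = false;

    PagingReaderSketch(BiFunction<Integer, Integer, List<T>> pageQuery, int pageSize) {
        this.pageQuery = pageQuery;
        this.pageSize = pageSize;
    }

    // Returns the next item, or null once there is no more data.
    T read() {
        if (indexInPage >= currentPage.size()) {
            if (exhausted) {
                return null;
            }
            // Replacing the list releases the previous page's items, which
            // plays the role of clearing the persistence context here.
            currentPage = new ArrayList<>(pageQuery.apply(nextOffset, pageSize));
            nextOffset += currentPage.size();
            indexInPage = 0;
            if (currentPage.size() < pageSize) {
                exhausted = true; // a short page means this was the last one
            }
            if (currentPage.isEmpty()) {
                return null;
            }
        }
        return currentPage.get(indexInPage++);
    }

    // Convenience factory for trying the sketch against an in-memory list.
    static <T> PagingReaderSketch<T> overList(List<T> source, int pageSize) {
        return new PagingReaderSketch<>((offset, limit) -> {
            int from = Math.min(offset, source.size());
            int to = Math.min(offset + limit, source.size());
            return source.subList(from, to);
        }, pageSize);
    }
}
```

Only one page of items is ever referenced by the reader at a time, so earlier pages can be garbage collected once the step has processed them.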
we want to use the same JPA approach that we use for our online transactions.
You can use the same DAOs or repositories for both your web app and batch app. For example, the ItemWriterAdapter lets you adapt your Hibernate/JPA DAO/repository to the item writer interface and use it to persist entities.
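The adapter idea can be sketched in plain Java as follows. This is only an illustration under assumptions: `ChunkWriter`, `CustomerRepository` and `DelegatingWriter` are hypothetical stand-ins (for the item writer contract, for a repository you already share with the web app, and for the adapter itself), not Spring Batch types.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical stand-in for an item writer contract: one call per chunk.
interface ChunkWriter<T> {
    void write(List<? extends T> items);
}

// Hypothetical repository that the web app already uses, one entity per call.
class CustomerRepository {
    private final List<String> saved = new ArrayList<>();

    void save(String customer) {
        saved.add(customer);
    }

    List<String> findAll() {
        return new ArrayList<>(saved);
    }
}

// The adapter: receives a chunk and delegates each item to the existing
// single-item method, so the repository itself stays unchanged.
class DelegatingWriter<T> implements ChunkWriter<T> {
    private final Consumer<? super T> target;

    DelegatingWriter(Consumer<? super T> target) {
        this.target = target;
    }

    @Override
    public void write(List<? extends T> items) {
        for (T item : items) {
            target.accept(item);
        }
    }
}
```

With the sketch above, `new DelegatingWriter<String>(repository::save)` turns the existing `save` method into a chunk writer. With the real ItemWriterAdapter you would instead point it at your bean and the name of the method to call (`setTargetObject` and `setTargetMethod`).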
In prod we use Oracle, and we have a user that has access to some schemas. How can we tell Spring Batch to use a particular schema when creating its tables? For some time the same application will be used for both batch and online processing, so we don't want to use a second data source and a different user for batch if possible. Isn't this a very basic requirement for everyone?
You can use the same data source for both your web app and batch app. Then it is up to you to choose the schema for Spring Batch tables. I would recommend using the same schema so that data and meta-data are always in sync (when a Spring Batch transaction fails for example).
Hope this helps.

Generating offline documentation for Spring projects

As part of familiarizing myself with the gamut of frameworks and functionalities provided by Spring, I often download their PDF documentation to read offline. For instance, the core Spring Framework documentation is over 900 pages.
However, I noticed that there are quite a few top-level Spring projects that don't have their equivalent offline/PDF documentation. Full list:
Spring Cloud
Spring Social
Spring LDAP
Spring Session
Spring Flo
I remember reading somewhere that Spring uses AsciiDoctor to generate their documentation. So, my question is: what is the fastest way to convert the "htmlsingle" or "html5" online documentation to PDFs for offline viewing?
Using the normal Ctrl + P and then Save as PDF in Chrome renders the webpage for Spring Cloud as a proper PDF, in the same format/layout as the "official" PDF docs for the other Spring projects.
Thanks to GitHub user https://github.com/dsyer for providing the hint on the Spring Cloud Gitter.

Neo4j with Spring using remote Neo4j-Server

I want to build a RESTful Application with Spring using Neo4j as a Database.
What I (very simply) want to have, is an application that takes entities (like a user) via POST, persists them in a Neo4j Database and loads them on a GET.
I tried the Spring tutorials (building an embedded graph DB and also accessing an external graph DB via REST) and they worked, but it seems that I can't use the Neo4j standalone to view the database live, because it appears to be locked by my application.
It is important for me to have some kind of method to view the live database, so I'm stuck.
So basically I'm looking for a simple way to have an application writing to and reading from an external graphdb, which I can manipulate with the neo4j standalone (or some alternative program).
I'm asking here because at this point I don't even know what to Google anymore :)
There are three options (see http://neo4j.com/developer/spring-data-neo4j):
1) With Spring Data Neo4j version 4, you can work against a Neo4j server.
2) With Spring Data Neo4j 3, you can move your code into a server extension.
3) There is also the option to start Neo4j server with an embedded database (this should only be done during development, not in production).
Thanks to your answers, I have some examples that show me the intended use of Neo4j which helps me a lot.
Basically github.com/neo4j-examples?utf8=%E2%9C%93&query=sdn4 was the answer to all my questions, since it provides me with a useful "look how it's done".
Thank you.

Spring Data: embedded / non-embedded?

I'm using Spring Data for Neo4j and MongoDB and I find it awesome, but I just found out about the embedded vs. non-embedded DB distinction.
Here's my situation:
I'm using Spring Data with annotations, repositories, and templates, assuming that I only need to change the DB address to make it work elsewhere.
My questions:
1) I don't even understand what is meant by embedded vs. non-embedded (on the same machine vs. on a remote machine?)
2) Do I have to change all the work I've done to make it work with a 'non-embedded' DB?
What I want to do is deploy my Spring Boot app, which uses Neo4j, to Heroku or CloudFoundry and use GrapheneDB (a Neo4j PaaS) for the DB. But when I read all this about Spring Data working only with embedded databases, I lost all the hope and happiness I had when building my app.
3) If the answer to 2) is yes, is it an easy transition? Are there a lot of things to change?
EDIT :
Here's what I'm talking about :
http://inserpio.wordpress.com/2014/04/30/extending-the-neo4j-server-with-spring-data-neo4j/
He's adding some custom boilerplate code to make it work with a non-embedded DB. Is that OK? Why doesn't it work like any other DB (as with JPA, where you just specify the address of the DB)?
inserpio here. Don't lose your happiness, please: the Spring Data Neo4j team is working hard on a new release that improves remote performance.
When Spring Data Neo4j started, neither Cypher nor Neo4j Server existed; only the embedded version was available. When the server version was delivered, the SDN team provided a quick solution that works well if you only use repositories, but becomes a little too chatty if you want to use @Entity too. The problem is matching those @Entity classes with the returned nodes.
Since the new version is not yet complete, for the moment you could move your persistence logic closer to the database, as a server extension. I explained this at the link you mentioned. It's a really fast refactoring: just move your entities and repositories to a new simple Java project, install the resulting jar in the 'plugins' folder, add one line of configuration to neo4j-server.properties, and expose your queries as simple REST services.
Hope this helps.
Do not hesitate to contact me with any further questions.
Cheers,
Lorenzo