How to write my own job in samza - apache-kafka

Recently I have been trying to do some stream processing with the Samza framework. I have deployed the hello-samza example successfully. However, when I try to write my own job, I have no idea where to start.
I have read this document, but I still don't get the point. Can anyone help me with the following:
What should my code's structure be (source code, library code, and configuration)?
Which directory should my code go in?
What else do I need to do to get my code to run?
Your suggestions will help me a lot, many thanks!

It is very simple to build your own jobs. First, get hello-samza:
git clone https://git.apache.org/samza-hello-samza.git hello-samza
Next, set up the system with this command:
bin/grid bootstrap
Check that everything is running with jps.
The next step is to remove the apache-rat-plugin from pom.xml so that you can build your own project inside hello-samza.
Once you have removed it, you can add a Java file for your job in the src folder (MyTask.java) and a .properties file in the config directory (MyTask.properties).
This is an empty sample job (MyTask.java):
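package com.samza;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskCoordinator;
public class MyTask implements StreamTask {
    private static final SystemStream OUTPUT_STREAM = new SystemStream("kafka", "topicOut");
    public void process(IncomingMessageEnvelope envelope, MessageCollector collector,
            TaskCoordinator coordinator) throws Exception {
        // Do something useful
    }
}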
Don't forget to create the .properties file.
Once your code compiles without errors, build with Maven:
mvn clean package
mkdir -p deploy/samza
tar -xvf ./samza-job-package/target/samza-job-package-0.10.0-dist.tar.gz -C deploy/samza
After that, with your server up (if it is not, start it with ./bin/grid start all), you can deploy your job with:
deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/MyTask.properties
Then consume the results with the Kafka console consumer, reading the topic the task writes to (topicOut in the code above):
deploy/kafka/bin/kafka-console-consumer.sh --zookeeper localhost:2181 --topic topicOut
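The answer leaves the contents of the .properties file open. As a rough sketch, the snippet below embeds a sample MyTask.properties and checks that it defines the basic keys. The key names follow the hello-samza example configs. JobConfigCheck is a hypothetical helper, not part of Samza, and the package path, input topic, and the choice of which keys count as "required" are assumptions for illustration.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Properties;

/** Hypothetical helper (not part of Samza): checks a job config for basic keys. */
public final class JobConfigCheck {
    private JobConfigCheck() {}

    // Keys the hello-samza example configs define; treating them as the minimum is an assumption.
    public static final List<String> REQUIRED_KEYS = Arrays.asList(
            "job.factory.class", "job.name", "yarn.package.path",
            "task.class", "task.inputs", "systems.kafka.samza.factory");

    // Sample contents for MyTask.properties; the package path and topicIn are placeholders.
    public static final String SAMPLE = String.join("\n",
            "job.factory.class=org.apache.samza.job.yarn.YarnJobFactory",
            "job.name=my-task",
            "yarn.package.path=file:///path/to/samza-job-package-dist.tar.gz",
            "task.class=com.samza.MyTask",
            "task.inputs=kafka.topicIn",
            "systems.kafka.samza.factory=org.apache.samza.system.kafka.KafkaSystemFactory",
            "systems.kafka.consumer.zookeeper.connect=localhost:2181/",
            "systems.kafka.producer.bootstrap.servers=localhost:9092");

    /** Returns the required keys that are absent or blank in the given properties text. */
    public static List<String> missingKeys(String propertiesText) {
        Properties props = new Properties();
        try {
            props.load(new StringReader(propertiesText));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        List<String> missing = new ArrayList<>();
        for (String key : REQUIRED_KEYS) {
            String value = props.getProperty(key);
            if (value == null || value.trim().isEmpty()) {
                missing.add(key);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("Missing keys: " + missingKeys(SAMPLE));
    }
}
```

The lines in SAMPLE are what would go into the properties file itself; compare them with the wikipedia-*.properties files that ship with hello-samza before relying on them.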

If you follow the Hello Samza instructions, you will have a fully functioning Zookeeper, Kafka, and Yarn/Samza cluster running on your local computer. With that project there are the Wikipedia feed related tasks that you can run to test things.
However, like you, I had some trouble coming up with the proper directory structure and build settings for new tasks (without the cluster management stuff). So, I created hello-samza-base by stripping out everything unnecessary for new tasks outside of hello-samza. I included instructions in the README on building new tasks.
As far as deployment goes, that is a bit more complex. Do some reading on creating Zookeeper, Kafka, and Yarn clusters.

I created Samza jobs starting from a Maven Eclipse project. The dependencies for version 0.9.2 were loaded in the pom.xml file with this content (I had some version issues, so you may have some work to do there):
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>com.acio.samza</groupId>
<artifactId>samzafroga</artifactId>
<version>0.0.1</version>
<name>samzafroga</name>
<dependencies>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-api</artifactId>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-core_2.10</artifactId>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-log4j</artifactId>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-shell</artifactId>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-yarn_2.10</artifactId>
<exclusions>
<exclusion>
<artifactId>jdk.tools</artifactId>
<groupId>jdk.tools</groupId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kv_2.10</artifactId>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kv-rocksdb_2.10</artifactId>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kafka_2.10</artifactId>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
</dependency>
<dependency>
<groupId>org.schwering</groupId>
<artifactId>irclib</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
</dependency>
<dependency>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-jaxrs</artifactId>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-server</artifactId>
<version>${jettyVersion}</version>
</dependency>
<dependency>
<groupId>org.eclipse.jetty</groupId>
<artifactId>jetty-webapp</artifactId>
<version>${jettyVersion}</version>
</dependency>
</dependencies>
<properties>
<!-- maven specific properties -->
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<samza.version>0.8.0</samza.version>
<jettyVersion>7.6.16.v20140903</jettyVersion>
</properties>
<repositories>
<repository>
<id>apache-releases</id>
<url>https://repository.apache.org/content/groups/public</url>
</repository>
<repository>
<id>scala-tools.org</id>
<name>Scala-tools Maven2 Repository</name>
<url>https://oss.sonatype.org/content/groups/scala-tools</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>scala-tools.org</id>
<name>Scala-tools Maven2 Repository</name>
<url>http://scala-tools.org/repo-releases</url>
</pluginRepository>
</pluginRepositories>
<build>
<pluginManagement>
<plugins>
<plugin>
<groupId>org.apache.rat</groupId>
<artifactId>apache-rat-plugin</artifactId>
<version>0.9</version>
<configuration>
<excludes>
<exclude>*.patch</exclude>
<exclude>**/target/**</exclude>
<exclude>*.json</exclude>
<exclude>.vagrant/**</exclude>
<exclude>.git/**</exclude>
<exclude>*.md</exclude>
<exclude>docs/**</exclude>
<exclude>config/**</exclude>
<exclude>bin/**</exclude>
<exclude>.gitignore</exclude>
<exclude>**/.cache/**</exclude>
<exclude>deploy/**</exclude>
<exclude>**/.project</exclude>
</excludes>
</configuration>
</plugin>
</plugins>
</pluginManagement>
<plugins>
<!-- plugin to build the tar.gz file filled with examples -->
<plugin>
<artifactId>maven-assembly-plugin</artifactId>
<version>2.3</version>
<configuration>
<descriptors>
<descriptor>src/assembly/bin.xml</descriptor>
</descriptors>
<descriptorRefs>
<descriptorRef>jar-with-dependencies</descriptorRef>
</descriptorRefs>
</configuration>
<executions>
<execution>
<id>make-assembly</id>
<phase>package</phase>
<goals>
<goal>single</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
<dependencyManagement>
<dependencies>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-api</artifactId>
<version>${samza.version}</version>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-core_2.10</artifactId>
<version>${samza.version}</version>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-log4j</artifactId>
<version>${samza.version}</version>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-shell</artifactId>
<version>${samza.version}</version>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-yarn_2.10</artifactId>
<version>${samza.version}</version>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kv_2.10</artifactId>
<version>${samza.version}</version>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kv-rocksdb_2.10</artifactId>
<version>${samza.version}</version>
</dependency>
<dependency>
<groupId>org.apache.samza</groupId>
<artifactId>samza-kafka_2.10</artifactId>
<version>${samza.version}</version>
</dependency>
<dependency>
<groupId>org.apache.kafka</groupId>
<artifactId>kafka_2.10</artifactId>
<version>0.8.2.0</version>
</dependency>
<dependency>
<groupId>org.schwering</groupId>
<artifactId>irclib</artifactId>
<version>1.10</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-api</artifactId>
<version>1.6.2</version>
</dependency>
<dependency>
<groupId>org.slf4j</groupId>
<artifactId>slf4j-log4j12</artifactId>
<version>1.6.2</version>
</dependency>
<dependency>
<groupId>org.codehaus.jackson</groupId>
<artifactId>jackson-jaxrs</artifactId>
<version>1.8.5</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-hdfs</artifactId>
<version>2.6.0</version>
</dependency>
</dependencies>
</dependencyManagement>
</project>
The basic code of a job is this one:
package xxxx;
import java.util.Map;
import org.apache.samza.config.Config;
import org.apache.samza.system.IncomingMessageEnvelope;
import org.apache.samza.system.OutgoingMessageEnvelope;
import org.apache.samza.system.SystemStream;
import org.apache.samza.task.MessageCollector;
import org.apache.samza.task.StreamTask;
import org.apache.samza.task.TaskContext;
import org.apache.samza.task.TaskCoordinator;
public class Redirect implements StreamTask {
    private final SystemStream OUTPUT_STREAM = new SystemStream("kafka", "samzaout");
    public void process(IncomingMessageEnvelope envelope,
                        MessageCollector collector,
                        TaskCoordinator coordinator)
    {
        String msg = (String)envelope.getMessage();
        // Transformation
        String outmsg = "xxx-" + msg + "-xxx";
        collector.send(new OutgoingMessageEnvelope(OUTPUT_STREAM, outmsg));
    }
}
Once you have compiled it, package it into a jar file and place it in a location accessible to all the Samza nodes, such as a web server or HDFS.
Reference this location from the properties file you will have to create to launch the job. Look for examples on the project web page.
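One way to try out the transformation from the Redirect task without a running cluster is to keep it in a plain static method and call that from process. This is only a sketch: RedirectTransform is a hypothetical class name, not part of Samza, and the type check is an added assumption to make a non-String message fail with a clear error instead of a ClassCastException.

```java
/** Hypothetical helper: the Redirect task's transformation as a pure function. */
public final class RedirectTransform {
    private RedirectTransform() {}

    /** Wraps the incoming message in "xxx-" and "-xxx", as Redirect.process does. */
    public static String transform(Object message) {
        if (!(message instanceof String)) {
            // The task casts the message to String, so anything else is rejected here.
            throw new IllegalArgumentException("expected a String message, got: " + message);
        }
        return "xxx-" + message + "-xxx";
    }

    public static void main(String[] args) {
        System.out.println(transform("hello"));
    }
}
```

Inside the task, the body of process would then reduce to sending RedirectTransform.transform(envelope.getMessage()) to OUTPUT_STREAM.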

Read that document some more, look at the hello-samza example some more, and if you deployed it to YARN read about it some more. All the answers you're looking for are there.
There are three jobs in hello-samza. Pick one and follow it, configuration, launching scripts etc.
Here is how the hello-samza page launches the wikipedia-feed job:
deploy/samza/bin/run-job.sh --config-factory=org.apache.samza.config.factories.PropertiesConfigFactory --config-path=file://$PWD/deploy/samza/config/wikipedia-feed.properties
The properties file shows where the compiled job/task code is among other things. The source code for the wikipedia-feed job/task is here:
https://github.com/apache/samza-hello-samza/blob/master/src/main/java/samza/examples/wikipedia/task/WikipediaFeedStreamTask.java
Just modify this job, or copy and modify, to get yours going.

Related

Trouble adding camel-http4 to Maven Camel project in Eclipse

I'm trying to add http4 to my Camel project. According to the documentation, it looks like I only need to add these two Maven dependencies using the Camel version:
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-http4</artifactId>
<version>${camel-version}</version>
</dependency>
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-http4-starter</artifactId>
<version>${camel-version}</version>
</dependency>
But Eclipse gives an error: "Missing artifact org.apache.camel:camel-http4:jar:3.11.1"
I don't see the dependency in the list. I tried using Maven > Update Project. I also tried closing and re-opening the project. I also tried adding these to another project and got the same thing. I'm not sure what I have wrong here.
Here is my pom
<?xml version="1.0" encoding="UTF-8"?>
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/maven-v4_0_0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>systems.petsuppliesplus</groupId>
<artifactId>email-processor</artifactId>
<packaging>jar</packaging>
<version>1.0-SNAPSHOT</version>
<name>A Camel Spring Boot Route</name>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
<spring.boot-version>2.5.3</spring.boot-version>
<surefire.plugin.version>3.0.0-M4</surefire.plugin.version>
<camel-version>3.11.1</camel-version>
</properties>
<dependencyManagement>
<dependencies>
<!-- Spring Boot BOM -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-dependencies</artifactId>
<version>${spring.boot-version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
<!-- Camel BOM -->
<dependency>
<groupId>org.apache.camel.springboot</groupId>
<artifactId>camel-spring-boot-dependencies</artifactId>
<version>${camel-version}</version>
<type>pom</type>
<scope>import</scope>
</dependency>
</dependencies>
</dependencyManagement>
<dependencies>
<!-- Spring Boot -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<exclusions>
<exclusion>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-undertow</artifactId>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-actuator</artifactId>
</dependency>
<!-- Camel -->
<dependency>
<groupId>org.apache.camel.springboot</groupId>
<artifactId>camel-spring-boot-starter</artifactId>
</dependency>
<dependency>
<groupId>org.apache.camel.springboot</groupId>
<artifactId>camel-stream-starter</artifactId>
</dependency>
<!-- Test -->
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<scope>test</scope>
</dependency>
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-test-spring-junit5</artifactId>
<scope>test</scope>
</dependency>
<!-- Additions -->
<!-- For receiving JMS messages from Artemis -->
<dependency>
<groupId>org.apache.activemq</groupId>
<artifactId>artemis-jms-client</artifactId>
</dependency>
<dependency>
<groupId>org.apache.camel.springboot</groupId>
<artifactId>camel-jms-starter</artifactId>
</dependency>
<dependency>
<groupId>org.messaginghub</groupId>
<artifactId>pooled-jms</artifactId>
</dependency>
<dependency>
<groupId>javax.json</groupId>
<artifactId>javax.json-api</artifactId>
</dependency>
<!-- Calling HTTP (REST) -->
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-http4</artifactId>
<version>${camel-version}</version>
</dependency>
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-http4-starter</artifactId>
<version>${camel-version}</version>
</dependency>
<!-- Model Object Translation -->
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-jackson</artifactId>
<version>${camel-version}</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.8.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<version>${spring.boot-version}</version>
<executions>
<execution>
<goals>
<goal>repackage</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-surefire-plugin</artifactId>
<version>${surefire.plugin.version}</version>
</plugin>
</plugins>
</build>
</project>
I tried re-indexing the local repository and a few other Eclipse tricks to force a refresh, but they don't seem to help. I also tried using camel-http. That finds the jar, but not the starter.
<!-- This one works -->
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-http</artifactId>
<version>${camel-version}</version>
</dependency>
<!-- But this one still gives the missing artifact error -->
<dependency>
<groupId>org.apache.camel</groupId>
<artifactId>camel-http-starter</artifactId>
<version>${camel-version}</version>
</dependency>
camel-http4 was renamed in Camel 3.x; you can find this in the migration guide.
The reason is that HTTP clients older than 4.x were dropped in Camel 3.x, so camel-http4 is now the only HTTP component and is therefore simply called camel-http.
What was referenced in Camel 2.x as
<groupId>org.apache.camel</groupId>
<artifactId>camel-http4-starter</artifactId>
has changed in Camel 3.x to
<groupId>org.apache.camel.springboot</groupId>
<artifactId>camel-http-starter</artifactId>
Notice the "springboot" in the groupId and simply "http" in the artifactId.

Spring boot REST application not working without Build Path in Eclipse

I'm facing an issue where our Spring Boot application will only run if a subproject is included. A rough project sketch:
Backend
This is where the main class is located. This project also contains the Spring repositories, which are exposed via REST, plus the filters and the REST configuration. The data itself is included in the backend-module project.
backend-module
This is where the actual Java classes that hold the data are located. They are used in conjunction with Hibernate.
The application works fine unless I remove backend-module from the Java Build Path in the Eclipse project preferences. If I remove the reference, the application launch fails, not because of a missing component from backend-module but because of missing Spring Boot components:
Caused by: java.lang.NoClassDefFoundError: org/springframework/data/repository/support/RepositoryInvokerFactory
The pom.xml files of the project are almost the same.
I'll happily include all the information someone may need.
Thanks
EDIT 1:
The pom.xml of the Backend Project.
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<groupId>backend</groupId>
<artifactId>Backend</artifactId>
<version>1.0</version>
<description>Rest Backend</description>
<parent>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-parent</artifactId>
<version>1.2.5.RELEASE</version>
</parent>
<properties>
<!-- use UTF-8 for everything -->
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<project.reporting.outputEncoding>UTF-8</project.reporting.outputEncoding>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.commons</groupId>
<artifactId>commons-io</artifactId>
<version>1.3.2</version>
</dependency>
<dependency>
<groupId>cis</groupId>
<artifactId>backend-module</artifactId>
<version>1.0</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-core</artifactId>
<version>4.2.12.Final</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-spatial</artifactId>
<version>4.0</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-tomcat</artifactId>
<version>1.2.5.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-web</artifactId>
<version>1.2.5.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-test</artifactId>
<version>1.2.5.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-jpa</artifactId>
<version>1.2.5.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-starter-data-rest</artifactId>
<version>1.2.5.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-rest-webmvc</artifactId>
<version>2.3.2.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-legacy</artifactId>
<version>1.0.1.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-commons-core</artifactId>
<version>1.4.1.RELEASE</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-entitymanager</artifactId>
<version>4.2.12.Final</version>
</dependency>
<dependency>
<groupId>postgresql</groupId>
<artifactId>postgresql</artifactId>
<version>9.1-901-1.jdbc4</version>
</dependency>
<dependency>
<groupId>org.postgis</groupId>
<artifactId>postgis-jdbc</artifactId>
<version>1.5.2</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.17</version>
</dependency>
<dependency>
<groupId>org.antlr</groupId>
<artifactId>ST4</artifactId>
<version>4.0.8</version>
</dependency>
<dependency>
<groupId>jdom</groupId>
<artifactId>jdom</artifactId>
<version>1.1</version>
</dependency>
<dependency>
<groupId>com.fasterxml.jackson.dataformat</groupId>
<artifactId>jackson-dataformat-xml</artifactId>
<version>2.5.3</version>
</dependency>
<dependency>
<groupId>org.codehaus.woodstox</groupId>
<artifactId>woodstox-core-asl</artifactId>
<version>4.4.1</version>
</dependency>
<dependency>
<groupId>cis.adapter</groupId>
<artifactId>CISConnector</artifactId>
<version>1.0</version>
</dependency>
<dependency>
<groupId>cis.adapter</groupId>
<artifactId>CISCore</artifactId>
<version>1.0</version>
</dependency>
<dependency>
<groupId>com.google</groupId>
<artifactId>caplibrary</artifactId>
<version>r11</version>
</dependency>
</dependencies>
<repositories>
<repository>
<id>org.jboss.repository.releases</id>
<name>JBoss Maven Release Repository</name>
<url>https://repository.jboss.org/nexus/content/repositories/releases</url>
</repository>
<repository>
<id>OSGEO GeoTools repo</id>
<url>http://download.osgeo.org/webdav/geotools</url>
</repository>
<repository>
<id>Hibernate Spatial repo</id>
<url>http://www.hibernatespatial.org/repository</url>
</repository>
</repositories>
<pluginRepositories>
<pluginRepository>
<id>spring-releases</id>
<name>Spring Releases</name>
<url>https://repo.spring.io/libs-release</url>
</pluginRepository>
</pluginRepositories>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<configuration>
<source>1.8</source>
<target>1.8</target>
</configuration>
</plugin>
<plugin>
<groupId>org.springframework.boot</groupId>
<artifactId>spring-boot-maven-plugin</artifactId>
<version>1.2.6.RELEASE</version>
<executions>
<execution>
<goals>
<goal>repackage</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>
Apparently the issue was fixed by dropping a lot of the version statements in the pom.xml files and upgrading Spring Boot to the latest version.
If I understood the hastily given explanation correctly, our Spring Boot release didn't have the right versions of some Spring components packed into it.
Sorry if someone has a similar issue and finds this answer insufficient.

spring data hive integration with hive template

I am trying to use Spring Data Hadoop to integrate Hive into my application and am running into some issues. First, I am not sure whether <hdp:hive-server host="some-other-host" port="10001" /> connects to an existing Hive server or creates a new Hive server that I can then connect to. Second, my configuration does not throw any errors, so it seems OK, and the hiveTemplate autowiring works too, but when I execute a query I don't get any response back. The application just gets stuck at that point.
Here is the configuration:
<hive-client-factory host="${hive-${env}.server}" port="${hive-${env}.port}" />
<hive-template />
And here is how I'm using it:
log.debug("before hive query");
for(String result : hiveTemplate.query("show tables;")){
log.debug("=> " + result);
}
log.debug("after hive query");
All I see in the log output is "before hive query"; nothing happens after that. I would appreciate any help or ideas about what I could be doing wrong.
Try 10000 as the port number.
Usually the Thrift server is deployed on port 10000. Check whether your installation is using HiveServer2 or HiveServer. I was able to get Spring Batch workflows to work with HiveServer, but I have not yet succeeded with HiveServer2.
Ensure that HiveServer or HiveServer2 is up and running before you run your program.
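A quick way to confirm that something is listening on the HiveServer port before starting the application is a plain TCP connect. The following is a minimal sketch using only the JDK; PortCheck is a hypothetical class name, and the default host, port, and timeout are assumptions. A successful connect only shows that the port is open, not that the server speaks the Thrift protocol version your client expects.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

/** Hypothetical helper (not part of Spring or Hive): checks that a TCP port accepts connections. */
public final class PortCheck {
    private PortCheck() {}

    /** Returns true if a TCP connection to host:port succeeds within timeoutMillis. */
    public static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Refused, timed out, or unresolvable host: treat as "not listening".
            return false;
        }
    }

    public static void main(String[] args) {
        // Defaults are assumptions; pass the host and port from your hive-client-factory.
        String host = args.length > 0 ? args[0] : "localhost";
        int port = args.length > 1 ? Integer.parseInt(args[1]) : 10000;
        System.out.println(host + ":" + port + " listening: " + isListening(host, port, 2000));
    }
}
```

If this reports false for the configured host and port, the hang is more likely a connectivity or port problem than a problem in the hiveTemplate code.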
If you are using Hive with Spring, make sure all your dependencies are compatible with your Spring version. After that, grant read and write permissions using the commands below.
hadoop fs -mkdir /tmp
hadoop fs -chmod a+w /tmp
hadoop fs -mkdir -p /user/hive/warehouse
hadoop fs -chmod a+w /user/hive/warehouse
I used the following dependency versions in pom.xml:
<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
<modelVersion>4.0.0</modelVersion>
<artifactId>spring-hadoop-samples-hive</artifactId>
<name>Spring Hadoop Samples - Hive</name>
<parent>
<groupId>org.springframework.samples</groupId>
<artifactId>spring-hadoop-samples</artifactId>
<version>1.0.0.BUILD-SNAPSHOT</version>
<relativePath>../parent/pom.xml</relativePath>
</parent>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
<spring.hadoop.version>2.3.0.M1</spring.hadoop.version>
<hadoop.version>2.7.1</hadoop.version>
<hive.version>1.2.1</hive.version>
<!-- <hive.version>2.1.1</hive.version> -->
</properties>
<dependencies>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-hadoop</artifactId>
<version>${spring.hadoop.version}</version>
<exclusions>
<exclusion>
<groupId>org.springframework</groupId>
<artifactId>spring-context-support</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-jdbc</artifactId>
<version>${spring.version}</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-test</artifactId>
<version>${spring.version}</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-tx</artifactId>
<version>${spring.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hadoop</groupId>
<artifactId>hadoop-common</artifactId>
<version>${hadoop.version}</version>
<scope>compile</scope>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-metastore</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-service</artifactId>
<version>${hive.version}</version>
</dependency>
<dependency>
<groupId>org.apache.thrift</groupId>
<artifactId>libfb303</artifactId>
<version>0.9.1</version>
</dependency>
<!-- runtime Hive deps start -->
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-common</artifactId>
<version>${hive.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-jdbc</artifactId>
<version>${hive.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-shims</artifactId>
<version>${hive.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-serde</artifactId>
<version>${hive.version}</version>
<scope>runtime</scope>
</dependency>
<dependency>
<groupId>org.apache.hive</groupId>
<artifactId>hive-contrib</artifactId>
<version>${hive.version}</version>
<scope>runtime</scope>
</dependency>
<!-- runtime Hive deps end -->
<dependency>
<groupId>org.codehaus.groovy</groupId>
<artifactId>groovy</artifactId>
<version>1.8.5</version>
<scope>runtime</scope>
</dependency>
</dependencies>
<repositories>
<repository>
<id>spring-milestone</id>
<url>http://repo.spring.io/libs-milestone</url>
</repository>
</repositories>
<build>
<plugins>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>appassembler-maven-plugin</artifactId>
<version>1.2.2</version>
<configuration>
<repositoryLayout>flat</repositoryLayout>
<configurationSourceDirectory>src/main/config</configurationSourceDirectory>
<copyConfigurationDirectory>true</copyConfigurationDirectory>
<!-- Extra JVM arguments that will be included in the bin scripts -->
<extraJvmArguments>-Xms512m -Xmx1024m -Dhive.version=${hive.version}</extraJvmArguments>
<programs>
<program>
<mainClass>org.springframework.samples.hadoop.hive.HiveApp</mainClass>
<name>hiveApp</name>
</program>
<program>
<mainClass>org.springframework.samples.hadoop.hive.HiveClientApp</mainClass>
<name>hiveClientApp</name>
</program>
<program>
<mainClass>org.springframework.samples.hadoop.hive.HiveAppWithApacheLogs</mainClass>
<name>hiveAppWithApacheLogs</name>
</program>
</programs>
</configuration>
<executions>
<execution>
<id>package</id>
<goals>
<goal>assemble</goal>
</goals>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-antrun-plugin</artifactId>
<executions>
<execution>
<id>config</id>
<phase>package</phase>
<configuration>
<tasks>
<copy todir="target/appassembler/data">
<fileset dir="data"/>
</copy>
</tasks>
</configuration>
<goals>
<goal>run</goal>
</goals>
</execution>
</executions>
</plugin>
</plugins>
</build>
</project>

What dependencies need to be added for jasperreport 5.0.1?

I upgraded JasperReports from 4.5.0 to 5.1.0. When I install my plugin, it fails with an error about a missing dependency. I would like to configure JasperReports 5.1.0 with Maven.
Execution default of goal org.codehaus.mojo:jasperreports-maven-plugin:1.0-beta-2:compile-reports failed: Plugin org.codehaus.mojo:jasperreports-maven-plugin:1.0-beta-2 or one of its dependencies could not be resolved: Failure to find com.lowagie:itext:jar:2.1.7.js2
I have two questions.
1) Which dependencies do I have to add to the pom to use JasperReports 5.1.0?
2) I am using the plugin below to compile my jrxml files to jasper files, and the error appears to come from it. What could be wrong with this plugin? Do I have to add a mirror?
<groupId>org.codehaus.mojo</groupId>
<artifactId>jasperreports-maven-plugin</artifactId>
<version>1.0-beta-2</version>
My complete pom follows. It may contain dependencies unrelated to JasperReports; I use them for internal purposes.
<?xml version="1.0"?>
<project xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd" xmlns="http://maven.apache.org/POM/4.0.0"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<modelVersion>4.0.0</modelVersion>
<groupId>com.test.plugins</groupId>
<artifactId>report-test-plugin</artifactId>
<version>2.2.1.1001-SNAPSHOT</version>
<packaging>jar</packaging>
<name>report-test-plugin</name>
<url>http://maven.apache.org</url>
<properties>
<project.build.sourceEncoding>UTF-8</project.build.sourceEncoding>
</properties>
<dependencies>
<dependency>
<groupId>org.apache.maven</groupId>
<artifactId>maven-plugin-api</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.maven</groupId>
<artifactId>maven-project</artifactId>
<version>2.0.1</version>
</dependency>
<dependency>
<groupId>org.apache.maven</groupId>
<artifactId>maven-artifact</artifactId>
<version>${maven-artifact.version}</version>
</dependency>
<dependency>
<groupId>org.apache.maven</groupId>
<artifactId>maven-artifact-manager</artifactId>
<version>${maven-artifact-manager.version}</version>
</dependency>
<dependency>
<groupId>org.apache.maven</groupId>
<artifactId>maven-model</artifactId>
<version>${maven-model.version}</version>
</dependency>
<dependency>
<groupId>org.codehaus.plexus</groupId>
<artifactId>plexus-utils</artifactId>
<version>${plexus-utils.version}</version>
</dependency>
<dependency>
<groupId>org.apache.maven.wagon</groupId>
<artifactId>wagon-provider-api</artifactId>
<version>${wagon-provider-api.version}</version>
</dependency>
<dependency>
<groupId>dom4j</groupId>
<artifactId>dom4j</artifactId>
<version>${dom4j.version}</version>
</dependency>
<dependency>
<groupId>org.apache.maven.shared</groupId>
<artifactId>maven-plugin-testing-harness</artifactId>
<version>1.0-beta-1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>jmock</groupId>
<artifactId>jmock</artifactId>
<version>${jmock.version}</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>commons-configuration</groupId>
<artifactId>commons-configuration</artifactId>
<version>${commons-configuration.version}</version>
</dependency>
<dependency>
<groupId>commons-io</groupId>
<artifactId>commons-io</artifactId>
<version>2.1</version>
</dependency>
<dependency>
<groupId>org.json</groupId>
<artifactId>json</artifactId>
<version>${json.version}</version>
</dependency>
<!-- itext -->
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itextpdf</artifactId>
<version>5.1.2</version>
</dependency>
<dependency>
<groupId>com.itextpdf</groupId>
<artifactId>itext-xtra</artifactId>
<version>5.1.2</version>
</dependency>
<dependency>
<groupId>com.itextpdf.tool</groupId>
<artifactId>xmlworker</artifactId>
<version>1.1.0</version>
</dependency>
<dependency>
<groupId>org.xhtmlrenderer</groupId>
<artifactId>core-renderer</artifactId>
<version>R8</version>
</dependency>
<dependency>
<groupId>com.fasterxml</groupId>
<artifactId>classmate</artifactId>
<version>0.5.4</version>
</dependency>
<dependency>
<groupId>net.sourceforge.htmlcleaner</groupId>
<artifactId>htmlcleaner</artifactId>
<version>2.2</version>
</dependency>
<dependency>
<groupId>net.sf.jasperreports</groupId>
<artifactId>jasperreports</artifactId>
<version>5.1.0</version>
</dependency>
<dependency>
<groupId>com.lowagie</groupId>
<artifactId>itext</artifactId>
<version>2.1.7</version>
</dependency>
<dependency>
<groupId>org.codehaus.mojo</groupId>
<artifactId>jasperreports-maven-plugin</artifactId>
<version>1.0-beta-2</version>
<exclusions>
<exclusion>
<artifactId>plexus-container-default</artifactId>
<groupId>org.codehaus.plexus</groupId>
</exclusion>
<exclusion>
<groupId>jasperreports</groupId>
<artifactId>jasperreports</artifactId>
</exclusion>
</exclusions>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.9</version>
</dependency>
</dependencies>
<build>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<version>3.1</version>
<executions>
<execution>
<phase>validate</phase>
<goals>
<goal>compile</goal>
</goals>
<configuration>
<source>1.6</source>
<target>1.6</target>
</configuration>
</execution>
</executions>
</plugin>
<plugin>
<groupId>org.codehaus.mojo</groupId>
<artifactId>jasperreports-maven-plugin</artifactId>
<version>1.0-beta-2</version>
<executions>
<execution>
<phase>validate</phase>
<inherited>false</inherited>
<goals>
<goal>compile-reports</goal>
</goals>
<configuration>
<!-- directory containing the .jrxml source files -->
<sourceDirectory>src/main/resources</sourceDirectory>
<sourceFileExt>.jrxml</sourceFileExt>
<compiler>net.sf.jasperreports.engine.design.JRJavacCompiler</compiler>
<!-- directory where the compiled .jasper files are written -->
<outputDirectory>src/main/resources</outputDirectory>
</configuration>
</execution>
</executions>
<dependencies>
<!-- Note this must be repeated here to pick up correct xml validation -->
<dependency>
<groupId>net.sf.jasperreports</groupId>
<artifactId>jasperreports</artifactId>
<version>5.1.0</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.16</version>
</dependency>
</dependencies>
</plugin>
</plugins>
</build>
</project>
For anyone having similar issues when using the JasperReports 5.x maven dependencies:
The JasperReports developers run a public Maven repository where they publish bug-fixed builds of their third-party dependencies. These fixed versions are not always present in the main public Maven repository. So when you have problems with dependencies, try adding the repository http://jasperreports.sourceforge.net/maven2.
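As a rough sketch of what adding that repository could look like in the pom: the URL is the one given above, while the `id` values are illustrative names chosen here, not something the original post specifies. Because the failing artifact is a dependency of a build plugin, the sketch assumes the repository also has to be listed under `pluginRepositories`, which is where Maven looks when it resolves plugins and their dependencies.

```xml
<!-- Sketch only: both blocks go directly under <project>. -->
<!-- The ids are hypothetical; only the URL comes from the answer. -->
<repositories>
  <repository>
    <id>jasperreports-sf</id>
    <url>http://jasperreports.sourceforge.net/maven2</url>
  </repository>
</repositories>

<!-- Plugins and their dependencies are resolved from pluginRepositories, -->
<!-- so the same URL is repeated here for the jasperreports-maven-plugin. -->
<pluginRepositories>
  <pluginRepository>
    <id>jasperreports-sf-plugins</id>
    <url>http://jasperreports.sourceforge.net/maven2</url>
  </pluginRepository>
</pluginRepositories>
```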
For me, the solution was adding
<dependency>
<groupId>com.lowagie</groupId>
<artifactId>itext</artifactId>
<version>2.1.7</version>
</dependency>
This way Maven no longer looked for the .js2 version and could resolve the dependency.
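The answer does not say where this dependency was added. The pom in the question already declares it as a project dependency and still fails, and the error occurs while Maven resolves the plugin's own dependencies. One plausible placement is therefore the plugin's `<dependencies>` block, where a directly declared version normally takes precedence over a transitive one. The sketch below makes that assumption; it is not confirmed by the original post.

```xml
<!-- Sketch, assuming the dependency belongs on the plugin's classpath. -->
<!-- Execution configuration from the question's pom is left out for brevity. -->
<plugin>
  <groupId>org.codehaus.mojo</groupId>
  <artifactId>jasperreports-maven-plugin</artifactId>
  <version>1.0-beta-2</version>
  <dependencies>
    <dependency>
      <groupId>net.sf.jasperreports</groupId>
      <artifactId>jasperreports</artifactId>
      <version>5.1.0</version>
    </dependency>
    <!-- Declared directly so that 2.1.7 is chosen instead of the -->
    <!-- transitive 2.1.7.js2 that could not be found. -->
    <dependency>
      <groupId>com.lowagie</groupId>
      <artifactId>itext</artifactId>
      <version>2.1.7</version>
    </dependency>
  </dependencies>
</plugin>
```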

Eclipse, junit, Hibernate An internal error occurred during: "Fetching children of Database"

When I try to open the Hibernate perspective in Eclipse, I receive the above error with the following stack trace:
java.lang.NoSuchMethodError: org.slf4j.spi.LocationAwareLogger.log(Lorg/slf4j/Marker;Ljava/lang/String;ILjava/lang/String;[Ljava/lang/Object;Ljava/lang/Throwable;)V
at org.apache.commons.logging.impl.SLF4JLocationAwareLog.debug(SLF4JLocationAwareLog.java:133)
at org.hibernate.cfg.reveng.dialect.JDBCMetaDataDialect.getTables(JDBCMetaDataDialect.java:26)
at org.hibernate.cfg.reveng.JDBCReader.processTables(JDBCReader.java:476)
at org.hibernate.cfg.reveng.JDBCReader.readDatabaseSchema(JDBCReader.java:74)
at org.hibernate.eclipse.console.workbench.LazyDatabaseSchemaWorkbenchAdapter$2.execute(LazyDatabaseSchemaWorkbenchAdapter.java:126)
at org.hibernate.console.execution.DefaultExecutionContext.execute(DefaultExecutionContext.java:63)
at org.hibernate.console.ConsoleConfiguration.execute(ConsoleConfiguration.java:107)
at org.hibernate.eclipse.console.workbench.LazyDatabaseSchemaWorkbenchAdapter.readDatabaseSchema(LazyDatabaseSchemaWorkbenchAdapter.java:115)
at org.hibernate.eclipse.console.workbench.LazyDatabaseSchemaWorkbenchAdapter.getChildren(LazyDatabaseSchemaWorkbenchAdapter.java:65)
at org.hibernate.eclipse.console.workbench.BasicWorkbenchAdapter.fetchDeferredChildren(BasicWorkbenchAdapter.java:106)
at org.eclipse.ui.progress.DeferredTreeContentManager$1.run(DeferredTreeContentManager.java:235)
at org.eclipse.core.internal.jobs.Worker.run(Worker.java:53)
This is my pom.xml:
<?xml version="1.0" encoding="UTF-8"?>
<project
xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"
xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<modelVersion>4.0.0</modelVersion>
<repositories>
<repository>
<id>caf</id>
<name>caf-repo</name>
<url>http://artifactory.fao.org/artifactory/caf-release-local</url>
</repository>
</repositories>
<groupId>org.fao.fipdt</groupId>
<artifactId>fip-dt</artifactId>
<version>1.0-SNAPSHOT</version>
<packaging>war</packaging>
<name>FIP-Dt Tool</name>
<properties>
<org.springframework.version>3.2.0.RELEASE</org.springframework.version>
<javax.servlet.jstl.version>1.2</javax.servlet.jstl.version>
</properties>
<profiles>
<profile>
<id>development</id>
<activation>
<activeByDefault>true</activeByDefault>
</activation>
<properties>
...
</properties>
</profile>
</profiles>
<dependencies>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-context</artifactId>
<version>3.2.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-core</artifactId>
<version>3.2.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework.data</groupId>
<artifactId>spring-data-jpa</artifactId>
<version>1.0.1.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-test</artifactId>
<version>3.2.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-webmvc</artifactId>
<version>3.2.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-orm</artifactId>
<version>3.2.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-tx</artifactId>
<version>3.2.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.springframework</groupId>
<artifactId>spring-beans</artifactId>
<version>3.2.0.RELEASE</version>
</dependency>
<dependency>
<groupId>org.fao.caf</groupId>
<artifactId>caf-client</artifactId>
<version>3.3.3</version>
</dependency>
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>servlet-api</artifactId>
<version>2.5</version>
</dependency>
<dependency>
<groupId>commons-dbcp</groupId>
<artifactId>commons-dbcp</artifactId>
<version>1.4</version>
</dependency>
<dependency>
<groupId>javax.servlet</groupId>
<artifactId>jstl</artifactId>
<version>${javax.servlet.jstl.version}</version>
</dependency>
<dependency>
<groupId>log4j</groupId>
<artifactId>log4j</artifactId>
<version>1.2.16</version>
</dependency>
<dependency>
<groupId>junit</groupId>
<artifactId>junit</artifactId>
<version>4.8.1</version>
<scope>test</scope>
</dependency>
<dependency>
<groupId>commons-logging</groupId>
<artifactId>commons-logging</artifactId>
<version>1.1.1</version>
</dependency>
<dependency>
<groupId>mysql</groupId>
<artifactId>mysql-connector-java</artifactId>
<version>5.1.21</version>
</dependency>
<dependency>
<groupId>commons-collections</groupId>
<artifactId>commons-collections</artifactId>
<version>20040616</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-core</artifactId>
<version>4.1.9.Final</version>
</dependency>
<dependency>
<groupId>org.hibernate</groupId>
<artifactId>hibernate-entitymanager</artifactId>
<version>4.1.9.Final</version>
</dependency>
</dependencies>
<build>
<finalName>fip-dt</finalName>
<plugins>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-compiler-plugin</artifactId>
<configuration>
<source>1.6</source>
<target>1.6</target>
<encoding>UTF8</encoding>
</configuration>
<inherited>true</inherited>
</plugin>
<plugin>
<groupId>org.apache.maven.plugins</groupId>
<artifactId>maven-eclipse-plugin</artifactId>
<version>2.9</version>
</plugin>
</plugins>
<resources>
<resource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
<includes>
<include>**/*.xml</include>
</includes>
</resource>
</resources>
<testResources>
<testResource>
<directory>src/test/resources</directory>
<filtering>true</filtering>
<includes>
<include>**/*.xml</include>
</includes>
</testResource>
<testResource>
<directory>src/main/resources</directory>
<filtering>true</filtering>
<includes>
<include>**/*.xml</include>
</includes>
</testResource>
</testResources>
</build>
</project>
I looked through the site, and I'm aware that the problem is related to the slf4j dependency, but I tried to exclude it, update it, and add it as an explicit dependency, and every attempt failed.
The version I can see in the dependency hierarchy is 1.6.1, which is the same version declared in the hibernate-core pom (although there the scope is test).
The library contains that class and method, but I don't know why Hibernate Tools cannot find them.
I tried with the Spring Tool Suite IDE and with a fresh Eclipse Juno installation.
I'm running out of ideas. :(
Review your project pom and Eclipse configuration: you have two different versions of log4j/slf4j.
The problem is two or more versions of the same jar on the classpath, as David indicated.
In this situation you can try the following:
•Your Hibernate Tools plugin (<eclipse_directory>/plugins/) and your project's Maven repository (<user_directory>/.m2/repository/org/slf4j) have different SLF4J jar versions. Keep the same version in both folders (that is, on both classpaths): keep the latest one in both and delete the older one. It's a dirty hack, but an acceptable way to keep the jar versions in sync.
•Another (better) way: keep your data access objects in a separate project. Create a simple Maven POJO project and automate code generation through Hibernate Tools. Import this project into the project where you maintain the business logic. This is a standard design in big projects and, as a bonus, you avoid the version conflict.
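On the Maven side, one way to keep the SLF4J artifacts at a single version is a `dependencyManagement` section, sketched below. It assumes that `jcl-over-slf4j` (the artifact that provides `SLF4JLocationAwareLog`, the class in the stack trace) arrives transitively. It uses 1.6.1 because that is the version the question reports in its dependency hierarchy. This only controls the classpath Maven resolves for the project; the jars bundled inside the Eclipse Hibernate Tools plugin are outside Maven's control and still have to be checked separately.

```xml
<!-- Sketch only: goes directly under <project>. -->
<!-- Pins every SLF4J artifact that Maven resolves to one version, -->
<!-- whether it is declared directly or pulled in transitively. -->
<dependencyManagement>
  <dependencies>
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>slf4j-api</artifactId>
      <version>1.6.1</version>
    </dependency>
    <!-- Assumed to be present transitively; managed here so it matches slf4j-api. -->
    <dependency>
      <groupId>org.slf4j</groupId>
      <artifactId>jcl-over-slf4j</artifactId>
      <version>1.6.1</version>
    </dependency>
  </dependencies>
</dependencyManagement>
```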