How to speed up writing into Impala from Talend

I'm using Talend Open Studio for Big Data (7.3.1) to write files from various sources to Cloudera Impala (Cloudera QuickStart 5.13), but the load takes too much time and writes only ~3,300 rows/s (see the screenshots).
Is there a way to raise the write rate to ~10,000-100,000 rows/s or even higher?
Am I using the wrong approach for the load?
Or do I need to configure Impala/Talend better?
Any advice is welcome!
UPDATE
I installed the Impala JDBC driver, but the output component does not appear to be configured for Impala. The job fails with this error:
Exception in component tDBOutput_1 (db_2_impala)
org.talend.components.api.exception.ComponentException: UNEXPECTED_EXCEPTION:{message=[Cloudera]ImpalaJDBCDriver ERROR processing query/statement. Error Code: 0, SQL state: TStatus(statusCode:ERROR_STATUS, sqlState:HY000, errorMessage:AnalysisException: Impala does not support modifying a non-Kudu table: algebra_db.source_data_textfile_2
), Query: DELETE FROM algebra_db.source_data_textfile_2.} at org.talend.components.jdbc.CommonUtils.newComponentException(CommonUtils.java:583)
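Two things are worth separating here. Per-row INSERT statements over JDBC are a known slow path in Impala, which is optimized for bulk loads rather than OLTP-style writes; the usual way to reach the 10,000+ rows/s range is to stage the data as files in HDFS and move them into the table with a single statement. The DELETE error above is a different problem: the component's table action issues DELETE FROM, which Impala only supports on Kudu tables, so clearing a text-format table needs TRUNCATE TABLE (or a drop/recreate) instead. Below is a minimal plain-JDBC sketch of the staged-load approach; the driver class and URL follow Cloudera's Impala JDBC 4.1 driver conventions, and the host, staging path, and table name are placeholders, not values from the job:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;

public class ImpalaBulkLoad {
    public static void main(String[] args) throws Exception {
        // Cloudera Impala JDBC 4.1 driver; 21050 is Impala's default JDBC port.
        Class.forName("com.cloudera.impala.jdbc41.Driver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:impala://quickstart.cloudera:21050/algebra_db");
             Statement stmt = conn.createStatement()) {
            // Impala rejects DELETE on non-Kudu tables; TRUNCATE clears them instead.
            stmt.execute("TRUNCATE TABLE source_data_textfile_2");
            // LOAD DATA moves already-staged HDFS files into the table directory,
            // so its cost does not grow with the number of rows.
            stmt.execute("LOAD DATA INPATH '/staging/source_data' "
                       + "INTO TABLE source_data_textfile_2");
        }
    }
}

In Talend terms this corresponds to writing the rows to HDFS first (e.g., with tHDFSOutput) and then running the LOAD DATA statement from a tDBRow component, instead of pushing each row through tDBOutput.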


Hibernate fails to register REF_CURSOR parameter

I am new to Hibernate. I am using version 5.2.10.Final, connecting to Oracle 11g with the Oracle10gDialect, JPA 2.1, and ojdbc8.jar.
I am trying to call a simple stored procedure that takes a String input parameter and returns an Oracle SYS_REFCURSOR:
StoredProcedureQuery call = session.createStoredProcedureCall("sp_get_profile");
call.registerStoredProcedureParameter(1, String.class, ParameterMode.IN);
call.registerStoredProcedureParameter(2, Class.class, ParameterMode.REF_CURSOR);
call.execute();
An exception occurs when I execute the call:
ERROR SqlExceptionHelper Invalid column type: 2012
DatabaseException::error=[Error registering REF_CURSOR parameter [2]]
I tried writing a simple program that connects to the DB with the Oracle driver only. I get the same error if I register Types.REF_CURSOR as the output parameter on the CallableStatement:
cs.registerOutParameter(2, Types.REF_CURSOR);
And the problem can be solved by changing to OracleTypes:
cs.registerOutParameter(2, OracleTypes.CURSOR);
Does anyone know what is wrong? I will need to fall back to traditional SQL programming if I cannot get the stored procedure call to work. Please help.
Finally got it to work; I should have checked the Oracle JDBC documentation before implementing.
ojdbc8 should be good for Oracle 12c + JDK 8 + JPA 2.1 with Oracle12cDialect.
For Oracle 11g, you need to use ojdbc6.jar.
I solved that issue by debugging OJDBC and Hibernate, then building a patched Hibernate core library (5.4.15.Final) from source.
I patched the method ExtractedDatabaseMetaDataImpl.supportsRefCursors, which always returned false.
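For reference, here is a minimal plain-JDBC sketch of the working OracleTypes.CURSOR variant from the question; the connection URL, credentials, and the procedure's exact parameter list are illustrative assumptions:

import java.sql.CallableStatement;
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import oracle.jdbc.OracleTypes;

public class RefCursorDemo {
    public static void main(String[] args) throws Exception {
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:oracle:thin:@//dbhost:1521/ORCL", "scott", "tiger");
             CallableStatement cs = conn.prepareCall("{call sp_get_profile(?, ?)}")) {
            cs.setString(1, "some_profile");
            // OracleTypes.CURSOR works on the 11g driver where the standard
            // java.sql.Types.REF_CURSOR constant fails, as described above.
            cs.registerOutParameter(2, OracleTypes.CURSOR);
            cs.execute();
            try (ResultSet rs = (ResultSet) cs.getObject(2)) {
                while (rs.next()) {
                    System.out.println(rs.getString(1));
                }
            }
        }
    }
}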

JanusGraph DynamoDB backend exceptions when committing data to the database

I am using the JanusGraph-with-DynamoDB example from https://github.com/awslabs/dynamodb-janusgraph-storage-backend
I am connecting to JanusGraph using Spark with Scala and the Gremlin-Scala framework. Everything works when I use Cassandra as the backend, but when I switch to DynamoDB, I start getting backend exceptions.
My conf looks like this:
val conf = new BaseConfiguration
conf.setProperty("gremlin.graph","org.janusgraph.core.JanusGraphFactory")
conf.setProperty("storage.write-time","1 ms")
conf.setProperty("storage.read-time","1 ms")
conf.setProperty("storage.backend","com.amazon.janusgraph.diskstorage.dynamodb.DynamoDBStoreManager")
conf.setProperty("storage.dynamodb.client.signing-region","us-east-1")
conf.setProperty("storage.dynamodb.client.endpoint","http://127.0.0.1:8000")
val graph = JanusGraphFactory.open(conf)
I can connect to DynamoDB fine, but when I start to insert data, I run into backend exceptions.
Below is part of the error log:
ERROR org.janusgraph.graphdb.database.StandardJanusGraph - Could not commit transaction [1] due to storage exception in system-commit
org.janusgraph.core.JanusGraphException: Could not execute operation due to backend exception
at org.janusgraph.diskstorage.util.BackendOperation.execute(BackendOperation.java:57)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.persist(CacheTransaction.java:95)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.flushInternal(CacheTransaction.java:143)
at org.janusgraph.diskstorage.keycolumnvalue.cache.CacheTransaction.commit(CacheTransaction.java:200)
at org.janusgraph.diskstorage.BackendTransaction.commit(BackendTransaction.java:150)
at org.janusgraph.graphdb.database.StandardJanusGraph.commit(StandardJanusGraph.java:703)
at org.janusgraph.graphdb.transaction.StandardJanusGraphTx.commit(StandardJanusGraphTx.java:1363)
at org.janusgraph.graphdb.tinkerpop.JanusGraphBlueprintsGraph$GraphTransaction.doCommit(JanusGraphBlueprintsGraph.java:272)
at org.apache.tinkerpop.gremlin.structure.util.AbstractTransaction.commit(AbstractTransaction.java:105)
at $line81.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1$$anonfun$apply$1.apply(:84)
at $line81.$read$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$iw$$anonfun$1$$anonfun$apply$1.apply(:80)
Any idea what is going on here? I am pretty new to DynamoDB. This was working fine with Cassandra.
How do you know you are connected? I think you must provide credentials in your config, for example:
conf.setProperty("storage.dynamodb.client.credentials.class-name", "com.amazonaws.auth.BasicAWSCredentials")
conf.setProperty("storage.dynamodb.client.credentials.constructor-args", "ACCESS_KEY,SECRET_KEY")

Export data from MongoDB to Hive

My input: a collection ("demo1") in MongoDB (version 3.4.4).
My output: my data imported into a Hive database ("demo2") (Hive version 1.2.1.2.3.4.7-4).
Purpose: create a connector between MongoDB and Hive.
Error:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/util/JSON
I tried two solutions, following these steps (but the error remains):
1) I created a local collection in MongoDB (via Robomongo) connected to Docker.
2) I uploaded these versions of the jars and added them in Hive:
ADD JAR /home/.../mongo-hadoop-hive-2.0.2.jar;
ADD JAR /home/.../mongo-hadoop-core-2.0.2.jar;
ADD JAR /home/.../mongo-java-driver-3.4.2.jar;
Unfortunately the error didn't change, so I uploaded other versions; since I was unsure which versions were right for my export, I tried these:
ADD JAR /home/.../mongo-hadoop-hive-1.3.0.jar;
ADD JAR /home/.../mongo-hadoop-core-1.3.0.jar;
ADD JAR /home/.../mongo-java-driver-2.13.2.jar;
3) I created an external table:
CREATE EXTERNAL TABLE demo2
(
id INT,
name STRING,
password STRING,
email STRING
)
STORED BY 'com.mongodb.hadoop.hive.MongoStorageHandler'
WITH SERDEPROPERTIES('mongo.columns.mapping'='{"id":"_id","name":"name","password":"password","email":"email"}')
TBLPROPERTIES('mongo.uri'='mongodb://localhost:27017/local.demo1');
Error returned in Hive:
Execution Error, return code 1 from org.apache.hadoop.hive.ql.exec.DDLTask. com/mongodb/util/JSON
How can I resolve this problem?
Copying the correct jar files (mongo-hadoop-core-2.0.2.jar, mongo-hadoop-hive-2.0.2.jar, mongo-java-driver-3.2.2.jar) onto ALL the nodes of the cluster did the trick for me. The slash-separated com/mongodb/util/JSON in the error is a class the DDL task failed to load, which points to the MongoDB Java driver not being visible on every node.
Other points to take care of:
Follow all the steps mentioned here religiously: https://github.com/mongodb/mongo-hadoop/wiki/Hive-Usage#installation
Adhere to the requirements given here: https://github.com/mongodb/mongo-hadoop#requirements
Other useful links:
https://github.com/mongodb/mongo-hadoop/wiki/FAQ#i-get-a-classnotfoundexceptionnoclassdeffounderror-when-using-the-connector-what-do-i-do
https://groups.google.com/forum/#!topic/mongodb-user/xMVoTSePgg0
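Since the error names the missing class directly, a quick probe run with the same classpath the Hive service uses can confirm whether the MongoDB Java driver is actually visible on a given node. This is a hypothetical helper for diagnosis, not part of the mongo-hadoop project:

public class ClasspathProbe {
    public static void main(String[] args) {
        try {
            // The class named in the Hive error, in normal dotted form.
            Class.forName("com.mongodb.util.JSON");
            System.out.println("mongo-java-driver is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println("missing: copy mongo-java-driver to this node");
        }
    }
}

If it prints "missing" on any node, the driver jar has to be copied there (or made available to Hive, for example via the hive.aux.jars.path property).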

DB2 exception: Cannot create PoolableConnectionFactory, SQLCODE=-142

I keep getting the following error in MobileFirst Platform 6.3:
Runtime: org.apache.commons.dbcp.SQLNestedException: Cannot create
PoolableConnectionFactory (DB2 SQL Error: SQLCODE=-142,
SQLSTATE=42612, SQLERRMC=null, DRIVER=4.19.26)
This is my adapter code:
var test2 = WL.Server.createSQLStatement("SELECT * FROM WSDIWC.WBPTRR1");

function getCEID(cnum) {
    return WL.Server.invokeSQLStatement({
        preparedStatement : test2,
        parameters : []
    });
}
And the adapter XML:
<connectivity>
<connectionPolicy xsi:type="sql:SQLConnectionPolicy">
<!-- Example for using a JNDI data source, replace with actual data source
name -->
<!-- <dataSourceJNDIName>${training-jndi-name}</dataSourceJNDIName> -->
<!-- Example for using MySQL connector, do not forget to put the MySQL
connector library in the project's lib folder -->
<dataSourceDefinition>
<driverClass>com.ibm.db2.jcc.DB2Driver</driverClass>
<url>jdbc:db2://***</url>
<user>**</user>
<password>**</password>
</dataSourceDefinition>
</connectionPolicy>
</connectivity>
I have removed the URL, user, and password.
I hope you can help clarify the problem. I already know the SQL itself is fine, since it's just a simple query.
I have also found that z/OS DB2 has an issue with the same error code, SQLCODE=-142: http://answers.splunk.com/answers/117024/splunk-db-connect-db2.html
While you say that this is a "simple query", the exception's error code means the following:
-142
THE SQL STATEMENT IS NOT SUPPORTED
Explanation
An SQL statement was detected that is not supported by the database.
The statement might be valid for other IBM® relational database
products or it might be valid in another context. For example,
statements such as VALUES and SIGNAL or RESIGNAL SQLSTATE can be used
only in certain contexts, such as in a trigger body or in an SQL
Procedure.
System action
The statement cannot be processed.
Programmer response
Change the syntax of the SQL statement or remove the statement from
the program.
You should review the DB2 SQL guidelines for how to achieve what you want, and also explain that in the question if you'd like further assistance. For example, are you sure "WSDIWC.WBPTRR1" is actually available?
I encountered this same problem with JDBC connections to mainframe DB2 in MobileFirst 6.3. Connections to DB2 LUW worked fine. It appears that the default pool validationQuery is valid for DB2 LUW but not for DB2 for z/OS.
You can work around the bug by doing the data source configuration in the Liberty profile server.xml instead. From the Eclipse Servers view, expand MobileFirst Development Server and edit the Server Configuration. Add the driver and data source there; for example:
<library id="db2jcc">
<fileset dir="whereever" includes="db2jcc4.jar db2jcc_license_cisuz.jar"/>
</library>
<dataSource id="db2" jndiName="jdbc/db2">
<jdbcDriver libraryRef="db2jcc"/>
<properties.db2.jcc databaseName="mydb" portNumber="5021"
serverName="myserver" user="myuser" password="mypw" />
</dataSource>
Then reference it in your adapter XML under connectionPolicy:
<dataSourceJNDIName>jdbc/db2</dataSourceJNDIName>
A benefit of configuring data sources in server.xml (vs. the adapter XML) is that you have access to all data source, JDBC, and JCC properties. So if the connection pool gives you other problems, you can customize it or switch to another data source type, such as type="javax.sql.DataSource".
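If you want to confirm the validation-query diagnosis outside of MobileFirst, a plain-JDBC check with a statement that DB2 for z/OS accepts (SYSIBM.SYSDUMMY1 exists there as well as on LUW) is a quick test; the URL and credentials below are placeholders, since the original post redacted them:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class Db2ValidationCheck {
    public static void main(String[] args) throws Exception {
        Class.forName("com.ibm.db2.jcc.DB2Driver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:db2://myserver:5021/mydb", "myuser", "mypw");
             Statement stmt = conn.createStatement();
             // A validation query that is valid on both DB2 LUW and DB2 for z/OS.
             ResultSet rs = stmt.executeQuery("SELECT 1 FROM SYSIBM.SYSDUMMY1")) {
            rs.next();
            System.out.println("connection validated: " + rs.getInt(1));
        }
    }
}

If this succeeds while the pool still fails with SQLCODE=-142, that points back at whatever statement the pool itself issues to validate connections.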

iReport Designer 4.5.1/4.6.0 cannot interact with Hive

I have followed the instructions from here and installed the updated plugin. The error has become:
Query error
Message: net.sf.jasperreports.engine.JRException:
Error executing SQL statement for : null Level: SEVERE Stack Trace:
Error executing SQL statement for : null com.jaspersoft.hadoop.hive.HiveFieldsProvider.getFields(HiveFieldsProvider.java:113)
com.jaspersoft.ireport.hadoop.hive.designer.HiveFieldsProvider.getFields(HiveFieldsProvider.java:32)
com.jaspersoft.ireport.hadoop.hive.connection.HiveConnection.readFields(HiveConnection.java:154)
com.jaspersoft.ireport.designer.wizards.ConnectionSelectionWizardPanel.validate(ConnectionSelectionWizardPanel.java:146)
org.openide.WizardDescriptor$7.run(WizardDescriptor.java:1357)
org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
After downgrading to 4.5.0, the error became the following (the connection is verified and I am able to query the table from Hive):
Query error
Message: net.sf.jasperreports.engine.JRException: Query returned non-zero code: 10, cause:
FAILED: Error in semantic analysis: Line 1:14 Table not found 'panstats' Level:
SEVERE Stack Trace: Query returned non-zero code: 10, cause:
FAILED: Error in semantic analysis: Line 1:14 Table not found 'panstats'
com.jaspersoft.hadoop.hive.HiveFieldsProvider.getFields(HiveFieldsProvider.java:260)
com.jaspersoft.ireport.hadoop.hive.designer.HiveFieldsProvider.getFields(HiveFieldsProvider.java:32)
com.jaspersoft.ireport.hadoop.hive.connection.HiveConnection.readFields(HiveConnection.java:146)
com.jaspersoft.ireport.designer.wizards.ConnectionSelectionWizardPanel.validate(ConnectionSelectionWizardPanel.java:146)
org.openide.WizardDescriptor$7.run(WizardDescriptor.java:1357)
org.openide.util.RequestProcessor$Task.run(RequestProcessor.java:572)
org.openide.util.RequestProcessor$Processor.run(RequestProcessor.java:997)
I am using Hive 0.8.1 on OS X Lion 10.7.4.
Is your query as simple as select * from panstats? I suspect that the query is not the problem, but you'll want to confirm that first.
You could try querying that table from a tool like SQuirreL SQL. If that tool also cannot get the data, then it's probably a Hive issue. If it can... then it's probably an issue with iReport or the Hive plugin.
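The same check can also be scripted against the HiveServer JDBC driver that shipped with Hive 0.8.x; the driver class and URL scheme below follow that pre-HiveServer2 driver, and the host and port are assumptions:

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.ResultSet;
import java.sql.Statement;

public class HiveQueryCheck {
    public static void main(String[] args) throws Exception {
        // HiveServer1 driver, as used by Hive 0.8.x clients.
        Class.forName("org.apache.hadoop.hive.jdbc.HiveDriver");
        try (Connection conn = DriverManager.getConnection(
                 "jdbc:hive://localhost:10000/default");
             Statement stmt = conn.createStatement();
             ResultSet rs = stmt.executeQuery("SELECT * FROM panstats")) {
            while (rs.next()) {
                System.out.println(rs.getString(1));
            }
        }
    }
}

If this also fails with "Table not found", the problem is on the Hive side rather than in iReport or its plugin.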
It sounds like Hive is not configured to share metadata. It uses the annoying default configuration with Derby, so outside connections don't get access to your panstats table. I came across this article about configuring Hive earlier this year; it documents using MySQL instead of Derby. If that's indeed the problem, then it's just a Hive configuration issue, and following that article would solve things both for SQuirreL and for iReport.