I'm trying to import data from an Oracle database into an OrientDB graph using OETL.
I've noticed that in jdbc-drivers.json the URL for the Oracle driver is not working:
{
  "Oracle": {
    "className": "oracle.jdbc.driver.OracleDriver",
    "url": "http://svn.ets.berkeley.edu/nexus/content/repositories/myberkeley/com/oracle/ojdbc7/12.1.0.1/ojdbc7-12.1.0.1.jar",
    "version": "7-12.1.0.1",
    "format": [
      "jdbc:oracle:thin:#<HOST>:<PORT>:<SID>"
    ]
  },
So I replaced it with both
"url": "https://github.com/MHTaleb/ojdbc7/raw/master/ojdbc7.jar",
and also locally with
"url": "C:\Users\sviu\Desktop\orientdb-3.0.26\ojdbc7.jar",
I'm using the following configuration
{
  "config": {
    "log": "error"
  },
  "extractor" : {
    "jdbc": {
      "driver": "oracle.jdbc.OracleDriver",
      ## "driver": "oracle.jdbc.driver.OracleDriver",
      "url": "jdbc:oracle:thin:#URL:1521:SID",
      "userName": "DB_OWNER",
      "userPassword": "DBPWD",
      "query": "select * from LOT"
    }
  },
  "transformers" : [
    { "vertex": { "class": "Lot"} }
  ],
  "loader" : {
    "orientdb": {
      "dbURL": "plocal:../databases/LDS",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoCreate": true,
      "dbType": "graph"
    }
  }
}
I tried both driver names and both className values, in every combination, in jdbc-drivers.json and lots.json.
I always get "driver not found":
C:\Users\sviu\Desktop\orientdb-3.0.26\bin>oetl.bat lots.json
OrientDB etl v.3.0.26 - Veloce (build 39bd95d70410374ab1cf6770ef2feca26145b04f, branch 3.0.x) https://www.orientdb.com
Exception in thread "main" com.orientechnologies.orient.core.exception.OConfigurationException: Error on creating ETL processor
at com.orientechnologies.orient.etl.OETLProcessorConfigurator.parse(OETLProcessorConfigurator.java:153)
at com.orientechnologies.orient.etl.OETLProcessorConfigurator.parseConfigAndParameters(OETLProcessorConfigurator.java:92)
at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:116)
Caused by: com.orientechnologies.orient.core.exception.OConfigurationException: [JDBC extractor] JDBC Driver oracle.jdbc.OracleDriver not found
at com.orientechnologies.orient.etl.extractor.OETLJDBCExtractor.configure(OETLJDBCExtractor.java:69)
at com.orientechnologies.orient.etl.OETLProcessorConfigurator.configureComponent(OETLProcessorConfigurator.java:158)
at com.orientechnologies.orient.etl.OETLProcessorConfigurator.configureExtractor(OETLProcessorConfigurator.java:216)
at com.orientechnologies.orient.etl.OETLProcessorConfigurator.parse(OETLProcessorConfigurator.java:115)
... 2 more
Caused by: java.lang.ClassNotFoundException: oracle.jdbc.OracleDriver
at java.net.URLClassLoader.findClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
at java.lang.ClassLoader.loadClass(Unknown Source)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Unknown Source)
at com.orientechnologies.orient.etl.extractor.OETLJDBCExtractor.configure(OETLJDBCExtractor.java:67)
... 5 more
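One workaround I have not tried yet, noted here only as a guess on the assumption that oetl.bat puts everything under the lib folder on the classpath: copying the driver jar straight into OrientDB's lib directory and keeping the class name in the config, e.g.
copy C:\Users\sviu\Desktop\orientdb-3.0.26\ojdbc7.jar C:\Users\sviu\Desktop\orientdb-3.0.26\lib\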
Any insight would be appreciated.
BR
Related
I'm taking my first steps with Kafka Connect and things are not going as expected.
I created a Mongo sink connector (Kafka to Mongo), but it fails to consume the messages with the following error:
Task threw an uncaught and unrecoverable exception (org.apache.kafka.connect.runtime.WorkerTask)
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:178)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:104)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:469)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:228)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:196)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:184)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:234)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.connect.errors.DataException: Failed to deserialize data for topic test_topic to Avro:
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:114)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:489)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:128)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:162)
... 13 more
I don't want to use any schema, but for some reason it fails at the deserialization step, probably because the schema configuration is missing.
My configuration:
{
  "name": "kafka-to-mongo",
  "config": {
    "connector.class": "com.mongodb.kafka.connect.MongoSinkConnector",
    "tasks.max": "1",
    "topics": "test_topic",
    "connection.uri": "mongodb://mongodb4:27017/test-db",
    "database": "auto-resolve",
    "value.converter.schemas.enable": "false",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "collection": "test_collection",
    "document.id.strategy": "com.mongodb.kafka.connect.sink.processor.id.strategy.PartialValueStrategy",
    "document.id.strategy.partial.value.projection.list": "id",
    "document.id.strategy.partial.value.projection.type": "AllowList",
    "writemodel.strategy": "com.mongodb.kafka.connect.sink.writemodel.strategy.ReplaceOneBusinessKeyStrategy"
  }
}
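Judging by the stack trace above, the converter actually applied is io.confluent.connect.avro.AvroConverter, presumably the worker-level default rather than the JsonConverter requested in this config. For reference, the schemaless-JSON settings I believe need to take effect at the connector level are just these (a sketch, using standard Kafka Connect property names):
"key.converter": "org.apache.kafka.connect.storage.StringConverter",
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "false",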
Example of a message in the topic:
{
  "_id": "62a6e4d88c1e1e0011902616",
  "type": "alert",
  "key": "test444",
  "timestamp": 1655104728,
  "source_system": "api.test",
  "tags": [
    {
      "type": "aaa",
      "value": "bbb"
    },
    {
      "type": "fff",
      "value": "rrr"
    }
  ],
  "ack": "no",
  "main": "nochange",
  "all_ids": []
}
Any ideas? Thanks!
I set up a Kafka JDBC sink connector to send events to PostgreSQL.
I wrote this simple producer that sends JSON with schema (Avro) data to a topic:
producer.py (kafka-python)
biometrics = {
    "heartbeat": self.pulse,        # integer
    "oxygen": self.oxygen,          # integer
    "temprature": self.temprature,  # float
    "time": time                    # string
}
avro_value = {
    "schema": open(BASE_DIR + "/biometrics.avsc").read(),
    "payload": biometrics
}
producer.send(
    "biometrics",
    key="some_string",
    value=avro_value
)
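Not shown above: how the producer itself is constructed. Presumably something like the following, with serializers turning the key and the dict into bytes (a sketch with a placeholder broker address; the original construction is omitted here):
from kafka import KafkaProducer
import json

producer = KafkaProducer(
    bootstrap_servers="localhost:9092",                        # placeholder
    key_serializer=lambda k: k.encode("utf-8"),
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),  # dict -> JSON bytes
)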
Value Schema:
{
  "type": "record",
  "name": "biometrics",
  "namespace": "athlete",
  "doc": "athletes biometrics",
  "fields": [
    {
      "name": "heartbeat",
      "type": "int",
      "default": 0
    },
    {
      "name": "oxygen",
      "type": "int",
      "default": 0
    },
    {
      "name": "temprature",
      "type": "float",
      "default": 0.0
    },
    {
      "name": "time",
      "type": "string",
      "default": ""
    }
  ]
}
Connector config (without hosts, passwords, etc.):
{
  "name": "jdbc_sink",
  "connector.class": "io.aiven.connect.jdbc.JdbcSinkConnector",
  "tasks.max": "1",
  "key.converter": "org.apache.kafka.connect.storage.StringConverter ",
  "value.converter": "io.confluent.connect.avro.AvroConverter",
  "topics": "biometrics",
  "insert.mode": "insert",
  "auto.create": "true"
}
But my connector fails hard with three errors, and I am unable to spot the reason for any of them:
TL;DR log version:
(Error 1) Caused by: org.apache.kafka.connect.errors.DataException: biometrics
(Error 2) Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
(Error 3) Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Full log
org.apache.kafka.connect.errors.ConnectException: Tolerance exceeded in error handler
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:206)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execute(RetryWithToleranceOperator.java:132)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertAndTransformRecord(WorkerSinkTask.java:498)
at org.apache.kafka.connect.runtime.WorkerSinkTask.convertMessages(WorkerSinkTask.java:475)
at org.apache.kafka.connect.runtime.WorkerSinkTask.poll(WorkerSinkTask.java:325)
at org.apache.kafka.connect.runtime.WorkerSinkTask.iteration(WorkerSinkTask.java:229)
at org.apache.kafka.connect.runtime.WorkerSinkTask.execute(WorkerSinkTask.java:201)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:185)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:235)
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
Caused by: org.apache.kafka.connect.errors.DataException: biometrics
at io.confluent.connect.avro.AvroConverter.toConnectData(AvroConverter.java:98)
at org.apache.kafka.connect.storage.Converter.toConnectData(Converter.java:87)
at org.apache.kafka.connect.runtime.WorkerSinkTask.lambda$convertAndTransformRecord$1(WorkerSinkTask.java:498)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndRetry(RetryWithToleranceOperator.java:156)
at org.apache.kafka.connect.runtime.errors.RetryWithToleranceOperator.execAndHandleError(RetryWithToleranceOperator.java:190)
... 13 more
Caused by: org.apache.kafka.common.errors.SerializationException: Error deserializing Avro message for id -1
Caused by: org.apache.kafka.common.errors.SerializationException: Unknown magic byte!
Could someone help me understand those errors and the underlying reason?
The error is because you need to use the JsonConverter class with value.converter.schemas.enable=true in your connector, since that is what was produced; but the embedded schema is an Avro schema, not the JSON-schema representation of the payload that JsonConverter expects, so it might still fail even with those changes alone...
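In config terms, that means roughly:
"value.converter": "org.apache.kafka.connect.json.JsonConverter",
"value.converter.schemas.enable": "true",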
If you want to actually send Avro, then use the AvroProducer from the confluent-kafka library, which requires running the Schema Registry.
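A rough sketch of such a producer (untested here; the broker and Schema Registry addresses are placeholders you would need to adjust):
from confluent_kafka import avro
from confluent_kafka.avro import AvroProducer

# Parse your existing Avro schema file rather than embedding it in the message.
value_schema = avro.load("biometrics.avsc")
key_schema = avro.loads('{"type": "string"}')

producer = AvroProducer(
    {
        "bootstrap.servers": "localhost:9092",          # placeholder
        "schema.registry.url": "http://localhost:8081"  # placeholder
    },
    default_key_schema=key_schema,
    default_value_schema=value_schema,
)

# The value is now a plain dict; AvroProducer serializes it with the schema
# and registers that schema in the Schema Registry.
producer.produce(
    topic="biometrics",
    key="some_string",
    value={"heartbeat": 72, "oxygen": 98, "temprature": 36.6, "time": "12:00:00"},
)
producer.flush()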
I am trying to execute an Elasticsearch DSL query in Spark 2.2 with Scala 2.11.8. The Elasticsearch version is 2.4.4, and the Kibana version I use is 4.6.4.
This is the library that I use in Spark:
<dependency>
<groupId>org.elasticsearch</groupId>
<artifactId>elasticsearch-spark-20_2.11</artifactId>
<version>5.2.2</version>
</dependency>
This is my code:
val spark = SparkSession.builder()
.config("es.nodes","localhost")
.config("es.port",9200)
.config("es.nodes.wan.only","true")
.config("es.index.auto.create","true")
.appName("ES test")
.master("local[*]")
.getOrCreate()
val myquery = """{"query":
{"bool": {
"must": [
{
"has_child": {
"filter": {
...
}
}
}
]
}
}}"""
val df = spark.read.format("org.elasticsearch.spark.sql")
.option("query", myquery)
.option("pushdown", "true")
.load("myindex/items")
I have only shown the main body of the DSL query above. I am getting this error:
java.lang.IllegalArgumentException: Failed to parse query: {"query":
Initially, I thought the problem was the Elasticsearch version. As far as I understand from the project's GitHub, version 4 of Elasticsearch is not supported.
However, if I run the same code with a simple query, it correctly retrieves records from Elasticsearch.
var df = spark.read
.format("org.elasticsearch.spark.sql")
.option("es.query", "?q=public:*")
.load("myindex/items")
Therefore I assume the issue is not related to the version, but rather to the way I represent the query.
The query works fine with cURL, so perhaps it needs to be adjusted somehow before being passed to Spark?
Full error stacktrace:
Previous exception in task: Failed to parse query: {"query":
{"bool": {
"must": [
{
"has_child": {
"filter": {
"bool": {
"must": [
{
"term": {
"project": 579
}
},
{
"terms": {
"status": [
0,
1,
2
]
}
}
]
}
},
"type": "status"
}
},
{
"has_child": {
"filter": {
"bool": {
"must": [
{
"term": {
"project": 579
}
},
{
"terms": {
"entity": [
4634
]
}
}
]
}
},
"type": "annotation"
}
},
{
"term": {
"project": 579
}
},
{
"range": {
"publication_date": {
"gte": "2017/01/01",
"lte": "2017/04/01",
"format": "yyyy/MM/dd"
}
}
},
{
"bool": {
"should": [
{
"terms": {
"typology": [
"news",
"blog",
"forum",
"socialnetwork"
]
}
},
{
"terms": {
"publishing_platform": [
"twitter"
]
}
}
]
}
}
]
}}
org.elasticsearch.hadoop.rest.query.QueryUtils.parseQuery(QueryUtils.java:59)
org.elasticsearch.hadoop.rest.RestService.createReader(RestService.java:417)
org.elasticsearch.spark.rdd.AbstractEsRDDIterator.reader$lzycompute(AbstractEsRDDIterator.scala:49)
org.elasticsearch.spark.rdd.AbstractEsRDDIterator.reader(AbstractEsRDDIterator.scala:42)
org.elasticsearch.spark.rdd.AbstractEsRDDIterator.hasNext(AbstractEsRDDIterator.scala:61)
scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
org.apache.spark.scheduler.Task.run(Task.scala:108)
org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
java.lang.Thread.run(Thread.java:745)
at org.apache.spark.TaskContextImpl.invokeListeners(TaskContextImpl.scala:138)
at org.apache.spark.TaskContextImpl.markTaskCompleted(TaskContextImpl.scala:116)
at org.apache.spark.scheduler.Task.run(Task.scala:118)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/01/22 21:43:12 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): org.apache.spark.util.TaskCompletionListenerException: Failed to parse query: {"query":
and this one:
18/01/22 21:47:40 WARN ScalaRowValueReader: Field 'cluster' is backed by an array but the associated Spark Schema does not reflect this;
(use es.read.field.as.array.include/exclude)
18/01/22 21:47:40 WARN ScalaRowValueReader: Field 'project' is backed by an array but the associated Spark Schema does not reflect this;
(use es.read.field.as.array.include/exclude)
18/01/22 21:47:40 WARN ScalaRowValueReader: Field 'client' is backed by an array but the associated Spark Schema does not reflect this;
(use es.read.field.as.array.include/exclude)
18/01/22 21:47:40 ERROR Executor: Exception in task 0.0 in stage 0.0 (TID 0)
scala.MatchError: Buffer(13473953) (of class scala.collection.convert.Wrappers$JListWrapper)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:276)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$StringConverter$.toCatalystImpl(CatalystTypeConverters.scala:275)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$CatalystTypeConverter.toCatalyst(CatalystTypeConverters.scala:103)
at org.apache.spark.sql.catalyst.CatalystTypeConverters$$anonfun$createToCatalystConverter$2.apply(CatalystTypeConverters.scala:379)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:61)
at org.apache.spark.sql.execution.RDDConversions$$anonfun$rowToRowRdd$1$$anonfun$apply$3.apply(ExistingRDD.scala:58)
at scala.collection.Iterator$$anon$11.next(Iterator.scala:409)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIterator.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$8$$anon$1.hasNext(WholeStageCodegenExec.scala:395)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:234)
at org.apache.spark.sql.execution.SparkPlan$$anonfun$2.apply(SparkPlan.scala:228)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.RDD$$anonfun$mapPartitionsInternal$1$$anonfun$apply$25.apply(RDD.scala:827)
at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:38)
at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:323)
at org.apache.spark.rdd.RDD.iterator(RDD.scala:287)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:335)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
18/01/22 21:47:40 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, localhost, executor driver): scala.MatchError: Buffer(13473953) (of class scala.collection.convert.Wrappers$JListWrapper)
So the error says
Caused by: org.codehaus.jackson.JsonParseException: Unexpected character (':' (code 58)):
at [Source: java.io.StringReader#76aeea7a; line: 2, column: 33]
and if you look at your query
val myquery = """{"query":
"bool": {
you'll see that this probably maps to the : just after "bool", and obviously what you have is invalid JSON. To make it clearer, I've reformatted it as
{"query": "bool": { ...
Most probably you forgot the { after "query" and the matching } at the end. Compare this to the example in the official docs:
{
"query": {
"bool" : {
"must" : {
I'm facing a problem with Snappy ingestion into Druid. Things start to break after org.apache.hadoop.mapred.LocalJobRunner - map task executor complete. It is able to fetch the input file.
My spec JSON file:
{
"hadoopCoordinates": "org.apache.hadoop:hadoop-client:2.6.0",
"spec": {
"dataSchema": {
"dataSource": "apps_searchprivacy",
"granularitySpec": {
"intervals": [
"2017-01-23T00:00:00.000Z/2017-01-23T01:00:00.000Z"
],
"queryGranularity": "HOUR",
"segmentGranularity": "HOUR",
"type": "uniform"
},
"metricsSpec": [
{
"name": "count",
"type": "count"
},
{
"fieldName": "event_value",
"name": "event_value",
"type": "longSum"
},
{
"fieldName": "landing_impression",
"name": "landing_impression",
"type": "longSum"
},
{
"fieldName": "user",
"name": "DistinctUsers",
"type": "hyperUnique"
},
{
"fieldName": "cost",
"name": "cost",
"type": "doubleSum"
}
],
"parser": {
"parseSpec": {
"dimensionsSpec": {
"dimensionExclusions": [
"landing_page",
"skip_url",
"ua",
"user_id"
],
"dimensions": [
"t3",
"t2",
"t1",
"aff_id",
"customer",
"evt_id",
"install_date",
"install_week",
"install_month",
"install_year",
"days_since_install",
"months_since_install",
"weeks_since_install",
"success_url",
"event",
"chrome_version",
"value",
"event_label",
"rand",
"type_tag_id",
"channel_name",
"cid",
"log_id",
"extension",
"os",
"device",
"browser",
"cli_ip",
"t4",
"t5",
"referal_url",
"week",
"month",
"year",
"browser_version",
"browser_name",
"landing_template",
"strvalue",
"customer_group",
"extname",
"countrycode",
"issp",
"spdes",
"spsc"
],
"spatialDimensions": []
},
"format": "json",
"timestampSpec": {
"column": "time_stamp",
"format": "yyyy-MM-dd HH:mm:ss"
}
},
"type": "hadoopyString"
}
},
"ioConfig": {
"inputSpec": {
"dataGranularity": "hour",
"filePattern": ".*\\..*",
"inputPath": "hdfs://c8-auto-hadoop-service-1.srv.media.net:8020/data/apps_test_output",
"pathFormat": "'ts'=yyyyMMddHH",
"type": "granularity"
},
"type": "hadoop"
},
"tuningConfig": {
"ignoreInvalidRows": "true",
"type": "hadoop",
"useCombiner": "false"
}
},
"type": "index_hadoop"
}
The error I'm getting:
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - (EQUATOR) 0 kvi 26214396(104857584)
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - mapreduce.task.io.sort.mb: 100
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - soft limit at 83886080
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - bufstart = 0; bufvoid = 104857600
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - kvstart = 26214396; length = 6553600
2017-02-03T14:39:50,738 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - Map output collector class = org.apache.hadoop.mapred.MapTask$MapOutputBuffer
2017-02-03T14:39:50,847 INFO [LocalJobRunner Map Task Executor #0] org.apache.hadoop.mapred.MapTask - Starting flush of map output
2017-02-03T14:39:50,849 INFO [Thread-22] org.apache.hadoop.mapred.LocalJobRunner - map task executor complete.
2017-02-03T14:39:50,850 WARN [Thread-22] org.apache.hadoop.mapred.LocalJobRunner - job_local233667772_0001
java.lang.Exception: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.mapred.LocalJobRunner$Job.runTasks(LocalJobRunner.java:462) ~[hadoop-mapreduce-client-common-2.6.0.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:522) [hadoop-mapreduce-client-common-2.6.0.jar:?]
Caused by: java.lang.UnsatisfiedLinkError: org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy()Z
at org.apache.hadoop.util.NativeCodeLoader.buildSupportsSnappy(Native Method) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.io.compress.SnappyCodec.checkNativeCodeLoaded(SnappyCodec.java:63) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.io.compress.SnappyCodec.getDecompressorType(SnappyCodec.java:192) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.io.compress.CodecPool.getDecompressor(CodecPool.java:176) ~[hadoop-common-2.6.0.jar:?]
at org.apache.hadoop.mapreduce.lib.input.LineRecordReader.initialize(LineRecordReader.java:90) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
at org.apache.hadoop.mapreduce.lib.input.DelegatingRecordReader.initialize(DelegatingRecordReader.java:84) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
at org.apache.hadoop.mapred.MapTask$NewTrackingRecordReader.initialize(MapTask.java:545) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:783) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:341) ~[hadoop-mapreduce-client-core-2.6.0.jar:?]
at org.apache.hadoop.mapred.LocalJobRunner$Job$MapTaskRunnable.run(LocalJobRunner.java:243) ~[hadoop-mapreduce-client-common-2.6.0.jar:?]
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511) ~[?:1.8.0_121]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) ~[?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) ~[?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) ~[?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) ~[?:1.8.0_121]
2017-02-03T14:39:51,130 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Job job_local233667772_0001 failed with state FAILED due to: NA
2017-02-03T14:39:51,139 INFO [task-runner-0-priority-0] org.apache.hadoop.mapreduce.Job - Counters: 0
2017-02-03T14:39:51,143 INFO [task-runner-0-priority-0] io.druid.indexer.JobHelper - Deleting path[var/druid/hadoop-tmp/apps_searchprivacy/2017-02-03T143903.262Z_bb7a812bc0754d4aabcd4bc103ed648a]
2017-02-03T14:39:51,158 ERROR [task-runner-0-priority-0] io.druid.indexing.overlord.ThreadPoolTaskRunner - Exception while running task[HadoopIndexTask{id=index_hadoop_apps_searchprivacy_2017-02-03T14:39:03.257Z, type=index_hadoop, dataSource=apps_searchprivacy}]
java.lang.RuntimeException: java.lang.reflect.InvocationTargetException
at com.google.common.base.Throwables.propagate(Throwables.java:160) ~[guava-16.0.1.jar:?]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:204) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.common.task.HadoopIndexTask.run(HadoopIndexTask.java:208) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:436) [druid-indexing-service-0.9.2.jar:0.9.2]
at io.druid.indexing.overlord.ThreadPoolTaskRunner$ThreadPoolTaskRunnerCallable.call(ThreadPoolTaskRunner.java:408) [druid-indexing-service-0.9.2.jar:0.9.2]
at java.util.concurrent.FutureTask.run(FutureTask.java:266) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142) [?:1.8.0_121]
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617) [?:1.8.0_121]
at java.lang.Thread.run(Thread.java:745) [?:1.8.0_121]
Caused by: java.lang.reflect.InvocationTargetException
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
... 7 more
Caused by: com.metamx.common.ISE: Job[class io.druid.indexer.IndexGeneratorJob] failed!
at io.druid.indexer.JobHelper.runJobs(JobHelper.java:369) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
at io.druid.indexer.HadoopDruidIndexerJob.run(HadoopDruidIndexerJob.java:94) ~[druid-indexing-hadoop-0.9.2.jar:0.9.2]
at io.druid.indexing.common.task.HadoopIndexTask$HadoopIndexGeneratorInnerProcessing.runTask(HadoopIndexTask.java:261) ~[druid-indexing-service-0.9.2.jar:0.9.2]
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method) ~[?:1.8.0_121]
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62) ~[?:1.8.0_121]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[?:1.8.0_121]
at java.lang.reflect.Method.invoke(Method.java:498) ~[?:1.8.0_121]
at io.druid.indexing.common.task.HadoopTask.invokeForeignLoader(HadoopTask.java:201) ~[druid-indexing-service-0.9.2.jar:0.9.2]
... 7 more
2017-02-03T14:39:51,165 INFO [task-runner-0-priority-0] io.druid.indexing.overlord.TaskRunnerUtils - Task [index_hadoop_apps_searchprivacy_2017-02-03T14:39:03.257Z] status changed to [FAILED].
2017-02-03T14:39:51,168 INFO [task-runner-0-priority-0] io.druid.indexing.worker.executor.ExecutorLifecycle - Task completed with status: {
"id" : "index_hadoop_apps_searchprivacy_2017-02-03T14:39:03.257Z",
"status" : "FAILED",
"duration" : 43693
}
It seems that the JVM can't load a native shared library (a .dll or .so). Check that it is available on the machine(s) running the task and, if so, that its directory is on the JVM's native library path (java.library.path).
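For example (a sketch, assuming the Hadoop native libraries such as libhadoop and libsnappy live in /usr/lib/hadoop/lib/native; adjust the path to your installation):
# make the native libs visible to the JVM running the task
export LD_LIBRARY_PATH=/usr/lib/hadoop/lib/native:$LD_LIBRARY_PATH
# or pass it as a JVM option to the task runner
-Djava.library.path=/usr/lib/hadoop/lib/native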
I have installed OrientDB v2.1.11 on a Mac (El Capitan).
I am following the instructions as per the OrientDB documentation.
http://orientdb.com/docs/last/Import-from-DBMS.html
When I run oetl.sh I get a NullPointerException. I assume it is connecting to the Oracle instance.
JSON config:
{
  "config": {
    "log": "error"
  },
  "extractor" : {
    "jdbc": {
      "driver": "oracle.jdbc.OracleDriver",
      "url": "jdbc:oracle:thin:#<dbUrl>:1521:<dbSid>",
      "userName": "username",
      "userPassword": "password",
      "query": "select sold_to_party_nbr from customer"
    }
  },
  "transformers" : [
    { "vertex": { "class": "Company"} }
  ],
  "loader" : {
    "orientdb": {
      "dbURL": "plocal:../databases/BetterDemo",
      "dbUser": "admin",
      "dbPassword": "admin",
      "dbAutoCreate": true
    }
  }
}
Error:
sharon.oconnor$ ./oetl.sh ../loadFromOracle.json
OrientDB etl v.2.1.11 (build UNKNOWN#rddb5c0b4761473ae9549c3ac94871ab56ef5af2c; 2016-02-15 10:49:20+0000) www.orientdb.com
Exception in thread "main" java.lang.NullPointerException
at com.orientechnologies.orient.etl.transformer.OVertexTransformer.begin(OVertexTransformer.java:53)
at com.orientechnologies.orient.etl.OETLPipeline.begin(OETLPipeline.java:72)
at com.orientechnologies.orient.etl.OETLProcessor.executeSequentially(OETLProcessor.java:465)
at com.orientechnologies.orient.etl.OETLProcessor.execute(OETLProcessor.java:269)
at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:116)
The data in Oracle looks like this:
0000281305
0000281362
0000281378
0000281381
0000281519
0000281524
0000281563
0000281566
0000281579
0000281582
0000281623
0000281633
I have created a Company class with a sold_to_party_nbr string property in the BetterDemo database.
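One guess, since the vertex transformer presumably needs a graph database: the loader block here has no "dbType" set. With that field added it would look like this (not confirmed to be the cause of the NPE):
"loader" : {
  "orientdb": {
    "dbURL": "plocal:../databases/BetterDemo",
    "dbUser": "admin",
    "dbPassword": "admin",
    "dbAutoCreate": true,
    "dbType": "graph"
  }
}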
How can I debug further to figure out what is wrong?