Going through the Hudi documentation, I saw the Metadata Config section and was curious about how it is used. I created a table with the metadata enabled, and the directory got created under /.hoodie/metadata. Has anybody experimented with this feature? Is the metadata exposed, or is it only used internally by Hudi? What is it used for? I couldn't work it out from the docs.
I used the following Hudi options to create a table in S3 using PySpark.
hudi_options_insert = {
    "hoodie.table.name": "table_p5",
    "hoodie.datasource.write.table.type": "COPY_ON_WRITE",
    "hoodie.datasource.write.recordkey.field": "id",
    "hoodie.datasource.write.operation": "bulk_insert",
    "hoodie.datasource.write.partitionpath.field": "ds",
    "hoodie.datasource.write.precombine.field": "id",
    "hoodie.datasource.write.hive_style_partitioning": "true",
    "hoodie.datasource.hive_sync.table": "table_p5",
    "hoodie.datasource.hive_sync.database": "poc_hudi",
    "hoodie.datasource.hive_sync.enable": "true",
    "hoodie.datasource.hive_sync.partition_fields": "ds",
    "hoodie.insert.shuffle.parallelism": 6,
    "hoodie.metadata.enable": "true",
    "hoodie.metadata.insert.parallelism": 6
}
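For context, here is how I would expect the flag to be used on the read path, if the metadata table is indeed consulted for file listings instead of an S3 LIST call. This is just a sketch of my understanding; the table path below is a placeholder:

df = (
    spark.read.format("hudi")
    # same flag on the read side, so listings come from /.hoodie/metadata
    .option("hoodie.metadata.enable", "true")
    .load("s3://<bucket>/poc_hudi/table_p5")  # placeholder path
)
df.show()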
Thanks a mil.
I tried several answers from Stack Overflow but cannot solve the problem. I want to connect Firestore to LoopBack using this package: loopback-connector-firestore (https://www.npmjs.com/package/loopback-connector-firestore). After creating the datasource using the lb datasource command and starting the server, the error below shows up:
TypeError: Cannot initialize connector undefined: Cannot read property 'replace' of undefined
LoopBack is already connected to other datasources. How can I add Firestore to it?
This is the datasources.json file:
{
  -----other db datasources here-----
  "Firestore": {
    "name": "Firestore",
    "projectId": "project id",
    "clientEmail": "client email",
    "privateKey": "key here",
    "databaseName": "name here",
    "connector": "loopback-connector-firestore"
  }
}
In the server.js file:
var ds = loopback.createDataSource({
  connector: require('loopback-connector-firestore'),
  provider: 'Firestore'
});
var storage = ds.createModel('storage');
app.model(storage);
The environment settings:
* Kubuntu 18.04
* nodejs v10.16
* npm v6.9
* loopback v3
I think some of the connectors aren't supported by LoopBack for now, and loopback-connector-firestore is one of them.
You may need to check the following reference: https://loopback.io/doc/en/lb3/Community-connectors.html
I am trying to index data from a PostgreSQL database into Elasticsearch using Logstash. One of the columns in the database is of type JSON. When I run Logstash, I get a "Missing Converter handling" error; I understand this means Java/Logstash doesn't recognize the JSON column type.
I can typecast the column with col_name::TEXT, which works fine. However, I need that column as JSON in the Elasticsearch index. Is there any workaround?
Note: the keys in the JSON object are not fixed; they vary.
Logstash postgres query example:
input {
  jdbc {
    # Postgres JDBC connection string to our database
    jdbc_connection_string => "***"
    # The user we wish to execute our statement as
    jdbc_user => "***"
    jdbc_password => "***"
    # The path to our downloaded JDBC driver
    jdbc_driver_library => "<jar_file_path>"
    # The name of the driver class for Postgresql
    jdbc_driver_class => "org.postgresql.Driver"
    # Our query
    statement => "SELECT col_1, col_2, col_3, json_col_4 from my_db;"
  }
}
filter {
}
output {
  stdout { codec => json_lines }
}
In short, I need the JSON object from PostgreSQL to arrive in Elasticsearch as JSON.
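One direction I'm considering (untested): cast the column to text in the SQL statement, e.g. json_col_4::TEXT AS json_col_4, and re-parse it in the currently empty filter block using Logstash's json filter, so it reaches Elasticsearch as an object again. A minimal sketch, assuming the cast column keeps the name json_col_4:

filter {
  json {
    # Re-parse the stringified JSON column into a structured field
    source => "json_col_4"
    target => "json_col_4"
  }
}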
I'm having difficulty with starting Logstash.
My logstash.conf looks like this:
input {
  beats {
    port => "5044"
  }
}
filter {
  grok {
    patterns_dir => ["./patterns"]
    match => { "message" => "%{WORD:event_type}\t%{NUMBER:server_time}\t%{NUMBER:market_time}\t%{WORD:instrument}\t%{C_NUMBER:last_price}\t%{C_NUMBER:trade_quantity}\t%{C_NUMBER:bid_price}\t%{C_NUMBER:bid_quantity}\t%{C_NUMBER:ask_price}\t%{C_NUMBER:ask_quantity}\t%{GREEDYDATA:flags}\t%{GREEDYDATA:additional_infos}"}
  }
  # ... and other stuff here...
}
output {
  elasticsearch {
    hosts => [ "localhost:9200" ]
    index => "%{[@metadata][beat]}"
  }
}
Logstash works fine if I comment out the match => line. But with it, it does not start: nothing shows up when I run netstat -na | grep 5044 in the container, so it is simply not listening on 5044.
And when I try to run Logstash manually with /opt/logstash/bin/logstash --path.data /tmp/logstash/data -f /etc/logstash/conf.d/filebeat-config.conf, I get the following:
Sending Logstash's logs to /opt/logstash/logs which is now configured via log4j2.properties
[2018-08-27T09:35:25,883][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"/tmp/logstash/data/queue"}
[2018-08-27T09:35:25,887][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.dead_letter_queue", :path=>"/tmp/logstash/data/dead_letter_queue"}
[2018-08-27T09:35:26,177][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-08-27T09:35:26,213][INFO ][logstash.agent ] No persistent UUID file found. Generating new UUID {:uuid=>"5abcdba2-475f-46a9-b192-a343ca15ce89", :path=>"/tmp/logstash/data/uuid"}
[2018-08-27T09:35:26,727][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.3.2"}
[2018-08-27T09:35:29,016][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-08-27T09:35:29,316][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2018-08-27T09:35:29,325][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-08-27T09:35:29,467][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-08-27T09:35:29,510][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-08-27T09:35:29,513][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-08-27T09:35:29,533][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2018-08-27T09:35:29,549][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-08-27T09:35:29,565][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"@timestamp"=>{"type"=>"date"}, "@version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-08-27T09:35:29,689][ERROR][logstash.pipeline ] Error registering plugin {:pipeline_id=>"main", :plugin=>"#<LogStash::FilterDelegator:0x68bd7527 @metric_events_out=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: out value:0, @metric_events_in=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: in value:0, @metric_events_time=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: duration_in_millis value:0, @id=\"e473071da674c7efab2a8ee71c9e682afff58b8a4725d076964bc668f3b2c724\", @klass=LogStash::Filters::Grok, @metric_events=#<LogStash::Instrument::NamespacedMetric:0x5867faed @metric=#<LogStash::Instrument::Metric:0x61ef1454 @collector=#<LogStash::Instrument::Collector:0x51306706 @agent=nil, @metric_store=#<LogStash::Instrument::MetricStore:0x5227344a @store=#<Concurrent::Map:0x00000000000fb4 entries=2 default_proc=nil>, @structured_lookup_mutex=#<Mutex:0x7efeb9ea>, @fast_lookup=#<Concurrent::Map:0x00000000000fb8 entries=75 default_proc=nil>>>>, @namespace_name=[:stats, :pipelines, :main, :plugins, :filters, :e473071da674c7efab2a8ee71c9e682afff58b8a4725d076964bc668f3b2c724, :events]>, @filter=<LogStash::Filters::Grok patterns_dir=>[\"./patterns\"], match=>{\"message\"=>\"%{WORD:event_type}\\\\t%{NUMBER:server_time}\\\\t%{NUMBER:market_time}\\\\t%{WORD:instrument}\\\\t%{C_NUMBER:last_price}\\\\t%{C_NUMBER:trade_quantity}\\\\t%{C_NUMBER:bid_price}\\\\t%{C_NUMBER:bid_quantity}\\\\t%{C_NUMBER:ask_price}\\\\t%{C_NUMBER:ask_quantity}\\\\t%{GREEDYDATA:flags}\\\\t%{GREEDYDATA:additional_infos}\"}, id=>\"e473071da674c7efab2a8ee71c9e682afff58b8a4725d076964bc668f3b2c724\", enable_metric=>true, periodic_flush=>false, patterns_files_glob=>\"*\", break_on_match=>true, named_captures_only=>true, keep_empty_captures=>false, tag_on_failure=>[\"_grokparsefailure\"], timeout_millis=>30000, tag_on_timeout=>\"_groktimeout\">>", :error=>"pattern %{C_NUMBER:last_price} not defined", :thread=>"#<Thread:0x20b6525c run>"}
[2018-08-27T09:35:29,699][ERROR][logstash.pipeline ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<Grok::PatternError: pattern %{C_NUMBER:last_price} not defined>, :backtrace=>["/opt/logstash/vendor/bundle/jruby/2.3.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:123:in `block in compile'", "org/jruby/RubyKernel.java:1292:in `loop'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:93:in `compile'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-4.0.3/lib/logstash/filters/grok.rb:281:in `block in register'", "org/jruby/RubyArray.java:1734:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-4.0.3/lib/logstash/filters/grok.rb:275:in `block in register'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-4.0.3/lib/logstash/filters/grok.rb:270:in `register'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:340:in `register_plugin'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `register_plugins'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:729:in `maybe_setup_out_plugins'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:361:in `start_workers'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:288:in `run'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:248:in `block in start'"], :thread=>"#<Thread:0x20b6525c run>"}
[2018-08-27T09:35:29,724][ERROR][logstash.agent ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}
Also, next to my logstash.conf, I have a patterns directory containing a file with the following:
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
C_NUMBER (?:[+-]?(?:[(0-9)|(*,#,.)]+))
C_NUMBER2 (?:[+-]?(?:[(0-9)|(*,#,.)|null]+))
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>(?>\\.|[^\\]+)+`)|``))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
SECOND (?:(?:[0-5][0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
TIMESTAMP_CUSTOM %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND}.?%{NUMBER})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T|UTC)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
What is wrong with the match => line?
I highly appreciate your help.
You're attempting to use a grok pattern, %{C_NUMBER}, that Logstash doesn't know about; it doesn't appear to be a standard pattern bundled with Logstash. Put %{NUMBER} in its place and restart Logstash.
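In other words, a sketch of the match line with every %{C_NUMBER:...} swapped for the bundled %{NUMBER:...}:

match => { "message" => "%{WORD:event_type}\t%{NUMBER:server_time}\t%{NUMBER:market_time}\t%{WORD:instrument}\t%{NUMBER:last_price}\t%{NUMBER:trade_quantity}\t%{NUMBER:bid_price}\t%{NUMBER:bid_quantity}\t%{NUMBER:ask_price}\t%{NUMBER:ask_quantity}\t%{GREEDYDATA:flags}\t%{GREEDYDATA:additional_infos}" }

Note that the stock NUMBER pattern is stricter than your custom C_NUMBER (it won't match values containing characters like * or #), so some lines may stop matching.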
I was able to resolve the issue by changing patterns_dir => ["./patterns"] to patterns_dir => ["/etc/logstash/conf.d/patterns"].
The match line was referencing a grok pattern that Logstash couldn't find, because a relative patterns_dir path is resolved against Logstash's working directory rather than the location of the config file.
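For reference, the working grok block looks like this (the pattern is unchanged; only the path is now absolute):

filter {
  grok {
    # Absolute path, so the custom pattern file is found regardless of
    # Logstash's working directory
    patterns_dir => ["/etc/logstash/conf.d/patterns"]
    match => { "message" => "%{WORD:event_type}\t%{NUMBER:server_time}\t%{NUMBER:market_time}\t%{WORD:instrument}\t%{C_NUMBER:last_price}\t%{C_NUMBER:trade_quantity}\t%{C_NUMBER:bid_price}\t%{C_NUMBER:bid_quantity}\t%{C_NUMBER:ask_price}\t%{C_NUMBER:ask_quantity}\t%{GREEDYDATA:flags}\t%{GREEDYDATA:additional_infos}" }
  }
}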
I am running OrientDB 2.1.2 from the AWS Marketplace AMI. I have already used ETL to load two sets of vertices. Now I'm trying to load a file of edges into OrientDB with ETL, and I'm getting: IllegalArgumentException: destination vertex is null. I've looked at the documentation and some other examples on the net, and my ETL config looks correct to me. I was hoping someone might have an idea.
My two V subclasses are:
Author (authorId, authGivenName, authSurname), with an index on authorId
Abstract (abstractId), with an index on abstractId
My E subclass:
Authored - no properties or indices defined on it
My edge file:
(authorId, abstractId) - tab-separated fields, with one header line containing those names
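For concreteness, using the first data row visible in the debug output further down, the file looks like this:

authorId	abstractId
9-s2.0-10039026700	2-s2.0-29144536313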
My ETL config:
{
  "config": { "log": "debug" },
  "source": { "file": { "path": "/root/poc1_Datasets/authAbstractEdge1.tsv" } },
  "extractor": { "row": {} },
  "transformers": [
    { "csv": { "separator": "\t" } },
    { "merge": {
        "joinFieldName": "authorId",
        "lookup": "Author.authorId"
    } },
    { "vertex": { "class": "Author" } },
    { "edge": {
        "class": "Authored",
        "joinFieldName": "abstractId",
        "lookup": "Abstract.abstractId",
        "direction": "out"
    } }
  ],
  "loader": {
    "orientdb": {
      "dbURL": "remote:localhost/DataSpine1",
      "dbType": "graph",
      "wal": false,
      "tx": false
    }
  }
}
When I run ETL with this config and file I get:
OrientDB etl v.2.1.2 (build #BUILD#) www.orientdb.com
BEGIN ETL PROCESSOR
[file] DEBUG Reading from file /root/poc1_Datasets/authAbstractEdge1.tsv
[0:csv] DEBUG Transformer input: authorId abstractId
[0:csv] DEBUG parsing=authorId abstractId
[0:csv] DEBUG Transformer output: null
2016-06-09 12:15:04:088 WARNI Transformer [csv] returned null, skip rest of pipeline execution [OETLPipeline][1:csv] DEBUG Transformer input: 9-s2.0-10039026700 2-s2.0-29144536313
[1:csv] DEBUG parsing=9-s2.0-10039026700 2-s2.0-29144536313
[1:csv] DEBUG document={authorId:9-s2.0-10039026700,abstractId:2-s2.0-29144536313}
[1:csv] DEBUG Transformer output: {authorId:9-s2.0-10039026700,abstractId:2-s2.0-29144536313}
[1:merge] DEBUG Transformer input: {authorId:9-s2.0-10039026700,abstractId:2-s2.0-29144536313}
[1:merge] DEBUG joinValue=9-s2.0-10039026700, lookupResult=Author#12:10046021{authorId:9-s2.0-10039026700,authGivenName:M. A.,authSurname:Turovskaya,abstractId:2-s2.0-29144536313} v2
[1:merge] DEBUG merged record Author#12:10046021{authorId:9-s2.0-10039026700,authGivenName:M. A.,authSurname:Turovskaya,abstractId:2-s2.0-29144536313} v2 with found record={authorId:9-s2.0-10039026700,abstractId:2-s2.0-29144536313}
[1:merge] DEBUG Transformer output: Author#12:10046021{authorId:9-s2.0-10039026700,authGivenName:M. A.,authSurname:Turovskaya,abstractId:2-s2.0-29144536313} v2
[1:vertex] DEBUG Transformer input: Author#12:10046021{authorId:9-s2.0-10039026700,authGivenName:M. A.,authSurname:Turovskaya,abstractId:2-s2.0-29144536313} v2
[1:vertex] DEBUG Transformer output: v(Author)[#12:10046021]
[1:edge] DEBUG Transformer input: v(Author)[#12:10046021]
[1:edge] DEBUG joinCurrentValue=2-s2.0-29144536313, lookupResult=Abstract#13:16626366{abstractId:2-s2.0-29144536313} v1
Error in Pipeline execution: java.lang.IllegalArgumentException: destination vertex is null
java.lang.IllegalArgumentException: destination vertex is null
at com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:888)
at com.tinkerpop.blueprints.impls.orient.OrientVertex.addEdge(OrientVertex.java:832)
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.createEdge(OEdgeTransformer.java:188)
at com.orientechnologies.orient.etl.transformer.OEdgeTransformer.executeTransform(OEdgeTransformer.java:117)
at com.orientechnologies.orient.etl.transformer.OAbstractTransformer.transform(OAbstractTransformer.java:37)
at com.orientechnologies.orient.etl.OETLPipeline.execute(OETLPipeline.java:114)
at com.orientechnologies.orient.etl.OETLProcessor.executeSequentially(OETLProcessor.java:487)
at com.orientechnologies.orient.etl.OETLProcessor.execute(OETLProcessor.java:291)
at com.orientechnologies.orient.etl.OETLProcessor.main(OETLProcessor.java:161)
ETL process halted: com.orientechnologies.orient.etl.OETLProcessHaltedException: java.lang.IllegalArgumentException: destination vertex is null
As I look at the debug output, it appears that the merge transformer successfully found the Author vertex, and the edge transformer successfully found the Abstract vertex (based on the RIDs in the output). I'm stumped as to why I'm getting the exception. Thanks in advance for any pointers.
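In fact, the destination lookup itself clearly succeeds (the debug shows lookupResult=Abstract#13:16626366), and it can be reproduced by hand in the console with:

SELECT FROM Abstract WHERE abstractId = '2-s2.0-29144536313'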
Have you already tried to see whether the new ETL tool in version 2.2, Teleporter, solves this problem?
There is a description of the new ETL product at this link.
I actually discovered that the ETL loader in OrientDB version 2.2.2 seems to have solved this issue. (Note: version 2.2.0 still had the same issue)