We have a MongoDB collection which we want to import into Elasticsearch (for now as a one-off effort). To this end, we have exported the collection with mongoexport. It is a huge JSON file with entries like the following:
{
    "RefData" : {
        "DebtInstrmAttrbts" : {
            "NmnlValPerUnit" : "2000",
            "IntrstRate" : {
                "Fxd" : "3.1415"
            },
            "MtrtyDt" : "2020-01-01",
            "TtlIssdNmnlAmt" : "200000000",
            "DebtSnrty" : "SNDB"
        },
        "TradgVnRltdAttrbts" : {
            "IssrReq" : "false",
            "Id" : "BMTF",
            "FrstTradDt" : "2019-04-01T12:34:56.789"
        },
        "TechAttrbts" : {
            "PblctnPrd" : {
                "FrDt" : "2019-04-04"
            },
            "RlvntCmptntAuthrty" : "GB"
        },
        "FinInstrmGnlAttrbts" : {
            "ClssfctnTp" : "DBFNXX",
            "ShrtNm" : "AVGO 3.625 10/16/24 c24 (URegS)",
            "FullNm" : "AVGO 3 5/8 10/15/24 BOND",
            "NtnlCcy" : "USD",
            "Id" : "USU1109MAXXX",
            "CmmdtyDerivInd" : "false"
        },
        "Issr" : "549300WV6GIDOZJTVXXX"
    }
}
We are using the following Logstash configuration file to import this data set into Elasticsearch:
input {
    file {
        path => "/home/elastic/FIRDS.json"
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => json
    }
}
filter {
    mutate {
        remove_field => [ "_id", "path", "host" ]
    }
}
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
        index => "firds"
    }
}
All of this works fine: the data ends up in the firds index in Elasticsearch, and a GET /firds/_search returns all the entries within the _source field.
We understand that this field is not indexed and thus not searchable, but searchability is exactly what we are after: we want to make all of the entries within the original nested JSON searchable in Elasticsearch.
We assume that we have to adjust the filter {} part of our Logstash configuration, but how? For consistency reasons it would be nice to keep the original nested JSON structure, but that is not a must. Flattening would also be an option, so that e.g.
"RefData" : {
"DebtInstrmAttrbts" : {
"NmnlValPerUnit" : "2000" ...
becomes a single key-value pair "RefData.DebtInstrmAttrbts.NmnlValPerUnit" : "2000".
It would be great if we could do that directly with Logstash, without using an additional Python script operating on the JSON file we exported from MongoDB.
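One Logstash-only possibility (a sketch, not tested against the full export; it assumes a Logstash version with the ruby filter's event.get/event.set API) is to flatten the already-parsed event inside a ruby filter, added to the existing filter {} block next to the mutate:
ruby {
    # Walk the RefData hash recursively and re-emit every leaf value as a
    # top-level field named after its dotted path, e.g.
    # "RefData.DebtInstrmAttrbts.NmnlValPerUnit" => "2000".
    code => '
        flatten = lambda do |prefix, value, out|
            if value.is_a?(Hash)
                value.each { |k, v| flatten.call("#{prefix}.#{k}", v, out) }
            else
                out[prefix] = value
            end
        end
        refdata = event.get("RefData")
        if refdata.is_a?(Hash)
            flat = {}
            flatten.call("RefData", refdata, flat)
            flat.each { |k, v| event.set(k, v) }
            event.remove("RefData")
        end
    '
}
Logstash accepts field names containing dots here; depending on the Elasticsearch version, the dots may be mapped back into an object structure at index time, but either way the leaf values end up indexed and searchable.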
EDIT: Workaround
Our current workaround is to (1) dump the MongoDB database to dump.json, then (2) flatten it with jq using the following expression, and finally (3) manually import it into Elasticsearch.
Regarding (2), this is the flattening step:
jq -c '. as $in | reduce leaf_paths as $path ({}; . + { ($path | join(".")): $in | getpath($path) }) | del(."_id.$oid")' dump.json > flattened.json
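For step (3), one option (just a sketch, reusing the configuration shown above; the location of flattened.json is an assumption, and jq -c guarantees one flattened JSON object per line) would be to feed the flattened file through the same Logstash pipeline:
input {
    file {
        path => "/home/elastic/flattened.json"   # assumed location of the jq output
        start_position => "beginning"
        sincedb_path => "/dev/null"
        codec => json                            # one flattened JSON object per line
    }
}
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
        index => "firds"
    }
}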
References
Walker Rowe: ElasticSearch Nested Queries: How to Search for Embedded Documents
ElasticSearch search in document and in dynamic nested document
Mapping for Nested JSON document in Elasticsearch
Logstash - import nested JSON into Elasticsearch
Remark for the curious: The shown JSON is a (modified) entry from the Financial Instruments Reference Database System (FIRDS), available from the European Securities and Markets Authority (ESMA), a European financial regulatory agency overseeing the capital markets.
I'm having difficulty starting Logstash.
My logstash.conf looks like this:
input {
    beats {
        port => "5044"
    }
}
filter {
    grok {
        patterns_dir => ["./patterns"]
        match => { "message" => "%{WORD:event_type}\t%{NUMBER:server_time}\t%{NUMBER:market_time}\t%{WORD:instrument}\t%{C_NUMBER:last_price}\t%{C_NUMBER:trade_quantity}\t%{C_NUMBER:bid_price}\t%{C_NUMBER:bid_quantity}\t%{C_NUMBER:ask_price}\t%{C_NUMBER:ask_quantity}\t%{GREEDYDATA:flags}\t%{GREEDYDATA:additional_infos}" }
    }
    # ... and other stuff here...
}
output {
    elasticsearch {
        hosts => [ "localhost:9200" ]
        index => "%{[@metadata][beat]}"
    }
}
Logstash works fine if I comment out the match => line. But with it, Logstash does not start: nothing shows up when I run netstat -na | grep 5044 in the container, i.e. it is simply not listening on port 5044.
And when I try to run Logstash manually with /opt/logstash/bin/logstash --path.data /tmp/logstash/data -f /etc/logstash/conf.d/filebeat-config.conf, I get the following:
Sending Logstash's logs to /opt/logstash/logs which is now configured via log4j2.properties
[2018-08-27T09:35:25,883][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.queue", :path=>"/tmp/logstash/data/queue"}
[2018-08-27T09:35:25,887][INFO ][logstash.setting.writabledirectory] Creating directory {:setting=>"path.dead_letter_queue", :path=>"/tmp/logstash/data/dead_letter_queue"}
[2018-08-27T09:35:26,177][WARN ][logstash.config.source.multilocal] Ignoring the 'pipelines.yml' file because modules or command line options are specified
[2018-08-27T09:35:26,213][INFO ][logstash.agent ] No persistent UUID file found. Generating new UUID {:uuid=>"5abcdba2-475f-46a9-b192-a343ca15ce89", :path=>"/tmp/logstash/data/uuid"}
[2018-08-27T09:35:26,727][INFO ][logstash.runner ] Starting Logstash {"logstash.version"=>"6.3.2"}
[2018-08-27T09:35:29,016][INFO ][logstash.pipeline ] Starting pipeline {:pipeline_id=>"main", "pipeline.workers"=>8, "pipeline.batch.size"=>125, "pipeline.batch.delay"=>50}
[2018-08-27T09:35:29,316][INFO ][logstash.outputs.elasticsearch] Elasticsearch pool URLs updated {:changes=>{:removed=>[], :added=>[http://localhost:9200/]}}
[2018-08-27T09:35:29,325][INFO ][logstash.outputs.elasticsearch] Running health check to see if an Elasticsearch connection is working {:healthcheck_url=>http://localhost:9200/, :path=>"/"}
[2018-08-27T09:35:29,467][WARN ][logstash.outputs.elasticsearch] Restored connection to ES instance {:url=>"http://localhost:9200/"}
[2018-08-27T09:35:29,510][INFO ][logstash.outputs.elasticsearch] ES Output version determined {:es_version=>6}
[2018-08-27T09:35:29,513][WARN ][logstash.outputs.elasticsearch] Detected a 6.x and above cluster: the `type` event field won't be used to determine the document _type {:es_version=>6}
[2018-08-27T09:35:29,533][INFO ][logstash.outputs.elasticsearch] New Elasticsearch output {:class=>"LogStash::Outputs::ElasticSearch", :hosts=>["//localhost:9200"]}
[2018-08-27T09:35:29,549][INFO ][logstash.outputs.elasticsearch] Using mapping template from {:path=>nil}
[2018-08-27T09:35:29,565][INFO ][logstash.outputs.elasticsearch] Attempting to install template {:manage_template=>{"template"=>"logstash-*", "version"=>60001, "settings"=>{"index.refresh_interval"=>"5s"}, "mappings"=>{"_default_"=>{"dynamic_templates"=>[{"message_field"=>{"path_match"=>"message", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false}}}, {"string_fields"=>{"match"=>"*", "match_mapping_type"=>"string", "mapping"=>{"type"=>"text", "norms"=>false, "fields"=>{"keyword"=>{"type"=>"keyword", "ignore_above"=>256}}}}}], "properties"=>{"#timestamp"=>{"type"=>"date"}, "#version"=>{"type"=>"keyword"}, "geoip"=>{"dynamic"=>true, "properties"=>{"ip"=>{"type"=>"ip"}, "location"=>{"type"=>"geo_point"}, "latitude"=>{"type"=>"half_float"}, "longitude"=>{"type"=>"half_float"}}}}}}}}
[2018-08-27T09:35:29,689][ERROR][logstash.pipeline ] Error registering plugin {:pipeline_id=>"main", :plugin=>"#<LogStash::FilterDelegator:0x68bd7527 #metric_events_out=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: out value:0, #metric_events_in=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: in value:0, #metric_events_time=org.jruby.proxy.org.logstash.instrument.metrics.counter.LongCounter$Proxy2 - name: duration_in_millis value:0, #id=\"e473071da674c7efab2a8ee71c9e682afff58b8a4725d076964bc668f3b2c724\", #klass=LogStash::Filters::Grok, #metric_events=#<LogStash::Instrument::NamespacedMetric:0x5867faed #metric=#<LogStash::Instrument::Metric:0x61ef1454 #collector=#<LogStash::Instrument::Collector:0x51306706 #agent=nil, #metric_store=#<LogStash::Instrument::MetricStore:0x5227344a #store=#<Concurrent::Map:0x00000000000fb4 entries=2 default_proc=nil>, #structured_lookup_mutex=#<Mutex:0x7efeb9ea>, #fast_lookup=#<Concurrent::Map:0x00000000000fb8 entries=75 default_proc=nil>>>>, #namespace_name=[:stats, :pipelines, :main, :plugins, :filters, :e473071da674c7efab2a8ee71c9e682afff58b8a4725d076964bc668f3b2c724, :events]>, #filter=<LogStash::Filters::Grok patterns_dir=>[\"./patterns\"], match=>{\"message\"=>\"%{WORD:event_type}\\\\t%{NUMBER:server_time}\\\\t%{NUMBER:market_time}\\\\t%{WORD:instrument}\\\\t%{C_NUMBER:last_price}\\\\t%{C_NUMBER:trade_quantity}\\\\t%{C_NUMBER:bid_price}\\\\t%{C_NUMBER:bid_quantity}\\\\t%{C_NUMBER:ask_price}\\\\t%{C_NUMBER:ask_quantity}\\\\t%{GREEDYDATA:flags}\\\\t%{GREEDYDATA:additional_infos}\"}, id=>\"e473071da674c7efab2a8ee71c9e682afff58b8a4725d076964bc668f3b2c724\", enable_metric=>true, periodic_flush=>false, patterns_files_glob=>\"*\", break_on_match=>true, named_captures_only=>true, keep_empty_captures=>false, tag_on_failure=>[\"_grokparsefailure\"], timeout_millis=>30000, tag_on_timeout=>\"_groktimeout\">>", :error=>"pattern %{C_NUMBER:last_price} not defined", :thread=>"#<Thread:0x20b6525c run>"}
[2018-08-27T09:35:29,699][ERROR][logstash.pipeline ] Pipeline aborted due to error {:pipeline_id=>"main", :exception=>#<Grok::PatternError: pattern %{C_NUMBER:last_price} not defined>, :backtrace=>["/opt/logstash/vendor/bundle/jruby/2.3.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:123:in `block in compile'", "org/jruby/RubyKernel.java:1292:in `loop'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/jls-grok-0.11.5/lib/grok-pure.rb:93:in `compile'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-4.0.3/lib/logstash/filters/grok.rb:281:in `block in register'", "org/jruby/RubyArray.java:1734:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-4.0.3/lib/logstash/filters/grok.rb:275:in `block in register'", "org/jruby/RubyHash.java:1343:in `each'", "/opt/logstash/vendor/bundle/jruby/2.3.0/gems/logstash-filter-grok-4.0.3/lib/logstash/filters/grok.rb:270:in `register'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:340:in `register_plugin'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `block in register_plugins'", "org/jruby/RubyArray.java:1734:in `each'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:351:in `register_plugins'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:729:in `maybe_setup_out_plugins'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:361:in `start_workers'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:288:in `run'", "/opt/logstash/logstash-core/lib/logstash/pipeline.rb:248:in `block in start'"], :thread=>"#<Thread:0x20b6525c run>"}
[2018-08-27T09:35:29,724][ERROR][logstash.agent ] Failed to execute action {:id=>:main, :action_type=>LogStash::ConvergeResult::FailedAction, :message=>"Could not execute action: PipelineAction::Create<main>, action_result: false", :backtrace=>nil}
Also, next to my logstash.conf, I have a patterns directory containing a file with the following:
USERNAME [a-zA-Z0-9._-]+
USER %{USERNAME}
INT (?:[+-]?(?:[0-9]+))
BASE10NUM (?<![0-9.+-])(?>[+-]?(?:(?:[0-9]+(?:\.[0-9]+)?)|(?:\.[0-9]+)))
NUMBER (?:%{BASE10NUM})
C_NUMBER (?:[+-]?(?:[(0-9)|(*,#,.)]+))
C_NUMBER2 (?:[+-]?(?:[(0-9)|(*,#,.)|null]+))
BASE16NUM (?<![0-9A-Fa-f])(?:[+-]?(?:0x)?(?:[0-9A-Fa-f]+))
BASE16FLOAT \b(?<![0-9A-Fa-f.])(?:[+-]?(?:0x)?(?:(?:[0-9A-Fa-f]+(?:\.[0-9A-Fa-f]*)?)|(?:\.[0-9A-Fa-f]+)))\b
POSINT \b(?:[1-9][0-9]*)\b
NONNEGINT \b(?:[0-9]+)\b
WORD \b\w+\b
NOTSPACE \S+
SPACE \s*
DATA .*?
GREEDYDATA .*
QUOTEDSTRING (?>(?<!\\)(?>"(?>\\.|[^\\"]+)+"|""|(?>'(?>\\.|[^\\']+)+')|''|(?>(?>\\.|[^\\]+)+`)|``))
UUID [A-Fa-f0-9]{8}-(?:[A-Fa-f0-9]{4}-){3}[A-Fa-f0-9]{12}
MAC (?:%{CISCOMAC}|%{WINDOWSMAC}|%{COMMONMAC})
CISCOMAC (?:(?:[A-Fa-f0-9]{4}\.){2}[A-Fa-f0-9]{4})
WINDOWSMAC (?:(?:[A-Fa-f0-9]{2}-){5}[A-Fa-f0-9]{2})
COMMONMAC (?:(?:[A-Fa-f0-9]{2}:){5}[A-Fa-f0-9]{2})
MONTH \b(?:Jan(?:uary)?|Feb(?:ruary)?|Mar(?:ch)?|Apr(?:il)?|May|Jun(?:e)?|Jul(?:y)?|Aug(?:ust)?|Sep(?:tember)?|Oct(?:ober)?|Nov(?:ember)?|Dec(?:ember)?)\b
MONTHNUM (?:0?[1-9]|1[0-2])
MONTHDAY (?:(?:0[1-9])|(?:[12][0-9])|(?:3[01])|[1-9])
DAY (?:Mon(?:day)?|Tue(?:sday)?|Wed(?:nesday)?|Thu(?:rsday)?|Fri(?:day)?|Sat(?:urday)?|Sun(?:day)?)
YEAR (?>\d\d){1,2}
HOUR (?:2[0123]|[01]?[0-9])
MINUTE (?:[0-5][0-9])
SECOND (?:(?:[0-5][0-9]|60)(?:[:.,][0-9]+)?)
TIME (?!<[0-9])%{HOUR}:%{MINUTE}(?::%{SECOND})(?![0-9])
DATE_US %{MONTHNUM}[/-]%{MONTHDAY}[/-]%{YEAR}
DATE_EU %{MONTHDAY}[./-]%{MONTHNUM}[./-]%{YEAR}
ISO8601_TIMEZONE (?:Z|[+-]%{HOUR}(?::?%{MINUTE}))
ISO8601_SECOND (?:%{SECOND}|60)
TIMESTAMP_ISO8601 %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND})?%{ISO8601_TIMEZONE}?
TIMESTAMP_CUSTOM %{YEAR}-%{MONTHNUM}-%{MONTHDAY}[T ]%{HOUR}:?%{MINUTE}(?::?%{SECOND}.?%{NUMBER})?%{ISO8601_TIMEZONE}?
DATE %{DATE_US}|%{DATE_EU}
DATESTAMP %{DATE}[- ]%{TIME}
TZ (?:[PMCE][SD]T|UTC)
DATESTAMP_RFC822 %{DAY} %{MONTH} %{MONTHDAY} %{YEAR} %{TIME} %{TZ}
DATESTAMP_OTHER %{DAY} %{MONTH} %{MONTHDAY} %{TIME} %{TZ} %{YEAR}
What is wrong with the match => line?
Any help is highly appreciated.
You're attempting to use a grok pattern, %{C_NUMBER}, that Logstash doesn't know about. It does not appear to be a standard pattern bundled with Logstash. Put %{NUMBER} in its place and restart Logstash.
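For example (a sketch that assumes the price and quantity columns are plain decimal numbers; if they can contain characters such as * or #, as the custom C_NUMBER pattern suggests, NUMBER will not match them):
filter {
    grok {
        match => { "message" => "%{WORD:event_type}\t%{NUMBER:server_time}\t%{NUMBER:market_time}\t%{WORD:instrument}\t%{NUMBER:last_price}\t%{NUMBER:trade_quantity}\t%{NUMBER:bid_price}\t%{NUMBER:bid_quantity}\t%{NUMBER:ask_price}\t%{NUMBER:ask_quantity}\t%{GREEDYDATA:flags}\t%{GREEDYDATA:additional_infos}" }
    }
}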
I was able to resolve the issue by changing patterns_dir => ["./patterns"] to patterns_dir => ["/etc/logstash/conf.d/patterns"].
The match => line references a custom grok pattern (C_NUMBER) that Logstash could not find, because the patterns directory was specified with a relative path.
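With the absolute patterns_dir in place, the original match line with the custom C_NUMBER pattern can stay as it is; a sketch of the resulting filter:
filter {
    grok {
        patterns_dir => ["/etc/logstash/conf.d/patterns"]
        match => { "message" => "%{WORD:event_type}\t%{NUMBER:server_time}\t%{NUMBER:market_time}\t%{WORD:instrument}\t%{C_NUMBER:last_price}\t%{C_NUMBER:trade_quantity}\t%{C_NUMBER:bid_price}\t%{C_NUMBER:bid_quantity}\t%{C_NUMBER:ask_price}\t%{C_NUMBER:ask_quantity}\t%{GREEDYDATA:flags}\t%{GREEDYDATA:additional_infos}" }
    }
}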
I am unable to push data to MongoDB using Logstash.
My config file looks like this:
input {
    file {
        type => "log"
        path => "d:\logs\*.txt"
    }
}
output {
    mongodb {
        database => "abhi1"
        collection => "plain"
        uri => "mongodb://127.0.0.1:27017"
    }
}
The command used to execute the configuration file is logstash -f ./conf/demo.conf.
ERROR:
[2015-09-08T16:26:04.883000 #4528] DEBUG -- : MONGODB | COMMAND | namespace=admin.$cmd selector={:ismaster=>1} flags=[] limit=-1 skip=0 project=nil | runtime: 46.9999ms
Hoping to get a workaround soon. Thanks.
I have a db with a bunch of collections.
Some of their documents have a 'status' field and some don't.
How can I insert 'status': 'pending' into the documents that lack 'status', without overwriting the documents that already have a status?
Using PyMongo / Flask / Python 2.7.
I tried this:
orders = monDB.find('order')
for order in orders:
    if not order['status']:
        monDB.update('order', {'status':'pending'})
        print 'success'
But nothing happens. What am I doing wrong?
Use $exists to check if the field exists and $set to create it if it doesn't. Assuming monDB is your collection:
monDB.update({'status': {'$exists' : False}}, {'$set' : {'status': 'pending'}}, multi=True)
In the mongo shell:
> db.myQueue.update({status: {$exists: false}}, {$set : {status: 'pending'}}, false, true)
See http://docs.mongodb.org/manual/reference/operators/ and http://docs.mongodb.org/manual/applications/update/#crud-update-update.
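A more complete sketch with a recent PyMongo, using update_many (the modern equivalent of update(..., multi=True)); the connection string and the database/collection names are assumptions:
from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
orders = client["mydb"]["order"]                   # assumed db/collection names

# Add the field only to documents that do not have it yet;
# documents that already have a status are left untouched.
result = orders.update_many(
    {"status": {"$exists": False}},
    {"$set": {"status": "pending"}},
)
print(result.modified_count, "documents updated")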
I thought that fastmod indicated certain operations, such as in-place updates.
In my app I'm updating documents by _id using '$' modifiers, for example:
$collection->update(
    array('_id' => $id),
    array(
        '$inc' => array('hits' => new MongoInt32(1)),
        '$set' => array(
            'times.gen' => gettimeofday(true),
            'http.code' => new MongoInt32(200)
        )
    ),
    array('safe' => false, 'multiple' => false, 'upsert' => false)
);
I get log entries like the following:
Wed Jul 25 11:08:36 [conn7002912] update mob.stat_pages query: { _id: BinData } update: { $inc: { hits: 1 }, $set: { times.gen: 1343203715.684896, http.code: 200 } } nscanned:1 nupdated:1 keyUpdates:0 locks(micros) w:342973 342ms
As you can see, the logs don't contain any "fastmod" flag. There is no "moved" flag either, because I set the 'times.gen' and 'http.code' fields on insert, so the padding factor is 1.0.
Am I doing something wrong, or have I misunderstood the meaning of fastmod?
You are correct that "fastmod" in the logs means an in-place update. Some possible reasons why logged fastmod/in-place operations are missing:
You are actually setting or incrementing a field that doesn't exist, so it must be added; that is not an in-place operation.
The logs only show slow queries (by default those taking more than 100 ms), so the in-place updates are probably completing too quickly to be logged.
Judging by the log you seem to be using 2.1 or 2.2 - did the messages disappear when (or if) you switched to the newer version?
In terms of looking into this further:
Have a look at the profiler and try it with different settings; note that profiling adds load, so use it carefully.
You can also try setting the slowms value lower, either at startup or at runtime:
> db.setProfilingLevel(0,20) // slow threshold=20ms
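A hypothetical profiler session in the mongo shell (the mob.stat_pages namespace is taken from the log line above):
> use mob
> db.setProfilingLevel(1, 20)   // profile operations slower than 20 ms
> db.system.profile.find({ op: "update", ns: "mob.stat_pages" }).sort({ ts: -1 }).limit(5).pretty()
> db.setProfilingLevel(0)       // turn the profiler off again when done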