ElasticSearch with MongoDB doesn't index big objects - mongodb

I created an ES index (with the MongoDB river plugin) with the following configuration:
{
  "type": "mongodb",
  "mongodb": {
    "db": "mydatabase",
    "collection": "Users"
  },
  "index": {
    "name": "users",
    "type": "user"
  }
}
When I insert a simple object like:
{
  "name": "Joe",
  "surname": "Black"
}
everything works without a problem (I can see the data using the ES Head web interface).
But when I insert a bigger object, it doesn't get indexed:
{
  "object": {
    "text": "Let's do it again!",
    "boolTest": false
  },
  "type": "coolType",
  "tags": [
    ""
  ],
  "subObject1": {
    "count": 0,
    "last3": [],
    "array": []
  },
  "subObject2": {
    "count": 0,
    "last3": [],
    "array": []
  },
  "subObject3": {
    "count": 0,
    "last3": [],
    "array": []
  },
  "usrID": "5141a5a4d8f3a79c09000001",
  "created": Date(1363527664000),
  "lastUpdate": Date(1363527664000)
}
Where could the problem be? Thank you for your help!
EDIT: This is the error from the ES console:
org.elasticsearch.index.mapper.MapperParsingException: object mapping for [stream] tried to parse as object, but got EOF, has a concrete value been provided to it?
    at org.elasticsearch.index.mapper.object.ObjectMapper.parse(ObjectMapper.java:457)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:486)
    at org.elasticsearch.index.mapper.DocumentMapper.parse(DocumentMapper.java:430)
    at org.elasticsearch.index.shard.service.InternalIndexShard.prepareIndex(InternalIndexShard.java:318)
    at org.elasticsearch.action.bulk.TransportShardBulkAction.shardOperationOnPrimary(TransportShardBulkAction.java:157)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction.performOnPrimary(TransportShardReplicationOperationAction.java:533)
    at org.elasticsearch.action.support.replication.TransportShardReplicationOperationAction$AsyncShardOperationAction$1.run(TransportShardReplicationOperationAction.java:431)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1145)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:615)
    at java.lang.Thread.run(Thread.java:722)
[2013-03-20 10:35:05,697][WARN ][org.elasticsearch.river.mongodb.MongoDBRiver$Indexer] failed to execute: failure in bulk execution: [0]: index [stream], type [stream], id [514982c9b7f3bfbdb488ca81], message [MapperParsingException[object mapping for [stream] tried to parse as object, but got EOF, has a concrete value been provided to it?]]
[2013-03-20 10:35:05,698][INFO ][org.elasticsearch.river.mongodb.MongoDBRiver$Indexer] Indexed 1 documents, 1 insertions, 0 updates, 0 deletions, 0 documents per second

Which version of the MongoDB river are you using?
Please look at issue #26 [1]. It contains examples of indexing large JSON documents without any issue.
If you can still reproduce the issue, please provide more details: river settings, MongoDB (version, specific settings), Elasticsearch (version, specific settings).
[1] https://github.com/richardwilly98/elasticsearch-river-mongodb/issues/26
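
For reference, here is a minimal mongo shell snippet (a sketch, trimmed to one sub-object) to reproduce the failing insert against the mydatabase.Users collection from the river config. Note that in the shell, new Date(1363527664000) produces a BSON date, whereas a bare Date(...) call returns a string:

// Reproduce the failing insert from the mongo shell
db.getSiblingDB("mydatabase").Users.insert({
  object: { text: "Let's do it again!", boolTest: false },
  type: "coolType",
  tags: [""],
  subObject1: { count: 0, last3: [], array: [] },
  usrID: "5141a5a4d8f3a79c09000001",
  created: new Date(1363527664000),    // BSON date, not the string Date(...) returns
  lastUpdate: new Date(1363527664000)
});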

Related

MongoDB updateMany with $set does not work

I have a simple sample database which I use to develop a simple micro framework for our application. To verify that versioning works as expected, I need to update all documents by adding a new field. I used this thread as a guide, so what I'm doing should work.
{
  "_id": {
    "$oid": "63d95015f94d9a88ecbc1a00"
  },
  "Version": 1,
  "bookTitle": "Fancy Book 0",
  "author": "Author 0"
}
{
  "_id": {
    "$oid": "63d95015f94d9a88ecbc1a01"
  },
  "Version": 1,
  "bookTitle": "Fancy Book 1",
  "author": "Author 1"
}
... 8 more
Now when I run this in mongosh:
db.Books.updateMany({}, {$set: {'NewField': true}})
I get this output:
{
  acknowledged: true,
  insertedId: null,
  matchedCount: 0,
  modifiedCount: 0,
  upsertedCount: 0
}
So what's the issue here? It executes and seemingly is correct, but not a single document is updated. The official documentation states that {} is a selector for all documents, so why does it not match even one of them?
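
A matchedCount of 0 with an empty filter usually means the shell is connected to a different database or the collection name doesn't match (collection names are case-sensitive, so Books and books are distinct). A quick diagnostic sketch, assuming the collection should be Books in the current database:

// Confirm which database mongosh is actually connected to
db.getName()
// List collection names; check the exact casing of "Books"
db.getCollectionNames()
// Count the documents the empty filter would match
db.Books.countDocuments({})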

Why does `{"$numberInt": "1"}` need to be converted to `1` before restoring data with `mongorestore`?

I have some dumped BSON and JSON files from a MongoDB server running on Google Cloud Platform (GCP), and I want to restore the data into a new local server with version 4.0.3. However, I got errors showing that the indexes cannot be restored. I had to convert {"$numberInt": "1"} in the JSON files to 1 to make the restore succeed. Why do I need to go to the effort of fixing the format of the dumped files? Is it due to the different versions between the source server and the target server, or to something I did not do correctly?
I have googled and searched Stack Overflow, but I did not find anyone discussing this problem. The MongoDB release notes do not mention any changes related to this problem either.
Here is an example of JSON that mongorestore 4.0.3 cannot accept:
{
  "options": {},
  "indexes": [
    {
      "v": {
        "$numberInt": "2"
      },
      "key": {
        "_id": {
          "$numberInt": "1"
        }
      },
      "name": "_id_",
      "ns": "demo.item"
    },
    {
      "v": {
        "$numberInt": "2"
      },
      "key": {
        "itemId": {
          "$numberDouble": "1.0"
        }
      },
      "name": "itemId_1",
      "ns": "demo.item"
    }
  ],
  "uuid": "8ce4755612da4d048b0fd38a793f2b55"
}
And this is the accepted version, which I converted myself:
{
  "options": {},
  "indexes": [
    {
      "v": 2,
      "key": {
        "_id": 1
      },
      "name": "_id_",
      "ns": "demo.item"
    },
    {
      "v": 2,
      "key": {
        "itemId": 1.0
      },
      "name": "itemId_1",
      "ns": "demo.item"
    }
  ],
  "uuid": "8ce4755612da4d048b0fd38a793f2b55"
}
And here is the script I use to do the conversion.
Questions:
Why does mongorestore not accept the dump files created by mongodump?
Is there any way to avoid modifying the dumped files manually?
You need to use mongorestore version 4.2+, which supports the Extended JSON v2.0 format (Canonical or Relaxed mode). See the reference here.
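
If upgrading mongorestore is not an option, one workaround (a sketch, not part of the answer above; the file path and the regexes are assumptions, and the naive regexes would also rewrite wrappers that happen to appear inside string values) is to unwrap the canonical number types in the metadata files before restoring:

// convert-metadata.js - unwrap Extended JSON v2 canonical numbers so an
// older mongorestore can parse the metadata file.
// Usage: node convert-metadata.js dump/demo/item.metadata.json
const fs = require('fs');

const path = process.argv[2];
let text = fs.readFileSync(path, 'utf8');

// {"$numberInt": "2"} / {"$numberLong": "2"}  ->  2
text = text.replace(/\{\s*"\$number(?:Int|Long)"\s*:\s*"(-?\d+)"\s*\}/g, '$1');
// {"$numberDouble": "1.0"}  ->  1.0
text = text.replace(/\{\s*"\$numberDouble"\s*:\s*"(-?[0-9.eE+-]+)"\s*\}/g, '$1');

fs.writeFileSync(path, text);
console.log('rewrote ' + path);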

MongoDB query for array property $ne null not working

I'm trying to execute a query like:
{'array.0.property': {$ne: null}}
It returns nothing, even though all documents have this property different from null.
After some tests I noticed that it works using $elemMatch, but I need to query only the first element of the array.
The first element is to be considered the "master" element that all queries should search.
I can't change the document "schema".
Does anyone know how to solve this problem?
I'm using MongoDB 3.6.8.
Thanks in advance.
Example query:
db.getCollection('tasks').find({'details.0.code': {$ne: null}});
Example documents:
{
  "name": "test",
  "date": 2018-07-17 06:30:00.000Z,
  .....,
  "details": [
    {
      "code": '123',
      "description": 'something',
      "resolutionYear": 2018
    },
    {
      "code": null,
      "description": 'secondary',
      "resolutionYear": 2019
    }
  ]
},
{
  "name": "exam",
  "date": 2018-09-20 09:00:00.000Z,
  .....,
  "details": [
    {
      "code": null,
      "description": 'exam',
      "resolutionYear": null
    }
  ]
}
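
One possible workaround (a sketch, not from the thread; it relies on $expr and $arrayElemAt, both available in MongoDB 3.6, and unlike a plain filter it cannot use a regular index on details.0.code):

db.getCollection('tasks').find({
  $expr: {
    // "$details.code" collects the code of each array element;
    // $arrayElemAt picks the first one, i.e. the "master" element's code
    $ne: [{ $arrayElemAt: ["$details.code", 0] }, null]
  }
});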

MongoDB - Project specific element from array (big data)

I got a big array with data in the following format:
{
  "application": "myapp",
  "buildSystem": {
    "counter": 2361.1,
    "hostname": "host.com",
    "jobName": "job_name",
    "label": "2361",
    "systemType": "sys"
  },
  "creationTime": 1517420374748,
  "id": "123",
  "stack": "OTHER",
  "testStatus": "PASSED",
  "testSuites": [
    {
      "errors": 0,
      "failures": 0,
      "hostname": "some_host",
      "properties": [
        {
          "name": "some_name",
          "value": "UnicodeLittle"
        },
        <MANY MORE PROPERTIES>,
        {
          "name": "sun",
          "value": ""
        }
      ],
      "skipped": 0,
      "systemError": "",
      "systemOut": "",
      "testCases": [
        {
          "classname": "IdTest",
          "name": "has correct representation",
          "status": "PASSED",
          "time": "0.001"
        },
        <MANY MORE TEST CASES>,
        {
          "classname": "IdTest",
          "name": "normalized values",
          "status": "PASSED",
          "time": "0.001"
        }
      ],
      "tests": 8,
      "time": 0.005,
      "timestamp": "2018-01-31T17:35:15",
      "title": "IdTest"
    },
    <MANY MORE TEST SUITES>
  ]
}
There are three main structures holding big data: testSuites, properties, and testCases. My task is to sum all the times from each testSuite so that I can get the total duration of the test. Since the properties and testCases are huge, the query cannot complete. I would like to select only the "time" value from testSuites, but it kind of conflicts with the "time" of testCases in my query:
db.my_tests.find(
  {
    application: application,
    creationTime: {
      $gte: start_date.valueOf(),
      $lte: end_date.valueOf()
    }
  },
  {
    application: 1,
    creationTime: 1,
    buildSystem: 1,
    "testSuites.time": 1,
    _id: 1
  }
)
Is it possible to project only the "time" properties from testSuites without loading the whole schema? I already tried testSuites: 1 and testSuites.$.time: 1 without success. Please note that testSuites is an array of one element containing a dictionary.
I already checked this similar post without success:
Mongodb update the specific element from subarray
The following code prints the duration of each testSuite:
query = db.my_collection.aggregate(
  [
    {
      $match: {
        application: application,
        creationTime: {
          $gte: start_date.valueOf(),
          $lte: end_date.valueOf()
        }
      }
    },
    {
      $project: {
        duration: { $sum: "$testSuites.time" }
      }
    }
  ]
).forEach(function(doc) {
  print(doc._id);
  print(doc.duration);
});
Is it possible to project only the "time" properties from TestSuites without loading the whole schema? I already tried testSuites: 1, testSuites.$.time
Answering your problem of projecting only the time property of the testSuites documents: you can simply try projecting it with "testSuites.time": 1 (you need to add the quotes for dot-notation property references).
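
For example, applied to the find from the question (a sketch; field names are taken from the question):

db.my_tests.find(
  { application: application },
  {
    buildSystem: 1,
    "testSuites.time": 1,  // quoted dot notation: projects only the suite-level time
    _id: 1
  }
);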
My task is to sum all times from each TestSuite so that I can get the total duration of the test. Since the properties and TestCases are huge, the query cannot complete
As for your task, I suggest you try out MongoDB's aggregation framework for your calculations and document transformations. The aggregation option {allowDiskUse: true} will also help if you are processing "large" documents.
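
For instance, the aggregation from the question can be run with that option enabled (a sketch reusing the question's pipeline):

db.my_collection.aggregate(
  [
    {
      $match: {
        application: application,
        creationTime: { $gte: start_date.valueOf(), $lte: end_date.valueOf() }
      }
    },
    { $project: { duration: { $sum: "$testSuites.time" } } }
  ],
  { allowDiskUse: true }  // lets pipeline stages spill to disk past the memory limit
);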

Meteor MongoDB - can't insert _id with _id from API call

I am trying to call the Twitch API and insert some of the returned data into MongoDB. However, every time I get this error: Error: Meteor requires document _id fields to be non-empty strings or ObjectIDs.
The Twitch API response for a single stream/channel looks like this:
{
  "streams": [
    {
      "_id": 11220687552,
      "game": "League of Legends",
      "viewers": 11661,
      "created_at": "2014-09-30T01:10:36Z",
      "_links": {
        "self": "http://api.twitch.tv/kraken/streams/mushisgosu"
      },
      "preview": {
        "small": "http://static-cdn.jtvnw.net/previews-ttv/live_user_mushisgosu-80x50.jpg",
        "medium": "http://static-cdn.jtvnw.net/previews-ttv/live_user_mushisgosu-320x200.jpg",
        "large": "http://static-cdn.jtvnw.net/previews-ttv/live_user_mushisgosu-640x400.jpg",
        "template": "http://static-cdn.jtvnw.net/previews-ttv/live_user_mushisgosu-{width}x{height}.jpg"
      },
      "channel": {
        "_links": {
          "self": "https://api.twitch.tv/kraken/channels/mushisgosu",
          "follows": "https://api.twitch.tv/kraken/channels/mushisgosu/follows",
          "commercial": "https://api.twitch.tv/kraken/channels/mushisgosu/commercial",
          "stream_key": "https://api.twitch.tv/kraken/channels/mushisgosu/stream_key",
          "chat": "https://api.twitch.tv/kraken/chat/mushisgosu",
          "features": "https://api.twitch.tv/kraken/channels/mushisgosu/features",
          "subscriptions": "https://api.twitch.tv/kraken/channels/mushisgosu/subscriptions",
          "editors": "https://api.twitch.tv/kraken/channels/mushisgosu/editors",
          "videos": "https://api.twitch.tv/kraken/channels/mushisgosu/videos",
          "teams": "https://api.twitch.tv/kraken/channels/mushisgosu/teams"
        },
        "background": null,
        "banner": "http://static-cdn.jtvnw.net/jtv_user_pictures/mushisgosu-channel_header_image-c5c08cce281b7be3-640x125.jpeg",
        "display_name": "MushIsGosu",
        "game": "League of Legends",
        "logo": "http://static-cdn.jtvnw.net/jtv_user_pictures/mushisgosu-profile_image-b1c8bb5fd700025e-300x300.png",
        "mature": false,
        "status": "CLG hi im Gosu - Challenger AD - Smurfing Master!",
        "partner": true,
        "url": "http://www.twitch.tv/mushisgosu",
        "video_banner": "http://static-cdn.jtvnw.net/jtv_user_pictures/mushisgosu-channel_offline_image-7e3401b20cb5d739-640x360.png",
        "_id": 41939266,
        "name": "mushisgosu",
        "created_at": "2013-03-31T21:12:14Z",
        "updated_at": "2014-09-30T03:08:55Z",
        "abuse_reported": null,
        "delay": 60,
        "followers": 318914,
        "profile_banner": null,
        "profile_banner_background_color": null,
        "views": 25963780,
        "language": "en-us"
      }
    }
  ],
  "_total": 8477,
  "_links": {
    "self": "https://api.twitch.tv/kraken/streams?limit=1&offset=0",
    "next": "https://api.twitch.tv/kraken/streams?limit=1&offset=1",
    "featured": "https://api.twitch.tv/kraken/streams/featured",
    "summary": "https://api.twitch.tv/kraken/streams/summary",
    "followed": "https://api.twitch.tv/kraken/streams/followed"
  }
}
The part of my server method that tries to insert the data:
Meteor.call('getStreams', function(err, res) {
  var data = res.data.streams;
  console.log(data);
  data.forEach(function(item) {
    console.log(item._id);
    Streams.insert({
      _id: item._id,
      title: item.channel.status,
      author: item.channel.display_name,
      url: item.url
    });
  });
});
getStreams simply defines the URL to call and sets some variables. As you can see, I am console-logging the expected _id, so I know it is returning a valid string, but I am still getting the error. Currently, when I make the call I return 100 streams at a time and iterate through them to save the four fields above. Ideally, I would like to save each stream object as its own entry in the DB, but all my attempts to do that have resulted in the same error. I also read somewhere that the version of "minimongo" bundled with Meteor does not support inserting an array of objects in bulk. I have also read that minimongo does not support Collection.save(), so sadly I think it will be more work later to update the contents of each _id with the latest API call info, since I can't just use .save to update and insert in the same statement.
I am not sure if it has any impact, but I did try setting autoIndexId to false when creating the collection, and it doesn't seem to matter:
Streams = new Meteor.Collection('streams', {autoIndexId: false});
Any insight is appreciated.
The problem is that the Twitch _id is NOT a String; it appears to be a Number (I can tell from the output of your JSON: the number is not surrounded by quotes).
What I'd do is let Meteor generate its own internal Mongo IDs and store the twitch _id as a separate property instead.
Streams.insert({
  twitchId: item._id,
  title: item.channel.status,
  author: item.channel.display_name,
  url: item.url
});
You will have to retrieve the streams by twitchId instead of _id, but that's hardly a problem, right?
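
For example, looking a stored stream back up later (a sketch; findOne is the standard Meteor collection API, and the id value is taken from the sample response above):

// Fetch by the Twitch id stored as a plain property, not by Meteor's _id
var stream = Streams.findOne({ twitchId: 11220687552 });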