Select MongoDB array elements into a single field in Pentaho PDI - mongodb

The following is the structure of a document I have in a MongoDB collection:
{
"_id": {
"$oid": "5f48e358d43721376c397f53"
},
"heading": "this is heading",
"tags": ["tag1","tag2","tag3"],
"categories": ["projA", "projectA2"],
"content": ["This", "is", "the", "content", "of", "the", "document"],
"timestamp": 1598612312.506219,
"lang": "en"
}
When I import the data into PDI using the MongoDB Input step, each element of the "content" array ends up in a separate field.
I want all the elements in one field (the array elements concatenated), for example as in the attached image.
For the document above, I want a single content field with all the array elements concatenated.
How do I do that?
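One possible approach, sketched under assumptions: if the MongoDB server is version 3.4 or later and the MongoDB Input step is configured to treat the query as an aggregation pipeline, the array could be joined on the server so that PDI receives content as a single string. The space separator and the helper joinContent (a plain JavaScript illustration of the expected result, not something PDI runs) are assumptions that are not part of the question.

```javascript
// Aggregation pipeline (MongoDB 3.4+): fold the "content" array into one string.
// The pipeline array would be pasted as JSON into the step's query box.
const pipeline = [
  {
    $addFields: {
      content: {
        $reduce: {
          input: "$content",
          initialValue: "",
          in: {
            // No separator before the first element.
            $cond: [
              { $eq: ["$$value", ""] },
              "$$this",
              { $concat: ["$$value", " ", "$$this"] }
            ]
          }
        }
      }
    }
  }
];

// Hypothetical helper: the result the pipeline is expected to give for one
// document whose array elements are non-empty strings.
function joinContent(doc) {
  return Object.assign({}, doc, { content: doc.content.join(" ") });
}

const sample = {
  heading: "this is heading",
  content: ["This", "is", "the", "content", "of", "the", "document"]
};
const joined = joinContent(sample);
console.log(joined.content);
```

If the server is older than 3.4, the same join would have to happen inside PDI instead, after the fields have been read.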

Related

Generate a JSON schema from an existing MongoDB collection

I have a MongoDB collection that contains a lot of documents. They are all roughly in the same format, though some of them are missing some properties while others are missing other properties. So for example:
[
{
"_id": "SKU14221",
"title": "Some Product",
"description": "Product Description",
"salesPrice": 19.99,
"specialPrice": 17.99,
"marketPrice": 22.99,
"puchasePrice": 12,
"currency": "USD",
"color": "red"
},
{
"_id": "SKU14222",
"title": "Another Product",
"description": "Product Description",
"salesPrice": 29.99,
"currency": "USD",
"size": "40"
}
]
I would like to automatically generate a schema from the collection. Ideally it would note which properties are present in all the documents and mark those as required. Detecting unique columns would also be nice, though not strictly necessary. In any event, I would be modifying the schema after it is automatically generated.
I've noticed that there are tools that can do this for JSON. But short of downloading the entire collection as JSON, is it possible to do this using the MongoDB console or a CLI tool directly from the collection?
You could try this tool out. It appears to do exactly what you want.
Extract (and visualize) schema from Mongo database, including foreign
keys. Output is simple json file or html with dagre/d3.js diagram
(depending on command line options).
https://www.npmjs.com/package/extract-mongo-schema
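For illustration, the "mark properties present in every document as required" idea can be sketched in plain JavaScript. This is not how extract-mongo-schema works internally; inferSchema and jsonType are hypothetical helpers, only top-level properties are inspected, and shell-specific types such as ObjectId would simply be reported as "object".

```javascript
// Map a JavaScript value to a JSON Schema type name (numbers are not split
// into integer/number to keep the sketch simple).
function jsonType(value) {
  if (value === null) return "null";
  if (Array.isArray(value)) return "array";
  return typeof value; // "string", "number", "boolean", "object"
}

// Build a minimal JSON Schema from an array of documents.
function inferSchema(docs) {
  const counts = {}; // how many documents contain each property
  const types = {};  // set of type names seen for each property
  for (const doc of docs) {
    for (const [key, value] of Object.entries(doc)) {
      counts[key] = (counts[key] || 0) + 1;
      (types[key] = types[key] || new Set()).add(jsonType(value));
    }
  }
  const properties = {};
  for (const key of Object.keys(counts)) {
    const seen = Array.from(types[key]).sort();
    properties[key] = { type: seen.length === 1 ? seen[0] : seen };
  }
  // A property is required only if every document has it.
  const required = Object.keys(counts).filter((k) => counts[k] === docs.length);
  return { type: "object", properties, required };
}

const docs = [
  { _id: "SKU14221", title: "Some Product", description: "Product Description",
    salesPrice: 19.99, specialPrice: 17.99, marketPrice: 22.99, puchasePrice: 12,
    currency: "USD", color: "red" },
  { _id: "SKU14222", title: "Another Product", description: "Product Description",
    salesPrice: 29.99, currency: "USD", size: "40" }
];
const schema = inferSchema(docs);
console.log(JSON.stringify(schema.required));
```

In the mongo shell the function could be fed directly from a collection, for example inferSchema(db.products.find().limit(1000).toArray()), where the collection name products and the sample size are assumptions.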

MongoDB text search across two collections

I have an Order collection with address fields and User collection with names. The Order collection contains a string called userId, which is a "foreign key" into the users collection.
I am using an aggregation pipeline to filter, join, sort, and paginate queries. The problem is that I need to provide full text search on the address and name fields.
Because the $text match must be the first stage in a pipeline, I am not sure how to accomplish the goal of finding text matching any address or name field.
User collection
[{
"_id": "5cb8caa069fc1a4351cc3705",
"firstName": "James",
"lastName": "Bond"
},{
"_id": "5c58b8de8596d52c248f34d5",
"firstName": "Jack",
"lastName": "Ryan"
}]
Order Collection
[{
"_id": "5ccc94602e67ca44fe69f160",
"address": {
"streetAddress1": "1112 main st",
"streetAddress2": null,
"unitNumber": "unit 1112",
"city": "Jackson Hole",
"state": "WY",
"postalCode": "83001"
},
"userId": "5cb8caa069fc1a4351cc3705"
}]
A search for "Jack" should match both the name "Jack" and the city "Jackson Hole".
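One hedged way to get this behaviour is to give up on $text and match with a case-insensitive regular expression after joining the two collections. As far as I know, $text matches whole (stemmed) words, so "Jack" would not be expected to match "Jackson" there anyway. The sketch below only builds the pipeline; the collection name users, the helper names, and the choice of searched fields are assumptions, and a regex match like this cannot use a text index, so it may be slow on large collections.

```javascript
// Escape regex metacharacters so the search term is matched literally.
function escapeRegex(term) {
  return term.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Build a pipeline to run on the orders collection.
function buildSearchPipeline(term) {
  const pattern = new RegExp(escapeRegex(term), "i");
  return [
    // Join each order with its user ("users" is an assumed collection name).
    { $lookup: { from: "users", localField: "userId", foreignField: "_id", as: "user" } },
    // Note: this drops orders that have no matching user.
    { $unwind: "$user" },
    // Keep orders where any address or name field contains the term.
    {
      $match: {
        $or: [
          { "address.streetAddress1": pattern },
          { "address.city": pattern },
          { "user.firstName": pattern },
          { "user.lastName": pattern }
        ]
      }
    }
  ];
}

const searchPipeline = buildSearchPipeline("Jack");
const searchPattern = searchPipeline[2].$match.$or[1]["address.city"];
console.log(searchPattern.test("Jackson Hole"), searchPattern.test("Jack"));
```

Sorting and pagination stages would follow the $match stage. Users without any order never appear in the result, because the pipeline starts from the orders collection.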

Is there any way in MongoDB to do something like SQL "drop column"?

I have some MongoDB documents structured like this:
{
"_id": ObjectId("58c212b06ca3472b902f9fdb"),
"Auction name": "Building",
"Estimated price": "23,660,000",
"Auction result": "success",
"Url": "https://someurl.htm",
"match_id": "someid",
"Final price": "17,750,000",
"Area": [
{
"Area": "696.77"
}
]
}
The "match_id" field is used for an update query, and after that I don't need it anymore.
Is there a way to drop this field and keep the rest of the document?
Have you tried simply using an update query to unset the field, like the following?
db.products.update(
{},
{ $unset: { match_id: "" } },
{ multi: true }
)
Keep in mind that the first set of curly braces is intentionally left empty so that the query matches every document in the collection. The multi: true option is needed because update() otherwise modifies only the first matching document.

How to update a property of a sub-document in an embedded array?

Given the following document in the database, I want to update the pincode inside the address array.
I'm using the $ positional operator in MongoDB, but it does not find the document embedded multiple levels deep.
"_id": ObjectId("58b91ccf3dc9021191b256ff"),
"phone": 9899565656,
"Email": "sumit#mail.com",
"Organization": "xyz",
"Name": "sumit",
"address": [{
"city": "chennai",
"pincode": 91,
"_id": ObjectId("58b91db48682ab11ede79b28"),
"choice": [{
"_id": ObjectId("58b91fa6901a74124fd70d89")
}]
}]
Using this query to update.
db.presenters.update({"Email":"sumit#mail.com","address.city":"chennai"},{$set:{"address.$.pincode.": 95 }})
You seem to have an incorrect field name in the update: there is an extra dot at the end. Try the following:
db.presenters.update({"Email":"sumit#mail.com","address.city":"chennai"},
{$set:{"address.$.pincode": 95 }})

Project nested array elements in MongoDB

I have a collection with documents like this
"_id": ObjectId('55f02a779e6efb8'),
"msgId": "5fdf509c-5229-4e7c-87ff",
"statuses": [
{
"state": "QUEUED",
"timestamp": ISODate('2013-10-08T13:13:38.000Z')
},
{
"state": "PENDING",
"timestamp": ISODate('2013-10-08T13:13:49.000Z')
},
{
"state": "DELIVERED",
"timestamp": ISODate('2013-10-08T13:13:57.000Z')
}
]
I want to use $project to select the last 2 values (embedded docs) of the nested array (the array size is not static) so that I can use them in a group operation in a later step.
I want something like $slice (aggregation), but it is not yet supported in the version of MongoDB I use (3.0.6).
Is there any way to access the last 2 elements of the array by their index, and if not, is there any other solution?
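As a partial workaround, the $slice projection operator for find() (as opposed to the aggregation $slice, which arrived in MongoDB 3.2) can return only the last two array elements. The sketch below shows the projection and a plain JavaScript equivalent; the collection name messages and the helper lastTwoStatuses are assumptions, and the grouping would then have to happen on the client rather than in a later pipeline stage.

```javascript
// Projection for find(): keep only the last two entries of `statuses`.
const projection = { statuses: { $slice: -2 } };
// In the mongo shell this would be used as (collection name is hypothetical):
//   db.messages.find({}, projection)

// Plain JavaScript equivalent of what the projection does to one document.
function lastTwoStatuses(doc) {
  return Object.assign({}, doc, { statuses: doc.statuses.slice(-2) });
}

const message = {
  msgId: "5fdf509c-5229-4e7c-87ff",
  statuses: [
    { state: "QUEUED", timestamp: new Date("2013-10-08T13:13:38.000Z") },
    { state: "PENDING", timestamp: new Date("2013-10-08T13:13:49.000Z") },
    { state: "DELIVERED", timestamp: new Date("2013-10-08T13:13:57.000Z") }
  ]
};
const trimmed = lastTwoStatuses(message);
console.log(trimmed.statuses.map((s) => s.state));
```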