magento2 modify Elasticsearch data

I have Magento 2 Elasticsearch data and I want to add some data to it. Here is the current result of the following Elasticsearch query:
GET /alpha-m2_en_catalog_product_*/_mapping
Now I want the result of this query to contain the additional "fields" data shown below:
"fields":{
"Keyword":{
"type": "keyword"
"ignore_above" : 256
}
}
How is it possible in Magento 2 to programmatically add this data to the result of
bin/magento indexer:reindex
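For reference, what is being asked for is Elasticsearch's standard multi-field keyword sub-field. Outside of Magento it can be declared directly against the index, as in this sketch with the official Node.js client (the index and field names are illustrative assumptions; note that Magento rebuilds its mappings on reindex, so a durable fix has to hook into Magento's product field mapper rather than patching the index by hand):
// A sketch only: adding a keyword sub-field to an existing text field.
// Index and field names are assumptions; Magento will overwrite hand-made
// mappings the next time it reindexes.
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function addKeywordSubField() {
  await client.indices.putMapping({
    index: 'alpha-m2_en_catalog_product_1', // illustrative index name
    body: {
      properties: {
        name: {                 // the text field to extend
          type: 'text',
          fields: {
            keyword: {          // the requested sub-field
              type: 'keyword',
              ignore_above: 256
            }
          }
        }
      }
    }
  });
}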

Related

How to update data in Elasticsearch, like a bulk update in MongoDB?

I'm trying to find a solution for updating data in Elasticsearch from Golang. The data is about 1,000,000+ documents, and each update must target a specific document id. I can do this in MongoDB using a bulk operation, but I can't find the equivalent in Elasticsearch. Is there an operation like it, or does anyone have an idea how to update a huge amount of data in Elasticsearch by specific id? Thanks in advance.
In general, you can use the bulk API to make such bulk updates. You can either index the data again under the same id, or just run an update. You can use cURL to push the updates from the command line if you are doing this as a one-off:
POST _bulk
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "update" : {"_id" : "1", "_index" : "test"} }
{ "doc" : {"field2" : "value2"} }
Another option is update_by_query, if you are setting custom fields. You can also combine update_by_query with an ingest pipeline to update existing data.
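For example, a minimal update_by_query call might look like this. It is sketched with the official Node.js Elasticsearch client (the question mentions Go, but the same request body works over the raw REST API from any language); the index, field, and query values are illustrative:
// A sketch, assuming Elasticsearch 7.x and the official Node.js client;
// index, field, and query values are assumptions.
const { Client } = require('@elastic/elasticsearch');
const client = new Client({ node: 'http://localhost:9200' });

async function setField2() {
  await client.updateByQuery({
    index: 'test',
    refresh: true,
    body: {
      script: {
        lang: 'painless',
        source: 'ctx._source.field2 = params.value',
        params: { value: 'value2' }
      },
      query: { term: { field1: 'value1' } } // only touch matching docs
    }
  });
}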
It entirely comes down to whether you are running the update using information from a different index (in that case you can use the enrich processor, available from 7.5 onwards), or whether you simply want to add a new field and populate it using a rule based on attributes already present on the document.
So for different scenarios, different options are available. The bulk API is more appropriate when the data source is external; if the data is already in Elasticsearch, then update_by_query is the better fit.
You can also look at reindexing with a pipeline script. But again, horses for courses.

MongoDB / mongoose split large documents

We are extending an existing Node + Mongo app. We need to add what could be large docs, but we currently do not know how big they could get.
MongoDB has a 16MB max document size; I would rather not fight against it.
Has anyone ever seen an auto document-split module? Something that automatically splits docs into partials if they exceed a certain size?
If you have large CSV data to be stored in MongoDB, there are two approaches, which both work well in different ways:
1: Save in MongoDB format
This means that your application reads the CSV and writes it to a MongoDB collection one row at a time, so each row is saved as a separate document, perhaps something like this:
{
"filename" : "restaurants.csv",
"version" : "2",
"uploadDate" : ISODate("2017-06-15"),
"name" : "Ace Cafe",
"cuisine" : "British",
etc
},
{
"filename" : "restaurants.csv",
"version" : "2",
"uploadDate" : ISODate("2017-06-15"),
"name" : "Bengal Tiger",
"cuisine" : "Bangladeshi",
etc
}
This takes work on your application's part, to render the data into this format and to decide how and where to save the metadata. In return, you can index and query the data, field by field and row by row, and you have no worries about any single document getting too large.
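A minimal sketch of that import, assuming the Node.js mongodb driver and the csv-parse package; the file, database, and collection names are illustrative:
// A sketch of option 1: one document per CSV row.
// Assumes: npm install mongodb csv-parse; names below are illustrative.
const fs = require('fs');
const { parse } = require('csv-parse');
const { MongoClient } = require('mongodb');

async function importCsv() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const rows = client.db('test').collection('csvRows');

  const parser = fs.createReadStream('restaurants.csv')
    .pipe(parse({ columns: true })); // yields one object per CSV row

  for await (const row of parser) {
    await rows.insertOne({
      filename: 'restaurants.csv',
      version: '2',
      uploadDate: new Date(),
      ...row // each CSV column becomes an indexable field
    });
  }
  await client.close();
}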
2: Save in CSV format using GridFS
This means that your file is uploaded as an un-analysed blob and automatically divided into small chunks (255 kB by default) so that no single MongoDB document exceeds the 16MB limit.
This is easy to do and does not disturb your original CSV structure. However, the data is opaque to MongoDB: you cannot scan it or query it row by row. To work with the data, your application has to download the entire file from MongoDB and process it in memory.
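A minimal sketch of the upload, assuming the official mongodb Node.js driver's GridFSBucket API; file and database names are illustrative:
// A sketch of option 2: store the CSV as an opaque blob in GridFS.
// Names below are illustrative.
const fs = require('fs');
const { MongoClient, GridFSBucket } = require('mongodb');

async function uploadCsv() {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bucket = new GridFSBucket(client.db('test'));

  await new Promise((resolve, reject) => {
    fs.createReadStream('restaurants.csv')
      .pipe(bucket.openUploadStream('restaurants.csv')) // driver handles chunking
      .on('finish', resolve)
      .on('error', reject);
  });
  await client.close();
}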
Hopefully one of these approaches will suit your needs.

Querying for specific objects in a document & Visualization - MongoDB [duplicate]

This question already has answers here:
Retrieve only the queried element in an object array in MongoDB collection
(18 answers)
Closed 5 years ago.
I have a complex GeoJSON document stored in MongoDB. My goal is to retrieve only the objects that match my condition, e.g.:
I want to retrieve the objects that contain "avenue" in the 'features.properties.name' field. I have tried this: db.LineString.find({'features.properties.name' : "Avenue"}), but it returns the entire document. My goal is to return just the matching elements of the features array (such as element 0 in my result) which fulfil the given condition. Also, could the results be visualized somehow?
The find(arg1) command you are using returns whole documents stored in the collection; it can search nested docs, but it cannot return only part of a top-level document.
If your documents have a regular structure, use find(arg1, arg2) instead, to limit the returned fields: https://docs.mongodb.org/manual/tutorial/project-fields-from-query-results/
If your documents have an irregular structure, write a script in your favorite programming language.
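For instance, a positional projection can return just the first matching array element (a sketch; unlike the aggregation in the next answer, it returns at most one element per document):
// A sketch: the positional $ projection returns only the first element of
// features that matched the query condition.
db.LineString.find(
  { 'features.properties.name': /avenue/i },
  { 'features.$': 1 }
);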
After much research, I found a way of actually querying the specific objects and visualizing them using Robomongo and Google's drag-and-drop GeoJSON viewer.
For the query part we can use the technique below, as specified here:
//[Thematic] Finds all the LineString documents that contain “Avenue” in their name.
db.LineString.aggregate(
{ $match : {
"features.properties.name": /avenue/i
}},
{ $unwind : "$features" },
{ $match : {
"features.properties.name": /avenue/i
}}
)
Here the $unwind stage deconstructs the features array so that each element becomes its own document, and the second $match keeps only the elements that satisfy the condition. Note that the first $match can take advantage of an index (text, spatial, etc.) and speed the query up considerably. By right-clicking the document in Robomongo we can view and store the generated JSON.
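On MongoDB 4.2+ the same result can be had without unwinding, by filtering the array in place (a sketch using $filter and $regexMatch; field names as in the question):
// A sketch for MongoDB 4.2+: keep only the matching features elements
// inside each document, without $unwind.
db.LineString.aggregate([
  { $match: { 'features.properties.name': /avenue/i } },
  { $project: {
      features: {
        $filter: {
          input: '$features',
          as: 'f',
          cond: { $regexMatch: { input: '$$f.properties.name', regex: /avenue/i } }
        }
      }
  }}
]);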
However, when working with spatial data you will probably want to see it on a map, especially with complex structures. To do so, you will need to convert your data from JSON to GeoJSON. Below are the regular expressions I used to convert my file.
Algorithm, JSON file (generated from MongoDB) to GeoJSON:
Erase the following patterns:
"features" : \{.*
"coordinates"[^}]*\K\}
"type" : "Feature"
"ok" : 1.0000000000000000
"id".*,
"_id" : ObjectId\(.*,
Replace "type" : "FeatureCollection",[^}]*\}, with "type" : "Feature", and replace "result" : [ with "features" : [
Then prepend the GeoJSON header:
{
"type": "FeatureCollection",
"crs": { "type": "name", "properties": { "name": "urn:ogc:def:crs:OGC:1.3:CRS84" } },
"features" : [
In Notepad++ the erase patterns can be run as a single alternation:
"features" : \{.*|"coordinates"[^}]*\K\}|"type" : "Feature"|"ok" : 1.0000000000000000|"id".*,|"_id" : ObjectId\(.*,
Disclaimer: your file might follow a different structure, since the way you project the data obviously affects the output file. In such a case, extra modification is required.
Now that we have the GeoJSON file, we can drag and drop it onto Google's GeoJSON viewer. The above query gives us the roads of Vancouver that contain "Avenue" in their names.
Thoughts:
I believe this job could be done directly from Robomongo, since the produced JSON can be converted to GeoJSON with only minor changes. Also note that this regex approach is complicated and fragile; for a more stable solution I would suggest using a proper JSON library (for example in Node.js, or Jackson in Java) to generate a brand-new file. I came up with this solution and it worked perfectly for my case.
Enjoy :)

How to Query GridFS for particular text in MongoDB [duplicate]

I have a blogging system that stores uploaded files in the GridFS system. Problem is, I don't understand how to query it!
I am using Mongoose with Node.js, which doesn't yet support GridFS, so I am using the actual mongodb module for the GridFS operations. There doesn't seem to be a way to query the file metadata the way you query documents in a regular collection.
Would it be wise to store the metadata in a document pointing to the GridFS ObjectId, to be able to query it easily?
Any help would be greatly appreciated, I'm kinda stuck :/
GridFS works by storing a number of chunks for each file. This way, you can deliver and store very large files without having to hold the entire file in RAM, and you can store files that are larger than the maximum document size. The default chunk size is 255 kB.
The file metadata field can be used to store additional file-specific metadata, which can be more efficient than storing the metadata in a separate document. This greatly depends on your exact requirements, but the metadata field, in general, offers a lot of flexibility. Keep in mind that some of the more obvious metadata is already part of the fs.files document, by default:
> db.fs.files.findOne();
{
"_id" : ObjectId("4f9d4172b2ceac15506445e1"),
"filename" : "2e117dc7f5ba434c90be29c767426c29",
"length" : 486912,
"chunkSize" : 262144,
"uploadDate" : ISODate("2011-10-18T09:05:54.851Z"),
"md5" : "4f31970165766913fdece5417f7fa4a8",
"contentType" : "application/pdf"
}
To actually read the file from GridFS you'll have to fetch the file document from fs.files and the chunks from fs.chunks. The most efficient way to do that is to stream this to the client chunk-by-chunk, so you don't have to load the entire file in RAM. The chunks collection has the following structure:
> db.fs.chunks.findOne({}, {"data" :0});
{
"_id" : ObjectId("4e9d4172b2ceac15506445e1"),
"files_id" : ObjectId("4f9d4172b2ceac15506445e1"),
"n" : 0, // this is the 0th chunk of the file
"data" : /* loads of data */
}
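With the modern Node.js driver this chunk-by-chunk streaming is wrapped up in GridFSBucket; a minimal sketch (the database name and file id are illustrative):
// A sketch: stream a GridFS file to an HTTP response without loading it
// into RAM. Assumes the official mongodb driver; names are illustrative.
const { MongoClient, GridFSBucket, ObjectId } = require('mongodb');

async function streamFile(res) {
  const client = await MongoClient.connect('mongodb://localhost:27017');
  const bucket = new GridFSBucket(client.db('blog'));
  bucket
    .openDownloadStream(new ObjectId('4f9d4172b2ceac15506445e1'))
    .pipe(res); // chunks are fetched lazily from fs.chunks
}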
If you want to use the metadata field of fs.files for your queries, make sure you understand the dot notation, e.g.
> db.fs.files.find({"metadata.OwnerId": new ObjectId("..."),
"metadata.ImageWidth" : 280});
Also make sure your queries can use an index; verify this with explain().
As the specification says, you can store whatever you want in the metadata field.
Here's what a document from the files collection looks like:
Required fields
{
"_id" : <unspecified>, // unique ID for this file
"length" : data_number, // size of the file in bytes
"chunkSize" : data_number, // size of each of the chunks. Default is 256k
"uploadDate" : data_date, // date when object first stored
"md5" : data_string // result of running the "filemd5" command on this file's chunks
}
Optional fields
{
"filename" : data_string, // human name for the file
"contentType" : data_string, // valid mime type for the object
"aliases" : data_array of data_string, // optional array of alias strings
"metadata" : data_object, // anything the user wants to store
}
So store anything you want in the metadata and query it normally like you would in MongoDB:
db.fs.files.find({"metadata.some_info" : "sample"});
I know the question doesn't ask about the Java way of querying for metadata, but here it is, assuming you add gender as a metadata field:
// Get your database's GridFS (myDatabase is a com.mongodb.DB instance)
GridFS gfs = new GridFS(myDatabase);
// Write out your JSON query within JSON.parse() and cast it as a DBObject;
// dot notation matches the single metadata field
DBObject dbObject = (DBObject) JSON.parse("{'metadata.gender': 'Male'}");
// Querying action (find)
List<GridFSDBFile> gridFSDBFiles = gfs.find(dbObject);
// Loop through the results
for (GridFSDBFile gridFSDBFile : gridFSDBFiles) {
System.out.println(gridFSDBFile.getFilename());
}
Metadata is stored in the metadata field. You can query it with dot notation:
db.fs.files.find({'metadata.content_type': 'text/html'})
