Elasticsearch for normalized data - facebook

I have some data from the Facebook APIs.
I have a Facebook page that is part of multiple countries.
For example:
Assume company-x operates in multiple countries: USA, UK, India, China.
Now, a post can be published on multiple country pages.
For example, a company-x innovation announcement will be displayed on all 4 country pages.
Each of the pages will get its own comments, likes, etc.
So, basically, it's relational data:
Company(1) - Country(n) - Post(n) - Likes(n) - Comments(n)...
I would like to know the best way to store this data in Elasticsearch and implement the search engine.

As you can't use "classic" (relational) JOINs in Elasticsearch, IMHO you can only choose between storing the (sub-)objects as flat objects, parent-child objects or nested objects in the index/type.
I think that you should consider the first two options. I personally would opt for flat objects, as they are easier to load and are also returned from the FB Graph API in that way ("flat"). What you would have to add in your application is the mapping of the page to Company -> Country, because FB doesn't know about that.
See
https://www.elastic.co/guide/en/elasticsearch/guide/current/nested-mapping.html
https://www.elastic.co/guide/en/elasticsearch/guide/current/parent-child-mapping.html
As a query for the posts, you could use something like
/?ids={page1_id},{page2_id},{page3_id}&fields=id,posts.fields(id,message,created_time,link,picture,place,status_type,shares,likes.summary(true).limit(0),comments.summary(true).limit(0))
which will return something like
{
"id": "7419689078",
"posts": {
"data": [
{
"id": "7419689078_10153348181604079",
"message": "Gotta find them all in the real world soon.",
"created_time": "2015-09-10T06:40:12+0000",
"link": "http://venturebeat.com/2015/09/09/nintendo-takes-pokemon-into-mobile-gaming-in-partnership-with-google-niantic/",
"picture": "https://fbexternal-a.akamaihd.net/safe_image.php?d=AQDvvzpCAM1WkJZS&w=130&h=130&url=http%3A%2F%2Fi0.wp.com%2Fventurebeat.com%2Fwp-content%2Fuploads%2F2013%2F04%2Fpokemon_mystery_dungeon_gates_to_infinity_art.jpg%3Ffit%3D780%252C9999&cfs=1",
"status_type": "shared_story",
"likes": {
"data": [
],
"summary": {
"total_count": 0,
"can_like": true,
"has_liked": false
}
},
"comments": {
"data": [
],
"summary": {
"order": "ranked",
"total_count": 0,
"can_comment": true
}
}
}
],
"paging": {
"previous": "https://graph.facebook.com/v2.4/7419689078/posts?fields=id,message,created_time,link,picture,place,status_type,shares,likes.summary%28true%29.limit%280%29,comments.summary%28true%29.limit%280%29&limit=1&since=1441867212&access_token=&__paging_token=&__previous=1",
"next": "https://graph.facebook.com/v2.4/7419689078/posts?fields=id,message,created_time,link,picture,place,status_type,shares,likes.summary%28true%29.limit%280%29,comments.summary%28true%29.limit%280%29&limit=1&access_token=&until=1441867212&__paging_token="
}
}
}
You can then use some application-side JSON manipulation to
Add the Company -> Country -> Page mapping info to the JSON
Get rid of unwanted fields such as paging
Flatten the structure before saving (e.g. posts.data as posts)
before you save it to Elasticsearch. See the JSFiddle I prepared (fill in the access token!):
http://jsfiddle.net/7x9xuo8L/
Then, you can use the bulk load feature to load the data to Elasticsearch:
https://www.elastic.co/guide/en/elasticsearch/reference/current/docs-bulk.html
Sample JavaScript code:
var pageMapping = {
"venturebeat": {
"country": "United States",
"company": "Venture Beat"
},
"techcrunch": {
"country": "United States",
"company": "TechCrunch"
}
};
//For bulk load
var esInfo = {
"index": "socialmedia",
"type": "fbposts"
};
var accessToken = "!!!FILL_IN_HERE_BEFORE_EXECUTING!!!";
var requestUrl = "https://graph.facebook.com/?ids=venturebeat,techcrunch&fields=id,name,posts.fields(id,message,created_time,link,picture,place,status_type,shares,likes.summary(true).limit(0),comments.summary(true).limit(0)).limit(2)&access_token=" + accessToken;
$.getJSON(requestUrl, function(fbResponse) {
//Array to store the bulk info for ES
var bulkLoad = [];
//Iterate over the pages
Object.getOwnPropertyNames(fbResponse).forEach(function(page, idx, array) {
var pageData = fbResponse[page];
var pageId = pageData.id;
pageData.posts.data.forEach(function(pagePostObj, idx, array) {
var postObj = {};
postObj.country = pageMapping[page].country;
postObj.company = pageMapping[page].company;
postObj.page_id = pageData.id;
postObj.page_name = pageData.name;
postObj.post_id = pagePostObj.id;
postObj.message = pagePostObj.message;
postObj.created_time = pagePostObj.created_time;
postObj.link = pagePostObj.link;
postObj.picture = pagePostObj.picture;
postObj.place = pagePostObj.place;
postObj.status_type = pagePostObj.status_type;
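//"shares" is missing from the response when a post has no shares, so guard the access
postObj.shares_count = pagePostObj.shares ? pagePostObj.shares.count : 0;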
postObj.likes_count = pagePostObj.likes.summary.total_count;
postObj.comments_count = pagePostObj.comments.summary.total_count;
//Push bulk load metadata
bulkLoad.push({ "index" : { "_index": esInfo.index, "_type": esInfo.type } })
//Push actual object data
bulkLoad.push(postObj);
});
});
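//The bulk API expects newline-delimited JSON (one object per line, with a trailing newline), not a JSON array
var bulkBody = bulkLoad.map(function(item) { return JSON.stringify(item); }).join("\n") + "\n";
//You can now take bulkBody and POST it to the Elasticsearch _bulk endpoint!
console.log(bulkBody);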
});

Related

Search and update in array of objects MongoDB

I have a collection in MongoDB containing search history of a user where each document is stored like:
"_id": "user1"
searchHistory: {
"product1": [
{
"timestamp": 1623482432,
"query": {
"query": "chocolate",
"qty": 2
}
},
{
"timestamp": 1623481234,
"query": {
"query": "lindor",
"qty": 4
}
},
],
"product2": [
{
"timestamp": 1623473622,
"query": {
"query": "table",
"qty": 1
}
},
{
"timestamp": 1623438232,
"query": {
"query": "ike",
"qty": 1
}
},
]
}
Here the _id of the document acts like a foreign key to the user document in another collection.
I have a backend running on Node.js, and this function is used to store a new search history entry in the record.
exports.updateUserSearchCount = function (userId, productId, searchDetails) {
let addToSetData = {}
let key = `searchHistory.${productId}`
addToSetData[key] = { "timestamp": new Date().getTime(), "query": searchDetails }
return client.db("mydb").collection("userSearchHistory").updateOne({ "_id": userId }, { "$addToSet": addToSetData }, { upsert: true }, async (err, res) => {
})
}
Now, I want to get the search history of a user based on the query only, using db.find().
I want something like this:
db.find({"_id": "user1", "searchHistory.somewildcard.query": "some query"})
I need a wildcard that replaces ".somewildcard." so that the search covers all products searched.
I saw a suggestion that we should store the document like:
"_id": "user1"
searchHistory: [
{
"key": "product1",
"value": [
{
"timestamp": 1623482432,
"query": {
"query": "chocolate",
"qty": 2
}
}
]
}
]
However, if I store the document like this, adding search history to an existing document becomes a tedious and confusing task.
What should I do?
It's always a bad idea to save values as keys, for the exact reason you're facing: it heavily limits querying on that field. The trade-off, obviously, is that it makes updates much easier.
I personally recommend you do not save these searches in nested form at all. This will cause scaling issues quite quickly; assuming these fields are indexed, you will start seeing performance issues when the arrays get too large (a few hundred searches).
So my personal recommendation is to save them in a new collection like so:
{
"user_id": "1",
"key": "product1",
"timestamp": 1623482432,
"query": {
"query": "chocolate",
"qty": 2
}
}
Now querying a specific user, a specific product, or even a query substring is easily supported by creating some basic indexes. An "update" in this case is just an insert of a new document, which is also much faster.
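As a rough sketch of what that could look like from Node.js: the helper names below (buildSearchDoc, buildHistoryFilter, historyIndex) are made up for illustration, and the field names are taken from the sample document above. The helpers only build plain objects; the driver calls are shown in comments.

```javascript
// Hypothetical helpers for the flat "one document per search" layout.
// Field names (user_id, key, timestamp, query) follow the sample document above.

// Build the document to insert for a single search event.
function buildSearchDoc(userId, productId, searchDetails, now) {
  return {
    user_id: userId,
    key: productId,
    timestamp: now === undefined ? Date.now() : now,
    query: searchDetails
  };
}

// Build a find() filter for all of a user's searches with a given query text,
// across every product. No wildcard is needed because the product is a value now.
function buildHistoryFilter(userId, queryText) {
  return { user_id: userId, "query.query": queryText };
}

// Compound index specification that supports the filter above.
var historyIndex = { user_id: 1, "query.query": 1 };

// Usage with the Node.js driver (not executed here):
//   await collection.createIndex(historyIndex);
//   await collection.insertOne(buildSearchDoc("user1", "product1", { query: "chocolate", qty: 2 }));
//   const hits = await collection.find(buildHistoryFilter("user1", "chocolate")).toArray();
```

The wildcard from the question disappears because the product id is stored in the "key" field instead of being part of a field name.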
If you still prefer to keep the nested structure, then I recommend you switch to the structure you posted. As you mentioned, updates will become slightly more tedious, but you can still do them quite easily using arrayFilters to update a specific element, or just $push to add a new search.
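A minimal sketch of the arrayFilters variant, assuming the key/value array layout from the question. The helper names are hypothetical, and they only build the arguments for updateOne(); nothing here talks to the database.

```javascript
// Hypothetical helpers for the layout
//   searchHistory: [{ key: "product1", value: [{ timestamp, query }, ...] }]

function buildEntry(searchDetails, now) {
  return { timestamp: now === undefined ? Date.now() : now, query: searchDetails };
}

// Append a search to the element whose "key" equals productId.
function buildPushToProduct(userId, productId, searchDetails, now) {
  return {
    filter: { _id: userId },
    update: { $push: { "searchHistory.$[product].value": buildEntry(searchDetails, now) } },
    options: { arrayFilters: [{ "product.key": productId }] }
  };
}

// Fallback for a product that has no element yet: push a whole new element.
function buildAddProduct(userId, productId, searchDetails, now) {
  return {
    filter: { _id: userId },
    update: { $push: { searchHistory: { key: productId, value: [buildEntry(searchDetails, now)] } } },
    options: { upsert: true }
  };
}

// Usage with the Node.js driver (not executed here):
//   const a = buildPushToProduct("user1", "product1", { query: "lindor", qty: 4 });
//   const res = await collection.updateOne(a.filter, a.update, a.options);
//   if (res.modifiedCount === 0) {
//     const b = buildAddProduct("user1", "product1", { query: "lindor", qty: 4 });
//     await collection.updateOne(b.filter, b.update, b.options);
//   }
```

The two-step flow is not atomic, so concurrent writers could in principle create two elements for the same product; that is part of the extra tedium of this layout.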

How to retrieve data json api to form in flutter

I have a form with a text field and a dropdown. Given productCode = "INQPREPAID50", I want the dropdown field "Nominal" to show 50000.
This is my JSON response:
{
"responseCode": "0000",
"responseMessage": "Success",
"date": "20200320",
"time": "142352",
"currency": "IDR",
"content": {
"productCode": "INQPREPAID50",
"productName": "INQ Prabayar 50.000"
}
}
How can I turn productCode = "INQPREPAID50" into just the number 50000 in the "Nominal" dropdown, and display the JSON API data on the next screen? Please help me. Thanks.
You can use a function to parse it, like
int setNumber(String title) {
if (title.contains('50')) {
return 50000;
} else {
return 0;
}
}
then just call it in your Text widget, for example
Text("${setNumber(productCode)}"),

Filter ResultSet after MongoDB MapReduce

Consider the following MongoDB collection / prototype that keeps track of how many cookies a given person has at a given point in time:
{
"_id": ObjectId("5c5b5c1865e463c5b6a5b748"),
"person": "Drew",
"cookies": 1,
"timestamp": ISODate("2019-02-05T20:34:48.922Z")
}
{
"_id": ObjectId("5c5b5c2265e463c5b6a5b749"),
"person": "Max",
"cookies": 3,
"timestamp": ISODate("2019-02-06T20:34:48.922Z")
}
{
"_id": ObjectId("5c5b5c2e65e463c5b6a5b74a"),
"person": "Max",
"cookies": 0,
"timestamp": ISODate("2019-02-07T20:34:48.922Z")
}
Ultimately, I need to get all people who currently have more than 0 cookies - In the above example, only "Drew" would qualify - ("Max" had 3, but later only had 0).
I've written the following map / reduce functions to sort this out..
var map = function(){
emit(this.person, {'timestamp' : this.timestamp, 'cookies': this.cookies})
}
var reduce = function(person, cookies){
let latestCookie = cookies.sort(function(a,b){
if(a.timestamp > b.timestamp){
return -1;
} else if(a.timestamp < b.timestamp){
return 1
} else {
return 0;
}
})[0];
return {
'timestamp' : latestCookie.timestamp,
'cookies' : latestCookie.cookies
};
}
This works fine and I get the following resultSet:
db.cookies.mapReduce(map, reduce, {out:{inline:1}})
...
"results": [
{
"_id": "Drew",
"value": {
"timestamp": ISODate("2019-02-05T20:34:48.922Z"),
"cookies": 1
}
},
{
"_id": "Max",
"value": {
"timestamp": ISODate("2019-02-07T20:34:48.922Z"),
"cookies": 0
}
}
],
...
Max is included in the results - But I'd like for him to not be included (he has 0 cookies after all)
What are my options here? I'm still relatively new to MongoDB. I have looked at finalize as well as creating a temporary collection ({out: "temp.cookies"}) but I'm just curious if there's an easier option or parameter I am overlooking.
Am I crazy for using MapReduce to solve this problem? The actual workload behind this scenario will include millions of rows..
Thanks in advance
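One option that avoids MapReduce entirely is an aggregation pipeline: sort by timestamp, keep the newest document per person, then filter on the cookie count. The following is a minimal sketch, assuming the field names from the sample documents. The in-memory function latestWithCookies is a hypothetical helper that only mirrors the pipeline's logic for illustration; on millions of rows the $sort stage would likely need a suitable index or allowDiskUse.

```javascript
// Pipeline for db.cookies.aggregate(...): newest document per person, then filter.
var latestWithCookiesPipeline = [
  { $sort: { timestamp: -1 } },
  { $group: {
      _id: "$person",
      timestamp: { $first: "$timestamp" },
      cookies: { $first: "$cookies" }
  } },
  { $match: { cookies: { $gt: 0 } } }
];

// In-memory equivalent of the pipeline, for illustration only.
function latestWithCookies(docs) {
  var latest = {};
  docs.forEach(function (doc) {
    var seen = latest[doc.person];
    // Keep the document with the newest timestamp for each person.
    if (!seen || doc.timestamp > seen.timestamp) {
      latest[doc.person] = doc;
    }
  });
  // Return only the people whose newest document has more than 0 cookies.
  return Object.keys(latest).filter(function (person) {
    return latest[person].cookies > 0;
  });
}

// Usage in the mongo shell (not executed here):
//   db.cookies.aggregate(latestWithCookiesPipeline, { allowDiskUse: true })
```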

Querying grandchild properties in Firebase

I am trying to come up with a Firebase Realtime Database structure for an online store.
The store should have a collection of products, each product can belong to one or more categories. Is it possible to construct a query to get all products in the computers category with a single HTTP request (assuming I use Firebase REST API)? Here is a sample piece of data:
{
"products": {
"-KaXxv2xD9WaIqHMsHYM": {
"title": "Item 1",
"categories": {
"electronics": true,
"computers": true
}
},
"-KaXyvdw5gmuBmGi5unb": {
"title": "Item 2",
"categories": {
"electronics": true
}
},
"-KaXyyyzmP9Y6askhLdx": {
"title": "Item 3",
"categories": {
"computers": true
}
}
}
}
I also tried using arrays for categories, but it looks like array support is very limited in Firebase and arrays should be avoided.
UPDATE:
This query works:
GET /products.json?orderBy="categories/computers"&equalTo=true
But it requires an index for every single category:
{
"rules": {
"products": {
".indexOn": ["categories/computers", "categories/electronics"]
}
}
}
You should have an additional categories node that holds the product list for each category. That makes access to the products of a specific category easier and more efficient.
A similar approach is used in the Firebase sample. See the code:
childUpdates.put("/posts/" + key, postValues);
childUpdates.put("/user-posts/" + userId + "/" + key, postValues);
They save the same data at both the posts and user-posts nodes.
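Applied to the store from the question, the same fan-out idea could be sketched roughly like this. The node name "category-products" and the helper buildProductFanOut are made up for illustration; the function only builds the multi-location update object.

```javascript
// Build one multi-location update that writes a product under /products
// and duplicates it under every category it belongs to.
// "category-products" is a hypothetical node name.
function buildProductFanOut(productId, product) {
  var updates = {};
  updates["/products/" + productId] = product;
  Object.keys(product.categories || {}).forEach(function (category) {
    updates["/category-products/" + category + "/" + productId] = product;
  });
  return updates;
}

// Usage (not executed here): pass the object to a multi-location update,
// e.g. firebase.database().ref().update(updates) in the namespaced web SDK.
// Reading one category is then a single request:
//   GET /category-products/computers.json
```

With this layout no per-category index rule is needed, at the cost of keeping the duplicated copies in sync whenever a product changes.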

Meteor. Sorting my collection by a deeply nested value

In my application I have a list of tiles representing each project in a portfolio. This is the main list view for the app and all projects are fetched from the collection without any sorting or ordering.
When I have an optional slug parameter specified in my route (for the category assigned to the project) I want to be able to display the projects within the UI that match that category first, and then display the other ones that don't match the category.
For reference, I have included the code for the route below:
/**
* Project list view (all projects) with optional
* filter parameter for showing projects only by
* their category name.
*/
this.route('list', {
path: '/:_category_slug?',
template: 'template_main',
action: function() {
if(this.ready()) {
this.render();
}
},
waitOn: function() {
return [
Meteor.subscribe('projects'),
Meteor.subscribe('formations'),
Meteor.subscribe('categories')
];
},
data: function() {
if(this.params._category_slug) {
/**
* Building up the query given the category slug and the language
*/
var query = {};
query['slug.' + App.language] = this.params._category_slug;
/**
* Grab the category given the query, so we can get its 'id'
*/
var category = App.models.categories.findOne(query);
/**
* This is the query I need to work on so that I can achieve what I want
*/
return App.models.projects.find({}).fetch();
}
else {
return App.models.projects.find({}).fetch();
}
},
yieldTemplates: {
'components_header': {to: 'header'},
'views_list': {to: 'content'},
'components_footer': {to: 'footer'}
}
});
For reference, I have also included a sample of the data for three projects that is relevant to this question.
{
"id": 10,
"slug": {
"en": "sample-english-slug",
},
"title": {
"en": "Sample English Title",
},
"description": {
"en": "A good description.",
},
"category_ids": [
{
"id": 5
},
{
"id": 6
}
],
},
{
"id": 12,
"slug": {
"en": "another-sample-slug",
},
"title": {
"en": "Another sample title",
},
"description": {
"en": "Sample description three",
},
"category_ids": [
{
"id": 1
},
{
"id": 4
}
],
},
{
"id": 11,
"slug": {
"en": "another-sample-slug",
},
"title": {
"en": "A sample title",
},
"description": {
"en": "Sample description",
},
"category_ids": [
{
"id": 2
},
{
"id": 5
}
],
}
So what I would want to do is make sure that given a category with an ID of 5, I want those first two projects to be the first two that appear.
Can this be done in meteor, without having to resort to writing extra logic in JS? One approach I did have once was to update each project from within the Client side collection (something I no longer do) and set a few extra attributes, then sort after that.
When dealing with syncing client and server collections, this is not really feasible.
From the mongodb docs:
Use the dot notation to match by specific fields in an embedded document. Equality matches for specific fields in an embedded document will select documents in the collection where the embedded document contains the specified fields with the specified values. The embedded document can contain additional fields.
I don't know if you can do it with a single query, but you can concat two complementary queries that use dot notation.
var selected = App.models.projects.find({'category_ids.id': category._id}).fetch();
var other = App.models.projects.find({'category_ids.id': {$ne: category._id}}).fetch();
return selected.concat(other);
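If two queries feel like too much, the same ordering can also be produced from a single fetch with a small partition step. This is only a sketch: orderByCategory is a hypothetical helper, and it assumes the category_ids shape from the sample data.

```javascript
// Put projects that belong to categoryId first, keeping the original
// relative order inside each group.
function orderByCategory(projects, categoryId) {
  var selected = [];
  var other = [];
  projects.forEach(function (project) {
    var matches = (project.category_ids || []).some(function (c) {
      return c.id === categoryId;
    });
    (matches ? selected : other).push(project);
  });
  return selected.concat(other);
}

// Usage inside the route's data function (not executed here), passing
// whatever identifier the category documents actually carry:
//   return orderByCategory(App.models.projects.find({}).fetch(), categoryId);
```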