How to count retweets in MongoDB

I want to count how many retweets there are in MongoDB. I got the Twitter data via the Streaming API.
Here are the queries I tried, but none of them worked. Please help me.
db.collection.find({"retweet_count":{$ne:0}}).count()
db.collection.find({retweeted:true}).count()
db.collection.find({retweeted:false}).count()
Relevant data structure:
{
"retweeted" : false,
"in_reply_to_status_id" : null,
"in_reply_to_status_id_str" : null,
"is_quote_status" : false,
"id_str" : "815351057677094912",
"created_at" : "Sun Jan 01 00:16:55 +0000 2017",
"id" : NumberLong("815351057677094912"),
"retweet_count" : 0,
"truncated" : false,
"contributors" : null
}
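A hedged sketch of one approach: in the Twitter API v1.1 tweet format, retweeted only indicates whether the authenticating user has retweeted the tweet, whereas a retweet itself carries an embedded retweeted_status object holding the original tweet. Assuming the streamed documents were stored unchanged, a shell query along the lines of db.collection.find({retweeted_status: {$exists: true}}).count() should count the retweets. The plain JavaScript below only models that filter over a made-up in-memory array, so the logic can be checked without a server.

```javascript
// Hypothetical in-memory stand-ins for documents captured from the Streaming API.
const tweets = [
  { id_str: "1", retweeted: false, retweet_count: 0 },
  { id_str: "2", retweeted: false, retweet_count: 0, retweeted_status: { id_str: "1" } },
  { id_str: "3", retweeted: false, retweet_count: 0, retweeted_status: { id_str: "1" } },
];

// Models the filter {retweeted_status: {$exists: true}}: a tweet counts as a
// retweet when the retweeted_status field is present at all.
function isRetweet(tweet) {
  return Object.prototype.hasOwnProperty.call(tweet, "retweeted_status");
}

function countRetweets(docs) {
  return docs.filter(isRetweet).length;
}

console.log(countRetweets(tweets)); // 2
```

Note that every sample document has retweeted: false, which is why filtering on that flag says nothing about whether a tweet is a retweet.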

Related

Filter (nested) boolean field in Mongo / Meteor

I have a Mongo collection (using Meteor) structured like so:
{
"_id" : "xx",
"name" : "xx",
"handle" : "xx",
"tweets" : [
{
"published" : true,
"tweet_id" : "xx",
"text" : "xx",
"created_at" : "Mon Jul 31 18:18:38 +0000 2017",
"retweet_count" : 0,
"from" : "x",
"from_full_name" : "x",
"from_profile_image" : "x"
},
{
"published" : true,
"tweet_id" : "xx",
"text" : "xx",
"created_at" : "Mon Jul 31 18:18:38 +0000 2017",
"retweet_count" : 0,
"from" : "x",
"from_full_name" : "x",
"from_profile_image" : "x"
},
{
"published" : false,
"tweet_id" : "xx",
"text" : "xx",
"created_at" : "Mon Jul 31 18:18:38 +0000 2017",
"retweet_count" : 0,
"from" : "x",
"from_full_name" : "x",
"from_profile_image" : "x"
}
]
}
I only want to display the published tweets. I am using a template helper to retrieve and filter:
return Tweets.find({
handle:handle,
"tweets.published": true
});
I cannot seem to get the nested filter on 'published' to work. All tweets are displayed, using the above code. I have tried many different permutations of the "tweets.published": true. What is the correct way to filter out the unpublished tweets?
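Since the tweets field is an array of objects, not a single object, your query matches every document that contains at least one published tweet and then returns that document with its entire tweets array. A query filter selects documents; it does not remove individual array elements.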
First you should use the handle to find the correct document:
return Tweets.find({
handle: handle,
});
This must then be combined with a fields projection that selects which tweets should be returned:
return Tweets.find({
handle: handle,
}, {
fields: { tweets: { $elemMatch: { published: true } } }
});
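The $elemMatch part specifies that the tweet must be published. Note that an $elemMatch projection returns only the first matching element of the array. More information is on the MongoDB documentation page for $elemMatch.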
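A plain JavaScript model of the $elemMatch projection used above (not MongoDB code; projectElemMatch is a hypothetical helper) shows what comes back when a handle has two published tweets. Per MongoDB's documented projection rules, only the first matching element is kept, and a document with no match comes back with just its _id.

```javascript
// Models {fields: {tweets: {$elemMatch: {published: true}}}}: the array is cut
// down to its FIRST matching element; other fields are not projected.
function projectElemMatch(doc, field, predicate) {
  const match = doc[field].find(predicate);
  const projected = { _id: doc._id };
  if (match !== undefined) projected[field] = [match];
  return projected;
}

const doc = {
  _id: "xx",
  handle: "xx",
  tweets: [
    { tweet_id: "a", published: true },
    { tweet_id: "b", published: true },
    { tweet_id: "c", published: false },
  ],
};

const result = projectElemMatch(doc, "tweets", (t) => t.published === true);
console.log(result.tweets.map((t) => t.tweet_id)); // only "a": published tweet "b" is dropped too
```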
EDIT
If the snippet must run on the client (as in your case), you have other options. You can use my snippet in a server publication so that only published tweets are sent to the client.
Alternatively, pass a transform option to the find call:
return Tweets.find({
handle: handle
}, {
transform: doc => {
doc.tweets = doc.tweets.filter(tweet => tweet.published);
return doc;
}
});
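The same filtering can be written as a standalone function, which keeps every published tweet and is easy to test outside Meteor. The name publishedOnly is hypothetical, and returning a copy instead of mutating doc is a design choice, not something Meteor requires.

```javascript
// Returns a shallow copy of the document whose tweets array keeps
// every published tweet (unlike $elemMatch, which keeps only the first).
function publishedOnly(doc) {
  return { ...doc, tweets: (doc.tweets || []).filter((tweet) => tweet.published) };
}

const doc = {
  _id: "xx",
  handle: "xx",
  tweets: [
    { tweet_id: "a", published: true },
    { tweet_id: "b", published: true },
    { tweet_id: "c", published: false },
  ],
};

const shown = publishedOnly(doc);
console.log(shown.tweets.map((t) => t.tweet_id)); // "a" and "b"
console.log(doc.tweets.length); // 3: the original document is untouched
```

In the helper it would be used as Tweets.find({ handle: handle }, { transform: publishedOnly }).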

How to get collection fields with one where condition and a date range using Spring mongoTemplate?

I want to translate following console mongodb query into Spring mongoTemplate query
db.transaction.find({"account_number":10}, {"entry_date":{"$gte": new ISODate("2016-07-20"),"$lte": new ISODate("2016-07-21")}}).count()
I have tried following
Query query = new Query();
Criteria criteria =
Criteria.where("account_number").is(accountNumber).and("account_code").is(accountCode)
.and("entry_date").gte(start).lte(end);
query.addCriteria(criteria);
return mongoTemplate.find(query, Transaction.class, "transaction");
When debugging, I found that the code above is translated into the following query. It does not return the same results I get in the mongo console:
Query: { "account_number" : 10 , "account_code" : 2102 , "entry_date" : { "$gte" : "2015-05-20" , "$lte" : "2016-07-21"}}
How can I build a query that passes two different conditions to find, as I did in the console?
What I am trying to achieve is to get all transactions between two dates (inclusive) for a given account_number.
Collection structure:
{
"_id" : ObjectId("5825e49585a4caf2bfa30ff4"),
"profit" : "",
"account_number" : 280,
"m_number" : "",
"registration_number" : "",
"page_number" : "",
"entry_date" : ISODate("2014-10-20T07:33:57Z"),
"narration" : "To cash",
"voucher_number" : "",
"debit_credit" : -4400,
"account_code" : 2105,
"created_at" : ISODate("2014-10-20T07:33:57Z"),
"updated_at" : ISODate("2014-10-20T07:33:57Z"),
"employee_id" : 0,
"balance" : 0,
"credit" : 0,
"debit" : 0,
"particulars" : "",
"since_last" : 0,
"transaction_type" : 0,
"voucher_path" : "",
"branch" : "",
"auto_voucher_number" : "",
"check_book_series" : ""
}
{
"_id" : ObjectId("5825e49585a4caf2bfa30ff5"),
"profit" : "",
"account_number" : 1555,
"m_number" : "",
"registration_number" : "",
"page_number" : "",
"entry_date" : ISODate("2014-10-20T07:33:57Z"),
"narration" : "To",
"voucher_number" : 73804,
"debit_credit" : -1550,
"account_code" : 2101,
"created_at" : ISODate("2014-10-20T07:33:57Z"),
"updated_at" : ISODate("2014-10-20T07:33:57Z"),
"employee_id" : 0,
"balance" : 0,
"credit" : 0,
"debit" : 0,
"particulars" : "",
"since_last" : 0,
"transaction_type" : 0,
"voucher_path" : "",
"branch" : "",
"auto_voucher_number" : "",
"check_book_series" : ""
}
Solution
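The original console query only returns records for account_number=10 and ignores the entry_date condition, because the second argument to find() is a projection (which fields to return), not a second filter. Both conditions belong in the same query, which is what chaining them in one Criteria does. Here is the correct solution:
import java.util.List;
import org.joda.time.DateTime;
import org.springframework.data.mongodb.core.query.Criteria;
import org.springframework.data.mongodb.core.query.Query;
// start and end must be date objects (Joda DateTime or java.util.Date), not Strings
public List<Transaction> findTransactions(int accountNumber, int accountCode, DateTime start, DateTime end) {
    Query query = new Query();
    Criteria criteria = Criteria.where("account_number").is(accountNumber)
        .and("account_code").is(accountCode)
        .and("entry_date").gte(start).lte(end);
    query.addCriteria(criteria);
    return mongoTemplate.find(query, Transaction.class, "transaction");
}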
It is important that the dates have the correct type, e.g. java.util.Date or Joda DateTime, not String.
You are passing the dates as Strings. Try passing both the start date and the end date as java.util.Date objects. For example:
LocalDate startLocalDate = LocalDate.of(2014, 10, 19);
Date startDate = Date.from(startLocalDate.atStartOfDay().toInstant(ZoneOffset.UTC));
Criteria criteria = Criteria.where("entry_date").gte(startDate);
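Two details matter for "between two dates (inclusive)". First, MongoDB's range operators such as $gte only match values of the same BSON type as the operand (type bracketing), so string bounds like "2015-05-20" never match a stored date. Second, $lte with midnight of the last day excludes entries later that day. As a sketch, assuming entry_date holds BSON dates and whole UTC days are meant, the hypothetical helper below builds a half-open range from the start of the first day to the start of the day after the last day; in Spring the same two bounds would go to gte(...) and lt(...).

```javascript
// Hypothetical helper: filter for every entry on the UTC days from firstDay
// to lastDay inclusive, for one account. Months are 1-based here.
function transactionRangeFilter(accountNumber, firstDay, lastDay) {
  const start = new Date(Date.UTC(firstDay.year, firstDay.month - 1, firstDay.day));
  // Start of the day AFTER lastDay, used as an exclusive upper bound.
  const endExclusive = new Date(Date.UTC(lastDay.year, lastDay.month - 1, lastDay.day + 1));
  return {
    account_number: accountNumber,
    entry_date: { $gte: start, $lt: endExclusive },
  };
}

const filter = transactionRangeFilter(
  10,
  { year: 2016, month: 7, day: 20 },
  { year: 2016, month: 7, day: 21 }
);
console.log(filter.entry_date.$gte.toISOString()); // 2016-07-20T00:00:00.000Z
console.log(filter.entry_date.$lt.toISOString()); // 2016-07-22T00:00:00.000Z
```

In the mongo shell, the resulting object would be passed as the first (and only) argument: db.transaction.find(filter).count().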

Deal with accented character in mongodb

I've been struggling to find an ideal way to deal with accented characters.
First, I use these two methods to encode (decode) text before inserting it into (fetching it from) Mongo:
function encode_utf8(s) {
return unescape(encodeURIComponent(s));
}
function decode_utf8(s) {
return decodeURIComponent(unescape(s));
}
This method works fine when I fetch comments from MongoDB and display them (via an Ajax request):
$.ajax({
url : "http://www.mywebsite.com/comments",
type : "get",
data : "key=" + env.key + "&login=" + login,
dataType : "json",
success: function(response) { console.log(decode_utf8(response.texte)); /* yay works fine */ },
error : function(jqXHR, textStatus, errorThrown) {}
});
But when I use the code below (the map function of a MapReduce job), accented characters are not handled:
String map = "function map() { "
+ "var texte = decodeURIComponent(unescape(this.comment));"
+ "var words = texte.match(/\\w+/g);"
+ "if(words) "
+ "for(var i=0; i<words.length; i++)"
+ "emit(words[i].toLowerCase(), 1);"
+ "}"
For example, instead of ùùdzedzed the result is dzedzed.
Any suggestion or workaround would be great.
EDIT:
My Mongo content:
{ "_id" : ObjectId("55460441e4b00700737c56cc"), "id" : "c0ab4be5-f4f3-4c73-a7a0-f6863b6ce6511430651969105", "auteur_login" : "testlogin", "auteur_id" : 1, "comment" : "être où règlem ça", "date" : "Sun May 03 13:19:29 CEST 2015" }
{ "_id" : ObjectId("554612f1e4b0953aca0c0be0"), "id" : "aaa52859-5de6-469b-a17f-aa4615db77f71430655729171", "auteur_login" : "testlogin", "auteur_id" : 1, "comment" : "à la", "date" : "Sun May 03 14:22:09 CEST 2015" }
{ "_id" : ObjectId("55461c21e4b0643ea96f7933"), "id" : "ba3ae7c0-39a6-4c77-86ec-d193e39759b71430658081921", "auteur_login" : "testlogin", "auteur_id" : 1, "comment" : "voilà", "date" : "Sun May 03 15:01:21 CEST 2015" }
{ "_id" : ObjectId("5547f5ede4b055c2ffb1a592"), "id" : "a9a1e7d4-c28d-4f9e-98d5-3fe7fba3bb5e1430779373121", "auteur_login" : "testlogin", "auteur_id" : 1, "comment" : "vb vb vb", "date" : "Tue May 05 00:42:53 CEST 2015" }
{ "_id" : ObjectId("5547f5fbe4b055c2ffb1a593"), "id" : "b5ad2b7e-987f-4d32-b5ca-bc06bdc57f611430779387478", "auteur_login" : "testlogin", "auteur_id" : 1, "comment" : "vb vb", "date" : "Tue May 05 00:43:07 CEST 2015" }
{ "_id" : ObjectId("5548b88ee4b029bc67d7a638"), "id" : "451e4e0d-c15a-4657-82c3-7760555b6c811430829198040", "auteur_login" : "testlogin", "auteur_id" : 1, "comment" : "ecrit n'importe quoi", "date" : "Tue May 05 14:33:18 CEST 2015" }
{ "_id" : ObjectId("554b9409e4b004c1b252f9cb"), "id" : "f36c3b49-72b1-48f4-ae92-a7933bc1e2761431016457515", "auteur_login" : "moi12345", "auteur_id" : 9, "comment" : "salut", "date" : "Thu May 07 18:34:17 CEST 2015" }
{ "_id" : ObjectId("554b941be4b004c1b252f9cd"), "id" : "1adc6788-32be-4c3a-bd73-16dad0b4bb741431016475777", "auteur_login" : "moi12345", "auteur_id" : 9, "comment" : "regregerhg", "date" : "Thu May 07 18:34:35 CEST 2015" }
And this is the result:
"sfsdfs": 58.0 update
"ta": 58.0 update
"test1": 58.0 update
"teste": 9.666666666666666 update
"ton": 58.0 update
"tre": 29.0 insert
"try": 58.0 update
"tudiant": 58.0 insert
"tweed": 58.0 update
"une": 58.0 update
"va": 19.333333333333332 update
"vb": 29.0 update
"veux": 19.333333333333332 update
"vient": 58.0 update
"voil": 58.0 insert
For example, in the last line I should be getting voilà instead of voil.
So, essentially, your encode_utf8 method is a roundabout way of converting a string into its UTF-8 bytes, stored as one 8-bit character per byte. Why are you doing that?
(encodeURIComponent() and decodeURIComponent() use UTF-8, while escape() and unescape() use ISO-8859-1 %XX escapes.)
For symmetry, your decode_utf8(s) should be
return decodeURIComponent(escape(s));
instead of using unescape.
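The symmetry can be checked by round-tripping a string. This plain JavaScript sketch keeps the question's original decoder under the hypothetical name decode_utf8_old for comparison; the TextEncoder and TextDecoder lines at the end show a standard alternative for getting UTF-8 bytes, in case that is what the encoding step is really for.

```javascript
function encode_utf8(s) {
  // UTF-8 encode to %XX escapes, then turn each escape into one 8-bit character.
  return unescape(encodeURIComponent(s));
}

// Symmetric decoder: 8-bit characters back to %XX escapes, then UTF-8 decode.
function decode_utf8(s) {
  return decodeURIComponent(escape(s));
}

// The question's original decoder, kept for comparison.
function decode_utf8_old(s) {
  return decodeURIComponent(unescape(s));
}

const encoded = encode_utf8("voilà");
console.log(encoded.length); // 6: "à" became two 8-bit characters
console.log(decode_utf8(encoded)); // voilà
console.log(decode_utf8_old(encoded) === "voil\u00C3\u00A0"); // true, i.e. not "voilà"

// Standard UTF-8 byte conversion without escape/unescape.
const bytes = new TextEncoder().encode("voilà");
console.log(bytes.length); // 6
console.log(new TextDecoder().decode(bytes)); // voilà
```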
Note that the JavaScript functions escape() and unescape() are deprecated:
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/escape
https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Global_Objects/unescape
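Separately from the encoding functions, the map function's /\w+/g is consistent with the truncated words in the result (voil, tre, tudiant): in JavaScript, \w matches only the ASCII characters [A-Za-z0-9_], so accented letters act as word separators. A hedged alternative is to list the accented Latin ranges explicitly. The ranges below (Latin-1 Supplement letters plus Latin Extended-A and B) are an assumption that covers French text, and an explicit range avoids depending on Unicode property escapes (\p{L} with the u flag), which older server-side JavaScript engines may not support.

```javascript
// ASCII word characters plus Latin-1 Supplement letters (skipping the
// multiplication and division signs U+00D7 and U+00F7) and
// Latin Extended-A/B (U+0100 to U+024F).
const WORD = /[0-9A-Za-z_\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u024F]+/g;

function words(text) {
  return (text.match(WORD) || []).map((w) => w.toLowerCase());
}

console.log("voilà".match(/\w+/g)); // one word: "voil"
console.log(words("voilà")); // one word: "voilà"
console.log(words("être où règlem ça")); // "être", "où", "règlem", "ça"
```

Inside the Java string that holds the map function, the backslashes would need doubling, as with the original \\w.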

Migrating from MongoDB to HBase

Hi, I am very new to HBase. I downloaded some Twitter data and stored it in MongoDB. Now I need to move that data into HBase to speed up Hadoop processing, but I am not able to design its schema. Here is the Twitter data in JSON format:
{
"_id" : ObjectId("512b71e6e4b02a4322d1c0b0"),
"id" : NumberLong("306044618179506176"),
"source" : "Facebook",
"user" : {
"name" : "Dada Bhagwan",
"location" : "India",
"url" : "http://www.dadabhagwan.org",
"id" : 191724440,
"protected" : false,
"timeZone" : null,
"description" : "Founder of Akram Vignan - Practical Spiritual Science of Self Realization",
"screenName" : "dadabhagwan",
"geoEnabled" : false,
"profileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034_normal.jpg",
"biggerProfileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034_bigger.jpg",
"profileImageUrlHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_normal.jpg",
"profileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_normal.jpg",
"biggerProfileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_bigger.jpg",
"miniProfileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034_mini.jpg",
"originalProfileImageURLHttps" : "https://si0.twimg.com/profile_images/1647956820/M_DSC_0034.jpg",
"followersCount" : 499,
"profileBackgroundColor" : "EEE4C1",
"profileTextColor" : "333333",
"profileLinkColor" : "990000",
"lang" : "en",
"profileSidebarFillColor" : "FCF9EC",
"profileSidebarBorderColor" : "CBC09A",
"profileUseBackgroundImage" : true,
"showAllInlineMedia" : false,
"friendsCount" : 1,
"favouritesCount" : 0,
"profileBackgroundImageUrl" : "http://a0.twimg.com/profile_background_images/396759326/dadabhagwan-twitter.jpg",
"profileBackgroundImageURL" : "http://a0.twimg.com/profile_background_images/396759326/dadabhagwan-twitter.jpg",
"profileBackgroundImageUrlHttps" : "https://si0.twimg.com/profile_background_images/396759326/dadabhagwan-twitter.jpg",
"profileBannerURL" : null,
"profileBannerRetinaURL" : null,
"profileBannerIPadURL" : null,
"profileBannerIPadRetinaURL" : null,
"miniProfileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034_mini.jpg",
"originalProfileImageURL" : "http://a0.twimg.com/profile_images/1647956820/M_DSC_0034.jpg",
"utcOffset" : -1,
"contributorsEnabled" : false,
"status" : null,
"createdAt" : NumberLong("1284700143000"),
"profileBannerMobileURL" : null,
"profileBannerMobileRetinaURL" : null,
"profileBackgroundTiled" : false,
"statusesCount" : 1713,
"verified" : false,
"translator" : false,
"listedCount" : 6,
"followRequestSent" : false,
"descriptionURLEntities" : [ ],
"urlentity" : {
"url" : "http://www.dadabhagwan.org",
"start" : 0,
"end" : 26,
"expandedURL" : "http://www.dadabhagwan.org",
"displayURL" : "http://www.dadabhagwan.org"
},
"rateLimitStatus" : null,
"accessLevel" : 0
},
"contributors" : [ ],
"geoLocation" : null,
"place" : null,
"favorited" : false,
"retweet" : false,
"retweetedStatus" : null,
"retweetCount" : 0,
"userMentionEntities" : [ ],
"retweetedByMe" : false,
"currentUserRetweetId" : -1,
"possiblySensitive" : false,
"urlentities" : [
{
"url" : "http://t.co/gR1GohGjaj",
"start" : 113,
"end" : 135,
"expandedURL" : "http://fb.me/2j2HKHJrM",
"displayURL" : "fb.me/2j2HKHJrM"
}
],
"hashtagEntities" : [ ],
"mediaEntities" : [ ],
"truncated" : false,
"inReplyToStatusId" : -1,
"text" : "Spiritual Quote of the Day :\n\n‘I am Chandubhai’ is an illusion itself and from that are \nkarmas charged. When... http://t.co/gR1GohGjaj",
"inReplyToUserId" : -1,
"inReplyToScreenName" : null,
"createdAt" : NumberLong("1361801697000"),
"rateLimitStatus" : null,
"accessLevel" : 0
}
How should I divide the data into columns and column families? I thought of making one "twitter" column family that contains source, geoLocation, place, retweet, etc., and another "user" column family that contains name, location, etc. (the user's data), i.e. a new column family for each inner-level sub-document.
Is this approach correct? And how will I distinguish urlentity in the "user" column family from the one in the "twitter" column family?
And how should I handle keys that contain a list of sub-documents (e.g. urlentities)?
There are many ways to model this in HBase ranging from storing everything in a single column to having a different table for each sub entity with several other tables for "indexing".
Generally speaking, you model data in HBase based on your read and write access patterns. For example, each column family is stored in separate files on disk, so one reason to split data into two column families is that many of your reads need data from one family and not the other.
There's a good presentation about HBase schema design by Ian Varley from HBaseCon 2012; you can find the slides here and the video here.
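Purely as an illustration of one of the many possible layouts, the sketch below puts top-level tweet fields into a twitter family and the user sub-document into a user family, turns nested objects into dotted qualifiers, and distinguishes array elements by their index. Under that convention the urlentity question is answered by the family prefix (user:urlentity.url versus twitter:urlentities.0.url). The family names, the qualifier convention, the choice of the tweet id as row key and the toHBaseCells helper are all assumptions, not an HBase API; the output is a plain map you would then write with your HBase client.

```javascript
// Flattens a nested value into "prefix.key" / "prefix.index" qualifiers.
// null values are skipped, since HBase rows are sparse.
function flatten(value, prefix, out) {
  if (value === null || value === undefined) return out;
  if (typeof value === "object") {
    const keys = Array.isArray(value) ? [...value.keys()] : Object.keys(value);
    for (const key of keys) {
      flatten(value[key], prefix === "" ? String(key) : `${prefix}.${key}`, out);
    }
    return out;
  }
  out[prefix] = String(value);
  return out;
}

// Hypothetical mapping: "user" sub-document -> user family, the rest -> twitter
// family. Cells are keyed "family:qualifier"; MongoDB's _id is dropped.
function toHBaseCells(tweet) {
  const { _id, user, ...rest } = tweet;
  const cells = {};
  for (const [q, v] of Object.entries(flatten(rest, "", {}))) cells[`twitter:${q}`] = v;
  for (const [q, v] of Object.entries(flatten(user || {}, "", {}))) cells[`user:${q}`] = v;
  return { rowKey: String(tweet.id), cells };
}

// Trimmed sample; the NumberLong id is represented as a string here.
const tweet = {
  _id: "512b71e6e4b02a4322d1c0b0",
  id: "306044618179506176",
  source: "Facebook",
  retweet: false,
  geoLocation: null,
  user: { name: "Dada Bhagwan", urlentity: { url: "http://www.dadabhagwan.org", start: 0 } },
  urlentities: [{ url: "http://t.co/gR1GohGjaj", start: 113 }],
};

console.log(toHBaseCells(tweet));
```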

Mongodb save/upsert using C# drivers, continuous array adds and field updates to same doc

I need some ideas/tips for this. Here is a sample document I am storing:
{
"_id" : new BinData(0, "C3hBhRCZ5ZFizqbO1hxwrA=="),
"gId" : 237,
"name" : "WEATHER STATION",
"mId" : 341457,
"MAC" : "00:00:00:00:00:01",
"dt" : new Date("Fri, 24 Feb 2012 13:59:02 GMT -05:00"),
"hw" : [{
"tag" : "Weather Sensors",
"snrs" : [{
"_id" : NumberLong(7),
"sdn" : "Wind Speed"
}, {
"_id" : NumberLong(24),
"sdn" : "Wind Gust"
}, {
"_id" : NumberLong(28),
"sdn" : "Wind Direction"
}, {
"_id" : NumberLong(31),
"sdn" : "Rainfall Amount"
}, {
"_id" : NumberLong(33),
"sdn" : "Rainfall Peak Amount"
}, {
"_id" : NumberLong(38),
"sdn" : "Barometric Pressure"
}],
"_id" : 1
}]
}
Currently I use the C# driver and call .Save() on my collection to get an upsert. However, what I want is more of a hybrid approach. Here are the distinct operations I need to be able to perform:
Upsert the entire document if it does not exist
Update the dt field with a new timestamp if the document does exist
For the hw field, I need several things. If hw._id exists, update its tag field and handle the snrs field by either updating existing entries so that their sdn value is current, or adding entirely new entries when the _id does not exist
Nothing should ever be removed from the hw array and nothing should ever be removed from the snrs array.
A standard upsert does not appear to give me what I am after, so I am looking for the best way to do what I need with as few round trips to the server as possible. I suspect some of the $ operators may be what I need here, but I would like some thoughts on how best to approach this.
The gist of what I am doing here is keeping an accumulating, historical document of snrs entries with their current values, while retaining any historical entries in the array even when they are no longer "alive", being reported, etc. This allows future reporting on things that no longer exist now but did at some point in the past. _id values are application-generated, globally unique across all documents, and never change after initial creation. For example, last week "Wind Speed" was being reported, but this week it is not. Its _id value, however, will not change if "Wind Speed" starts reporting again. Follow?
Clarifications or more detail can be provided if needed.
Thanks.
You can do this by changing the structure of your document from embedded arrays to subdocuments keyed by the _id values. For example:
{
"MAC" : "00:00:00:00:00:01",
"_id" : 1,
"dt" : ISODate("2012-02-24T18:59:02Z"),
"gId" : 237,
"hw" : {
"1" : {
"snrs" : {
"1" : "Wind Speed",
"2" : "Wind Gust"
},
"tag" : "Weather Sensors"
}
},
"mId" : 341457,
"name" : "WEATHER STATION 1"
}
I created the above document with the following upsert (the third argument, true, enables upsert):
db.foo.update(
{_id:1},
{
$set: {
"gId" : 237,
"name" : "WEATHER STATION 1",
"mId" : 341457,
"MAC" : "00:00:00:00:00:01",
"dt" : new Date("Fri, 24 Feb 2012 13:59:02 GMT -05:00"),
"hw.1.tag" : "Weather Sensors",
"hw.1.snrs.1" : "Wind Speed",
"hw.1.snrs.2" : "Wind Gust"
}
},
true
)
Now when I run
db.foo.update(
{_id:1},
{
$set: {
"dt" : new Date(),
"hw.2.snrs.1" : "Rainfall Amount"
}
},
true
)
I get
{
"MAC" : "00:00:00:00:00:01",
"_id" : 1,
"dt" : ISODate("2012-03-07T05:14:31.881Z"),
"gId" : 237,
"hw" : {
"1" : {
"snrs" : {
"1" : "Wind Speed",
"2" : "Wind Gust"
},
"tag" : "Weather Sensors"
},
"2" : {
"snrs" : {
"1" : "Rainfall Amount"
}
}
},
"mId" : 341457,
"name" : "WEATHER STATION 1"
}
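The second update leaves hw.1 intact because $set with a dotted path writes only the named leaf and, per the MongoDB documentation, creates any missing embedded documents along the path. The plain JavaScript model below is a simplification (no array handling, no other operators, and no server-side validation beyond refusing to descend into a non-document value), and applySet is a hypothetical helper rather than a driver call. Replaying the two updates from the answer reproduces the structure shown above.

```javascript
// Applies {$set: {...}} to a plain object: each dotted path creates missing
// parent objects and overwrites only the leaf, so sibling keys survive.
function applySet(doc, setSpec) {
  for (const [path, value] of Object.entries(setSpec)) {
    const parts = path.split(".");
    let node = doc;
    for (const part of parts.slice(0, -1)) {
      if (node[part] === undefined) {
        node[part] = {}; // create the missing parent subdocument
      } else if (node[part] === null || typeof node[part] !== "object") {
        throw new Error(`cannot create a field inside non-document "${part}"`);
      }
      node = node[part];
    }
    node[parts[parts.length - 1]] = value; // overwrite only the leaf
  }
  return doc;
}

// Upsert of a missing document: start from the query's _id, then apply $set.
const doc = applySet({ _id: 1 }, {
  gId: 237,
  name: "WEATHER STATION 1",
  "hw.1.tag": "Weather Sensors",
  "hw.1.snrs.1": "Wind Speed",
  "hw.1.snrs.2": "Wind Gust",
});

// Second update: new timestamp plus a sensor under a new hardware key.
applySet(doc, { dt: new Date("2012-03-07T05:14:31.881Z"), "hw.2.snrs.1": "Rainfall Amount" });

console.log(JSON.stringify(doc.hw));
```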