Twitter (Social networking) Dataset - facebook

I am looking for a Twitter or other social networking dataset for my project. I currently have the CAW 2.0 Twitter dataset, but it only contains users' tweets. I want data that shows the number of friends, followers, and so on.
It does not have to be Twitter, but I would prefer Twitter or Facebook. I already tried Infochimps, but apparently the Twitter file is no longer downloadable.
Can someone recommend good websites for finding this kind of dataset? I am going to feed the dataset to Hadoop.

Try the following three datasets:
Contains around 97 million tweets:
http://demeter.inf.ed.ac.uk/index.php?option=com_content&view=article&id=2:test-post-for-twitter&catid=1:twitter&Itemid=2
ed note: the dataset previously linked above is no longer available because of a request from Twitter to remove it.
Contains user graph of 47 million users:
http://an.kaist.ac.kr/traces/WWW2010.html
The following dataset contains the network as well as tweets. However, the data was collected by snowball sampling or something similar, so the friends network is not uniform. It has around 10 million tweets; you can email the researcher for even more data.
http://www.public.asu.edu/~mdechoud/datasets.html
Have a look at the license the data is distributed under, though.
Hope this helps.
Also, can you tell me what kind of work you are planning with this dataset?
I have a few Hadoop / Pig scripts to use with the dataset.
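For the friend and follower counts asked about above, a user-graph file can be reduced to per-user counts in one pass. The sketch below is a minimal example that assumes each line is a tab-separated pair `user_id<TAB>follower_id`; that layout is an assumption for illustration, so check the actual format of whichever dataset you download. The name `count_degrees` is hypothetical.

```python
from collections import Counter


def count_degrees(lines):
    """Return (followers, friends) Counters from 'user<TAB>follower' lines.

    Assumed layout (hypothetical): each line says that `follower` follows `user`.
    """
    followers = Counter()  # user -> number of accounts following that user
    friends = Counter()    # user -> number of accounts that user follows
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        user, follower = line.split("\t")
        followers[user] += 1
        friends[follower] += 1
    return followers, friends


if __name__ == "__main__":
    sample = ["1\t2", "1\t3", "2\t1"]
    followers, friends = count_degrees(sample)
    for user in sorted(set(followers) | set(friends)):
        print(user, followers[user], friends[user])
```

The same per-line logic should carry over to a Hadoop Streaming job: emit `(user, 1)` pairs in the mapper and sum them in the reducer.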

100 million pages were extracted from Facebook:
http://it.slashdot.org/story/10/07/28/1350222/100-Million-Facebook-Pages-Leaked-On-Torrent-Site?art_pos=6
I don't know what they contain, but you could have a look; it seems to be easy to find on torrent sites.
You could also use the Facebook API, but if you want a big enough dataset, you would have to ask Facebook for the rights to access it.
It contains links to friends, likes, groups, ...

Facebook social graph, application installations, and Last.fm users, events, and groups collected by researchers at UC Irvine: http://odysseas.calit2.uci.edu/research/

I think the best tool for gathering Twitter data is http://www.followthehashtag.com. It can get historical or future data and has advanced data export features.
It also has a section where we add big datasets (about 200,000 tweets) once a week:
http://followthehashtag.com/datasets/

Related

How can we collect the facebook group request data using a chrome extension?

Is there any way to collect data by asking questions when someone wants to join the group, and to save that data when we approve the join request, either manually or programmatically?
Could you please elaborate on how we can use the Facebook Graph API for this purpose?
Check out GroupTrack CRM. It's a CRM that is integrated into Facebook via a Chrome extension. It does exactly what you asked (one click to approve individual or all pending members while also saving their answers to your questions and adding them to the CRM), along with a ton of other awesome stuff.
Keep notes and tags, track sales funnel stages, bookmark posts and comments, set follow up tasks with reminders, and more across unlimited Groups. Everything is synced in real time with a web app as well, so you can access your contact information from anywhere, plus it can be set up to integrate with external systems (Google Sheets, Streak, and Kartra at the moment, but many more to come).
Lastly, GroupTrack supports teams, so if you run a Group with other admins, you can share access to the CRM and have everything kept in sync. It's awesome!

How to get like/share/comment data off of Facebook

I need to write a program that retrieves like/share/comment information from the Facebook groups I am the admin of and tallies how many likes/shares/comments each person makes, to determine who participates the most in each group. I have not done anything like this before, so I was hoping to get some suggestions on how to do it. I would like to use Python or C++ if possible, since I am familiar with those languages, but I am open to using PHP as well. Thanks in advance for any good suggestions.
This should get you started: https://developers.facebook.com/docs/groups-api/common-uses#getting-group-posts
For comments, shares, and likes, you would need to request the respective fields/endpoints by adding the fields parameter to your call. However, you cannot retrieve user data unless your app has been reviewed and approved, so for testing purposes you can only test this with your own posts/shares/comments/likes.
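Once the posts have been retrieved, the tallying itself is plain dictionary work. The sketch below assumes the response has already been parsed from JSON into a list of post dicts, each with optional `likes` and `comments` entries holding a `data` list, where a like carries the person under `name` and a comment under `from.name`. These field shapes are assumptions for illustration and should be checked against the response you actually receive; `tally_participation` is a hypothetical helper name.

```python
from collections import Counter


def tally_participation(posts):
    """Count likes plus comments per person across a list of post dicts."""
    totals = Counter()
    for post in posts:
        # Assumed shape: {"likes": {"data": [{"name": ...}, ...]}}
        for like in post.get("likes", {}).get("data", []):
            totals[like["name"]] += 1
        # Assumed shape: {"comments": {"data": [{"from": {"name": ...}}, ...]}}
        for comment in post.get("comments", {}).get("data", []):
            totals[comment["from"]["name"]] += 1
    return totals


if __name__ == "__main__":
    # Made-up sample data, not real API output.
    sample_posts = [
        {"likes": {"data": [{"name": "Ann"}, {"name": "Bob"}]},
         "comments": {"data": [{"from": {"name": "Ann"}}]}},
        {"comments": {"data": [{"from": {"name": "Ann"}}]}},
    ]
    for name, count in tally_participation(sample_posts).most_common():
        print(name, count)
```

Keeping the counting separate from the HTTP calls means the same function can be reused per group, with one `Counter` per group.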

Loading & Connecting Facebook Pixel Conversions Data

I am trying to load Facebook Pixel Conversions-level data from the Marketing/Insights API, but I am not able to do it at the level I want, or even properly.
I have various pixels created in the form of events, e.g. Leads, Registrations, etc., and need to track them.
After reading the documentation for Ads Pixels and its stats, I was able to load some basic fields for now, but still not able to pull the s
GET API Query : https://graph.facebook.com/v2.9/act_/adspixels?fields=name,id,creation_time,last_fired_time
This gives me all the correct Ads Pixel details, but how do I pull all the stats for them in the form of events, their occurrences, etc.? Will I need more query parameters in this URL, or a new URL? I tried multiple iterations but was not able to get anything to work.
I tried this API query as per the documentation: https://graph.facebook.com/v2.9//stats - but it does not work, even with fields added.
Another issue is that I am not able to test my queries with the Graph API Explorer at all; it keeps reporting a "Timeout issue" or "some other errors" when I try to use the app there. Do I need to publish the app and get it approved before hitting FB Ads data via the Graph API Explorer?
All suggestions and feedback will be highly appreciated.
I was searching for some related things and encountered your thread. I will report my findings; maybe you already know this, but here it goes.
As for querying with the Graph API Explorer: it doesn't seem to work with the Marketing API. You need to create your own app and enable the Marketing API in order to get the necessary token.
I am following the instructions on the link you provided: stats
Second, to get the stats I am using:
graph.facebook.com/v2.11/{pixel-id}/stats?aggregation=pixel_fire
The aggregation parameter is necessary to get results. I can get the "Page View" event that I am tracking on a website listed.
I was able to compare these results with the ones shown to me on the Events Manager page of the pixel.
Hope this helps
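As a small illustration of the stats query above, the following sketch only assembles the URL with the `aggregation` parameter; it does not send a request. The helper name `pixel_stats_url` and the pixel ID `123` are placeholders, and appending the token as an `access_token` query parameter is the usual Graph API convention rather than something taken from this thread.

```python
from urllib.parse import urlencode


def pixel_stats_url(pixel_id, aggregation="pixel_fire",
                    version="v2.11", access_token=None):
    """Build the /{pixel-id}/stats URL described in the answer above."""
    params = {"aggregation": aggregation}
    if access_token:
        # Token obtained from your own app with the Marketing API enabled.
        params["access_token"] = access_token
    return f"https://graph.facebook.com/{version}/{pixel_id}/stats?{urlencode(params)}"


if __name__ == "__main__":
    # "123" is a placeholder pixel ID.
    print(pixel_stats_url("123"))
```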

Collecting Data from Facebook Group

I'm not a Facebook developer, but I need some data for my thesis about a Facebook group that I'm currently observing.
The problem is that I must collect the following data for the last 6 months:
how many members have joined in the last 6 months, split by month if possible
how many posts were made in the group in the last 6 months, also split by month if possible
how many active users there were within the last 6 months
Can somebody give me some hints on how to collect this information?
You're going to have a hard time doing this. Groups aren't very API friendly, and they don't have their own insights information.
You can try browsing the group's feed in the Graph API Explorer via the /GROUP_NAME_OR_ID/feed edge, adding since and until filters to look at monthly data.
However, you won't see all the posts because of Facebook privacy filtering. To get the most reliable data, you'll need to manually count the entries of interest from within the Facebook webapp.
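To split the feed by month, you need one since/until pair per calendar month. The sketch below computes those pairs as Unix timestamps in UTC for the last six calendar months; it assumes the since and until filters accept Unix timestamps, and `monthly_windows` is a hypothetical helper name. It makes no API calls.

```python
from datetime import datetime, timezone


def month_start(year, month):
    """Midnight UTC on the first day of the given month."""
    return datetime(year, month, 1, tzinfo=timezone.utc)


def monthly_windows(end, months=6):
    """Return [(label, since_ts, until_ts)] for the `months` calendar months
    ending with the month that contains `end`, oldest first."""
    windows = []
    year, month = end.year, end.month
    for _ in range(months):
        start = month_start(year, month)
        # The window ends where the next month begins.
        nxt = month_start(year + 1, 1) if month == 12 else month_start(year, month + 1)
        windows.append((start.strftime("%Y-%m"),
                        int(start.timestamp()), int(nxt.timestamp())))
        # Step back one calendar month.
        year, month = (year - 1, 12) if month == 1 else (year, month - 1)
    windows.reverse()
    return windows


if __name__ == "__main__":
    for label, since, until in monthly_windows(datetime.now(timezone.utc)):
        print(f"{label}: /GROUP_NAME_OR_ID/feed?since={since}&until={until}")
```

Counting the posts returned for each window gives the per-month totals, subject to the privacy filtering mentioned above.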

Realtime Twitter Replies?

I have created Twitter bots for many geographic locations. I want to allow users to @-reply to the Twitter bot with commands and then have the bot respond with the results. I would like to have the bot reply to the user as quickly as possible (realtime).
Apparently, Twitter used to have an XMPP/Jabber interface that would provide this type of realtime feed of replies but it was shut down.
As I see it my options are to use one of the following:
REST API
This would involve polling every X minutes for each bot. The problem with this is that it is not realtime and each Twitter account would have to be polled.
Search API
The search API does allow specifying a "-to" parameter in the search and replies to all bots could be aggregated in a search such as "-to bot1 OR -to bot2...". Though if you have hundreds of bots then the search string would get very long and probably exceed the maximum length of a GET request.
Streaming API
The streaming API looks very promising as it provides realtime results. The API allows you to specify follow and track parameters. follow is not useful, as the bot does not know who will be sending it commands. track allows you to specify keywords to track. This could possibly work by creating a daemon process that connects to the Streaming API and tracks all references to the bots' names. Once again, since there are lots of bots to track, the length and complexity of the query may be an issue. Another idea would be to track a special hashtag such as #botcommand; a user could then send a command using the syntax @bot1 weather #botcommand. Using the Streaming API to track all references to #botcommand would then give you a realtime stream of all the commands. Further parsing could then be done to determine which bot to send the command to. This blog post has more details on the Streaming API.
Third-party service
Are there any third-party companies that have access to the Twitter firehose and offer realtime data?
I haven't investigated these, but here are a few that I have found:
Gnip
Tweet.IM
excla.im
TwitterSpy - seems to use polling, not realtime
tweethook
I'm leaning towards using the Streaming API. Is there a better way to get near-realtime @-replies for many (hundreds) of Twitter accounts?
UPDATE: Twitter just announced that in the future they will have User Streams, which expand upon the Streaming API. User Streams Preview
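The "further parsing" step for the #botcommand idea could look like the sketch below. It assumes a command tweet names the bot first (with a leading sigil on the name), then the command words, with the #botcommand tag anywhere in the text. The bot names and the helper `parse_command` are hypothetical, and the function only works on the tweet text, so it is independent of how the stream is consumed.

```python
BOT_NAMES = {"bot1", "bot2"}   # hypothetical bot screen names
COMMAND_TAG = "#botcommand"    # the special hashtag being tracked


def parse_command(text, bot_names=BOT_NAMES, tag=COMMAND_TAG):
    """Return (bot, command) for a tweet like '@bot1 weather #botcommand',
    or None if the text is not a command for a known bot."""
    tokens = text.split()
    if tag.lower() not in (t.lower() for t in tokens):
        return None  # not tagged as a command
    words = [t for t in tokens if t.lower() != tag.lower()]
    if not words:
        return None  # the tag alone carries no command
    # Drop the leading sigil from the bot name before the lookup.
    bot = words[0].lstrip("@#").lower()
    if bot not in bot_names:
        return None
    return bot, " ".join(words[1:])


if __name__ == "__main__":
    print(parse_command("@bot1 weather #botcommand"))
```

Routing then becomes a dictionary lookup from the returned bot name to whatever handler serves that bot.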
Either track or follow will work for the cases you describe. See http://apiwiki.twitter.com/Streaming-API-Documentation#track for details on what track actually does. The doc on follow is on the same page.
There are rate limits of sorts on the streaming API, but they have to do with how big a slice of the total tweet stream you're consuming. For writing a bot like this you won't hit these limits without a pretty big user base. And when you get that user base you can apply for elevated access levels that increase the rate limits.
There's the Twitter firehose, but you're probably best off using the Streaming API. The firehose is open to Google (try googling your Twitter name), and as the link says, they're opening it up to all soon enough.
You'll want to get your IP whitelisted too.
If you haven't already, you'll want to check out the Google Group for Twitter devs.
The follow predicate for the streaming API would actually be useful, because if you follow your bots' user IDs, you'll get all the messages made by your bots and all the other messages that mention your bots' @usernames (including @replies). It really does track everything public on Twitter relating to the user IDs you follow with it; give it a shot.
REST API:
The most comprehensive results with the fewest false positives. Will include protected statuses if the bot is following the protected account. If you poll every thirty seconds (120 requests per hour), it is pretty close to realtime and you will be well under your rate limit (350/hour) if you are using api.twitter.com/1 with OAuth.
Search API:
You will want to avoid the Search API. It is trending more and more towards popular results and not complete results.
Streaming API:
The fastest, but also likely to miss some statuses as well as include false positives. Protected statuses, for example, are not included. Track for a screen_name will return statuses with that screen_name in them, but will also include tweets that just have the screen_name as a string without the @, so be sure to filter on your side.
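The client-side filter can be a small regular expression check. The sketch below is one possible approach, assuming mentions are written as `@screen_name` and that screen names consist of letters, digits, and underscores; `mentions` is a hypothetical helper name. It keeps statuses that really address the bot and drops ones where the name merely appears as a bare word or as part of a longer name.

```python
import re


def mentions(text, screen_name):
    """True if `text` contains @screen_name as a whole mention."""
    pattern = (
        r"(?<![A-Za-z0-9_])@"        # '@' not glued to a preceding word character
        + re.escape(screen_name)
        + r"(?![A-Za-z0-9_])"        # name must not continue (e.g. bot1 vs bot12)
    )
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


if __name__ == "__main__":
    statuses = ["@bot1 weather", "I like bot1", "@bot12 hello"]
    print([s for s in statuses if mentions(s, "bot1")])
```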