Crawling Facebook and Wikipedia to get time-resolved data - facebook

I am studying the growth of the Facebook and Wikipedia networks. For this, I need time-resolved data from these websites. For Facebook, I will pick a subset of users (say 100) and then track how their number of friends increases over time and who gets connected to whom. I would collect this data every 10 days or so for some time. The same goes for Wikipedia pages. Does anybody have any idea how this can be done? I have no experience with web crawlers. Thanks in advance.
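Whatever crawler ends up collecting the data, the analysis described above comes down to comparing successive snapshots of the same network. A minimal sketch, assuming each snapshot is stored as a list of (user, friend) pairs; the function names and the snapshot format are illustrative assumptions, not part of any Facebook or Wikipedia API:

```python
from collections import Counter

def normalize(edges):
    """Treat friendships as undirected: (a, b) and (b, a) are the same edge."""
    return {frozenset(edge) for edge in edges}

def new_edges(previous, current):
    """Edges present in the current snapshot but not in the previous one."""
    return normalize(current) - normalize(previous)

def degree_counts(edges):
    """Number of connections per user in one snapshot."""
    counts = Counter()
    for edge in normalize(edges):
        for user in edge:
            counts[user] += 1
    return counts

# Two hypothetical snapshots taken 10 days apart.
day_0 = [("alice", "bob")]
day_10 = [("bob", "alice"), ("alice", "carol")]
added = new_edges(day_0, day_10)   # only the alice-carol edge is new
growth = degree_counts(day_10)["alice"] - degree_counts(day_0)["alice"]
```

Keeping every raw snapshot and computing the differences afterwards makes it possible to change the analysis later without crawling again.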

Related

Google Adsense revenue slowing

I don't know where to ask this, so I'm just going to ask it here and handle the backlash whenever it comes.
First, let me state that I'm familiar with the recent issues relating to Pepsi and YouTube. What I'm unsure of is whether this affects website publishers as well, or whether my traffic has simply stopped converting.
I'm currently getting around 300,000 impressions per day and my revenue has dropped to as low as 0.19 CPC, when before it was sitting at almost a dollar per registered click.
The question basically comes down to this: are you guys being affected by this as well? What can I do in the meantime if this is connected to the advertisers opting out? It's getting really hard to manage servers with no revenue.
Yes, it seems to have had a drastic impact on my earnings as well, although I'm not doing anywhere close to the impressions you're doing.
My CPC and RPM have hit an all-time low and it's disastrous. I'm seeing the same CPC as you, by the way. We're not alone; other publishers are complaining about the same issue.
I'm considering dropping a few AdSense units until I see some better figures.

Fetching 1 minute bars from Yahoo Finance

I'm trying to download 1 minute historical stock prices from Yahoo Finance, both for the current day and the previous ones.
Yahoo (just like Google) supports up to 15 days worth of data, using the following API query:
http://chartapi.finance.yahoo.com/instrument/1.0/AAPL/chartdata;type=quote;range=1d/csv
The thing is that data keeps on changing even when the markets are closed! Try refreshing every minute or so and some minute bars change, even from the beginning of the session.
Another interesting thing is that all of these queries return slightly different data for the same bars:
http://chartapi.finance.yahoo.com/instrument/2.0/AAPL/chartdata;type=quote;range=1d/csv
Replace the bold number with 100000 and it will still work but return slightly different data.
Does anyone understand this?
Is there a modern YQL query that can fetch historical minute data instead of this API?
Thanks!
Historical minute data is not as easily accessible as we all would like. I have found that the most affordable way to gather Intraday Stock Price data is to develop automated scripts that log price information for whenever the markets are open.
Similar to the Yahoo data URL that you shared, Bloomberg maintains 1-Day Intraday Price information in JSON format like this : https://www.bloomberg.com/markets/api/bulk-time-series/price/AAPL%3AUS?timeFrame=1_DAY
The URL convention appears easy to input on your own once you have a list of Ticker Symbols and an understanding of the consistent syntax.
To arrive at that URL initially though, without having any idea for guessing / reverse-engineering it, I simply went here https://www.bloomberg.com/quote/AAPL:US and used Developer Tools on my browser and tracked a background GET request which led me to that URL. I wouldn't be surprised if you could employ similar methods on other Price Data-related websites.
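The URL convention described above can be sketched as a small helper. This is only an illustration: the pattern is copied from the URL quoted in this answer, the endpoint is undocumented and may change or stop responding, and the function name and the default exchange suffix are assumptions.

```python
from urllib.parse import quote

# Pattern taken from the URL quoted above; undocumented, so it may change.
BASE = "https://www.bloomberg.com/markets/api/bulk-time-series/price/"

def intraday_url(ticker: str, exchange: str = "US") -> str:
    """Build the 1-day intraday URL for a symbol such as AAPL:US."""
    # safe="" forces ':' to be percent-encoded as %3A, matching the example URL.
    symbol = quote(f"{ticker}:{exchange}", safe="")
    return f"{BASE}{symbol}?timeFrame=1_DAY"

# Build URLs for a (hypothetical) list of tickers without fetching anything.
urls = [intraday_url(ticker) for ticker in ["AAPL", "MSFT"]]
```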
You can also write scripts to track price data as fast as your internet connection allows. One Python package that I find pretty handy is ystockquote.
You can have it request price data every couple of seconds and log that into a daily time series database.
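A rough sketch of such a logger is below. The `fetch_price` callable is a placeholder for whatever quote source you use (for example a thin wrapper around ystockquote); its name and signature are assumptions, as are the table layout and the use of SQLite.

```python
import sqlite3
import time

def log_prices(db, symbol, fetch_price, samples, interval_s=2.0,
               sleep=time.sleep, clock=time.time):
    """Poll fetch_price(symbol) `samples` times and store timestamped rows."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS ticks (ts REAL, symbol TEXT, price REAL)"
    )
    for i in range(samples):
        price = fetch_price(symbol)  # hypothetical quote source
        db.execute("INSERT INTO ticks VALUES (?, ?, ?)", (clock(), symbol, price))
        db.commit()  # commit each row so a crash loses at most one sample
        if i < samples - 1:
            sleep(interval_s)  # wait before the next poll

def count_ticks(db, symbol):
    """Number of rows logged so far for a symbol."""
    row = db.execute(
        "SELECT COUNT(*) FROM ticks WHERE symbol = ?", (symbol,)
    ).fetchone()
    return row[0]
```

For real use you would open one database file per day, for example `sqlite3.connect("ticks-2024-01-02.db")`, and run the loop only while the market is open.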
Yes, there are other APIs.
I don't know if this can still help, but if you need intraday data, there is an API on RapidAPI called Quotient that lets you pull intraday (1-minute) and EOD market data for FX, crypto, stocks (US, Canadian, UK, Australian, European), ETFs and futures. It also provides earnings, dividends, splits and a lot of other information.

How long does an app in development get banned from Facebook if it exceeds limits?

I have an app I'm developing against Facebook that timed out a few hours ago during my first production use. Of course I tried to get it to do too much and the HTTP call timed out. So, I rewrote what I was doing to use threaded connections, which sped up the interaction significantly! However, I was so engrossed in getting my interaction to speed up (it amounted to about 25-50 calls; I'm not exactly sure, I was expecting 25 but some of my results show it was 50), I didn't even stop to think about how fast I was hitting Facebook.
So, I started getting the error "Uncaught OAuthException: It looks like you were misusing this feature by going too fast. You've been blocked from using it.", which is what I now get even if I try to run my program with only 1 hit. I've added a sleep into my system to limit the hits to 1/second, but I'm concerned that my app (which was not making public posts, so no one could have been bothered by them) is now banned from Facebook forever. The message says I'm banned from the feature and points to the Help Center to learn about blocks, but I can't find anything in the Help Center about my specific situation.
Does anyone know how long my app is out of commission?
And what are the specific limits (reference please, because I've searched the hell out of FB and can't find one) on the speed at which you can access Facebook?
It depends on what has blocked you. In this case it was a spam bot that stopped me from posting comments into a group. Apparently there is a non-specific number of times you can post comments in a group in a short amount of time. The amount varies, but hovers around 150ish give or take 50 (at the time of my tests).
The ban appeared to be consistently set to about 19 hours at that time (May 2014). I've confirmed by continued testing in test groups and subsequent bans. However, Facebook developers are unable to give a solid set of numbers as they say it's controlled by a spam algorithm which changes based on server usage. So, 150 comments within about 3 minutes = ban for about 19 hours.
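The 1 request/second sleep mentioned in the question can be sketched as a small throttle shared by all threads. This is only an illustration: the class name is made up, and the 1-second interval comes from the question, not from any limit published by Facebook.

```python
import threading
import time

class Throttle:
    """Allow at most one call per `min_interval` seconds across all threads."""

    def __init__(self, min_interval=1.0, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None          # time of the most recent permitted call
        self._lock = threading.Lock()

    def wait(self):
        """Block until at least `min_interval` has passed since the last call."""
        with self._lock:           # serialize callers so threads cannot burst
            now = self._clock()
            if self._last is not None:
                remaining = self.min_interval - (now - self._last)
                if remaining > 0:
                    self._sleep(remaining)
                    now = self._clock()
            self._last = now
```

Each worker thread would call `throttle.wait()` immediately before its API request, so switching to threaded connections no longer raises the overall request rate.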

What is the upload limit on soundcloud

I sometimes get the error: { error_message: 'Sorry, you\'ve exceeded your upload limit.' } when I post sound files to soundcloud, using their http api.
I couldn't find any explanation for this 'upload limit' in their documentation.
Does anyone know if it's a daily limit? or a size limit? or a combination of both?
Thanks
Sparko is mostly right. The only difference is that you can tell how much remaining time you have by requesting the current user details (GET /me); the response will contain a key called upload_seconds_remaining.
Free users get 2 hours. Pro gets 4 hours. Pro Unlimited is unlimited. Regardless of the plan, individual tracks also can not be longer than ~6.5hrs (I forget the exact number)
Individual files cannot exceed 500 MB (see "Uploading Audio Files").
However, I'd imagine this relates to your overall limit for uploading audio to SoundCloud based on the plan attached to the account you're posting to i.e exceeding the 2 hours provided by the free plan.
The API doesn't appear to provide a property for the remaining time provided to the user, although you could infer this from [user]plan & looping through all of their tracks and summing each [track]duration (although probably not advised).
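The summing approach described above could look roughly like this. It assumes that track durations are reported in milliseconds and that the quota is the 2 hours quoted for free accounts in this thread; both are assumptions to check against the account in question, and the function name is made up. If GET /me does return upload_seconds_remaining, as the other answer says, that value is the better source.

```python
FREE_PLAN_SECONDS = 2 * 60 * 60  # 2 hours, the free-plan figure quoted in this thread

def estimate_remaining_seconds(track_durations_ms, quota_seconds=FREE_PLAN_SECONDS):
    """Estimate the remaining upload time from already uploaded tracks.

    track_durations_ms: iterable of track durations, assumed to be milliseconds.
    """
    used_seconds = sum(track_durations_ms) / 1000.0
    return max(0.0, quota_seconds - used_seconds)  # never report a negative value

# Hypothetical account with a 60-minute and a 30-minute track already uploaded.
remaining = estimate_remaining_seconds([3_600_000, 1_800_000])
```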

The exact limitation of request from one IP

I'm developing an application that gets the top 20 pages for every letter. So far I haven't run into any limits, but I need to know the exact number of requests allowed from one IP address per second.
Best regards,
There is no exact number per second. As with any other site, if you make too many requests you will likely get blocked as a denial-of-service attack. If you make too many over an extended period of time, Facebook will likely block you, at least temporarily.
If you are trying to crawl Facebook, then you should obey the rules defined in their robots.txt file like any other crawler/spider should.
https://www.facebook.com/robots.txt
http://www.facebook.com/apps/site_scraping_tos_terms.php
That said, I did around 15 million update requests per day back when they had profile boxes and never had a problem.
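Checking a URL against robots.txt before requesting it, as suggested above, can be sketched with Python's standard urllib.robotparser. The rules below are a made-up sample rather than Facebook's actual file, and the user agent name is a placeholder.

```python
from urllib.robotparser import RobotFileParser

def make_checker(robots_txt: str, user_agent: str = "my-crawler"):
    """Return a function that tells whether `user_agent` may fetch a URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # parse rules from text, no network access
    return lambda url: parser.can_fetch(user_agent, url)

# Made-up rules for illustration only.
SAMPLE_RULES = """\
User-agent: *
Disallow: /private/
"""

allowed = make_checker(SAMPLE_RULES)
```

In a real crawler you would download the site's robots.txt once (for example with `parser.set_url(...)` followed by `parser.read()`) and call the checker before every request.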