I am trying to implement a Facebook scraper to get insights about the reactions on feed posts of Facebook pages. I've noticed that the results (posts) for the current day and the last few days are correct, but the further back in time it goes, the more feed posts get skipped, and the count of returned results becomes very low.
Why is Graph skipping so many posts? Sometimes it skips entire months!
Here is the code I'm using:
import json
import pandas as pd
from urllib.request import urlopen

page_id = "nytimes"
token = "my_User_Token_Here"  # a user token from https://developers.facebook.com/tools/explorer/
url = ("https://graph.facebook.com/v2.12/" + page_id + "/posts/"
       "?fields=id,created_time,message,"
       "shares.summary(true).limit(0),"
       "comments.summary(true).limit(0),"
       "likes.summary(true),"
       "reactions.type(LOVE).limit(0).summary(total_count).as(Love),"
       "reactions.type(WOW).limit(0).summary(total_count).as(Wow),"
       "reactions.type(HAHA).limit(0).summary(total_count).as(Haha),"
       "reactions.type(SAD).limit(0).summary(total_count).as(Sad),"
       "reactions.type(ANGRY).limit(0).summary(total_count).as(Angry)"
       "&access_token=" + token + "&limit=100")

posts = []
found = False
created = '2018-03-01'  # cutoff: stop paging once posts are older than this date

try:
    while True:
        print(url)
        facebook_connection = urlopen(url)
        data = facebook_connection.read().decode('utf8')
        json_object = json.loads(data)
        allposts = json_object["data"]
        for i in range(len(allposts)):
            if pd.to_datetime(allposts[i]['created_time']) > pd.to_datetime(created):
                posts.append(allposts[i])
            else:
                print(i, "reached the cutoff date")
                posts.append(allposts[i])
                found = True
                break
        if found:
            break
        # follow the pagination cursor to the next page of posts
        url = json_object["paging"]["next"]
    df = pd.DataFrame(posts)
except Exception as ex:
    print(ex)
This is a reported bug. Since it was reported, the rules have changed: with API v2.12, only the top 600 posts per year can be reached. This is obviously bad news for developers and researchers.
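If you want to verify the cap against your own results, a quick tally over the posts list collected above (a minimal sketch, not part of the original report) shows how many posts come back per calendar year:

from collections import Counter

# Count retrieved posts per calendar year; with API v2.12 the totals
# tend to top out around 600 per year, however active the page is.
years = Counter(p['created_time'][:4] for p in posts)
for year, count in sorted(years.items()):
    print(year, count)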
I tried the facebook/blenderbot-3B model using the Hosted Inference API and it works pretty well (https://huggingface.co/facebook/blenderbot-3B). Then I tried to use it locally with the Python script shown below. The responses it generates are much worse than those from the Inference API and most of the time do not make sense.
Is the Inference API using different code, or did I make a mistake?
from transformers import BlenderbotTokenizer, BlenderbotForConditionalGeneration
import torch

device = "cuda:0" if torch.cuda.is_available() else "cpu"

chat_bots = {
    'BlenderBot': [
        BlenderbotTokenizer.from_pretrained("hyunwoongko/blenderbot-9B"),
        BlenderbotForConditionalGeneration.from_pretrained("hyunwoongko/blenderbot-9B").to(device),
    ],
}
key = 'BlenderBot'
tokenizer, model = chat_bots[key]

for step in range(100):
    new_user_input_ids = tokenizer.encode(input(">> User:") + tokenizer.eos_token, return_tensors='pt').to(device)
    # append the new user input to the running chat history
    if step > 0:
        bot_input_ids = torch.cat([chat_history_ids, new_user_input_ids], dim=-1)
    else:
        bot_input_ids = new_user_input_ids
    chat_history_ids = model.generate(bot_input_ids, max_length=1000, pad_token_id=tokenizer.eos_token_id)
    print("Bot: ", tokenizer.batch_decode(chat_history_ids, skip_special_tokens=True)[0])
So I had a much bigger problem, but I sorted that out. Now I get this error:
pygame 2.0.1 (SDL 2.0.14, Python 3.7.9)
Hello from the pygame community. https://www.pygame.org/contribute.html
Traceback (most recent call last):
File "c:/Users/danku/jarvis.py", line 16, in
engine = pyttsx3.init('sapi5')
AttributeError: module 'pyttsx3' has no attribute 'init'
I tried pretty much everything (installed pygame, pypiwin32, pywintypes) but I can't figure it out. Here is my beloved code (don't laugh, it's Jarvis code):
# base (alap)
import pyttsx3
import datetime
import speech_recognition as sr
import wikipedia
import webbrowser
import os
import pywhatkit
import pyjokes
import subprocess
import pywintypes
import win32com.client
import pygame

engine = pyttsx3.init('sapi5')

def speak(audio):
    engine.say(audio)
    engine.runAndWait()

def time():
    Time = datetime.datetime.now().strftime("%H:%M:%S")
    speak(Time)

def date():
    year = int(datetime.datetime.now().year)
    month = int(datetime.datetime.now().month)
    date = int(datetime.datetime.now().day)
    speak(date)
    speak(month)
    speak(year)

def wishme():
    speak("Welcome back sir! All systems are ready for work!")
    speak("The current time is")
    time()
    speak("The current date is")
    date()
    hour = datetime.datetime.now().hour
    if hour >= 6 and hour < 12:
        speak("Good morning sir!")
    elif hour >= 12 and hour < 18:
        speak("Good afternoon sir!")
    elif hour >= 18 and hour < 24:
        speak("Good evening sir!")
    else:
        speak("Good night sir!")
    speak("Jarvis at your service. Please tell me how I can help you?")

def takeCommand():
    r = sr.Recognizer()
    with sr.Microphone() as source:
        print("listening...")
        r.pause_threshold = 1
        audio = r.listen(source)
    try:
        print("Recognizing...")
        query = r.recognize_google(audio, language='en-US')
        print(query)
    except Exception as e:
        print(e)
        speak("Say that again")
        return "none"
    return query

if __name__ == "__main__":
    wishme()
    while True:
        query = takeCommand().lower()
        if 'wikipedia' in query:  # if 'wikipedia' is found in the query, this block runs
            speak('Searching Wikipedia...')
            query = query.replace("wikipedia", "")
            results = wikipedia.summary(query, sentences=2)
            speak("According to Wikipedia")
            print(results)
            speak(results)
Also, I'm using Python 2.7.1 and the latest version of pip.
This normally happens because you have named your Python file the same as a module you are importing, which causes a circular import. Try changing the name of your file; it should resolve the issue.
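For example, if the script (or another file in its folder) is named pyttsx3.py, Python imports that file instead of the installed package, and pyttsx3.init no longer exists. One quick way to check which module is actually being picked up (a small diagnostic sketch, not from the original answer):

import pyttsx3

# If this prints a path inside your project folder rather than
# site-packages, a local file is shadowing the real library.
print(pyttsx3.__file__)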
I cannot get any NSE symbol data from AlphaVantage; it always returns an empty array.
Query:
https://www.alphavantage.co/query?function=TIME_SERIES_INTRADAY&symbol=NSE:TITAN&apikey=Q134IXR7RVWU5AQL&outputsize=full&interval=1min
Response:
{}
A month back the same query was returning data, so it looks like something has changed on the AlphaVantage server end recently.
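Hitting the endpoint directly from Python shows the same empty response; a minimal sketch of that check, using the query above:

import requests

url = ("https://www.alphavantage.co/query"
       "?function=TIME_SERIES_INTRADAY&symbol=NSE:TITAN"
       "&apikey=Q134IXR7RVWU5AQL&outputsize=full&interval=1min")
# For NSE symbols this currently prints {} instead of time series data.
print(requests.get(url).json())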
Your help is much appreciated in advance!
Try this
import pandas as pd
from alpha_vantage.timeseries import TimeSeries

stock_ticker = 'SPY'
api_key = 'Q134IXR7RVWU5AQL'

ts = TimeSeries(key=api_key, output_format="pandas")
data_daily, meta_data = ts.get_intraday(symbol=stock_ticker, interval='1min', outputsize='full')
print(data_daily)
For me this prints a pandas DataFrame of 1-minute bars (columns '1. open', '2. high', '3. low', '4. close', '5. volume').
I am using Gatling 3. I have a CSV feeder with just one column, titled accountIds. I need to pass this in the body of my POST request as JSON. I have tried a lot of different syntax but nothing seems to work. I also cannot print what is actually being sent in the body. It works if I remove "${accountIds}" and use an actual value instead. Below is my code:
val searchFeeder = csv("C://data/accountids.csv").random

val scn1 = scenario("Scenario 1")
  .feed(searchFeeder)
  .exec(http("Search")
    .post("/v3/accounts/")
    .body(StringBody("""{"accountIds": "${accountIds}"}""")).asJson)

setUp(scn1.inject(atOnceUsers(10)).protocols(httpConf))
Have you enabled trace level in logback.xml to see the details of the POST request?
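In Gatling 3 that means uncommenting the HTTP logger in conf/logback.xml, along these lines (TRACE logs all requests and responses, DEBUG only failing ones; the exact logger name can vary by version):

<!-- logs all HTTP requests and responses when set to TRACE -->
<logger name="io.gatling.http.engine.response" level="TRACE" />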
Also, can you confirm that the location you mentioned, "C://data/accountids.csv", is the right one? Generally the data folder resides in the project directory, and within the project you can access the data file as:
val searchFeeder = csv("data/stack.csv").random
I just created a sample script and enabled logging, and I can see that accountIds is being passed:
package basicpackage

import io.gatling.core.Predef._
import io.gatling.http.Predef._
import io.gatling.core.scenario.Simulation

class StackFeeder extends Simulation {

  val httpConf = http.baseUrl("http://example.com")

  val searchFeeder = csv("data/stack.csv").random

  val scn1 = scenario("Scenario 1")
    .feed(searchFeeder)
    .exec(http("Search")
      .post("/v3/accounts/")
      .body(StringBody("""{"accountIds": "${accountIds}"}""")).asJson)

  setUp(scn1.inject(atOnceUsers(1)).protocols(httpConf))
}
(screenshot: csv file location inside the project's data folder)
I've been trying to scrape across pages. First I extract an article URL from the listing page using an XPath (@href) in the parse function, and then I try to scrape the article at that URL in the request callback issued from parse. But it doesn't work.
How can I solve this issue? Here is my code:
import scrapy
from article.items import ArticleItem

class ArticleSpider(scrapy.Spider):
    name = "article"
    allowed_domains = ["http://joongang.joins.com"]
    j_classifications = ['politics', 'money', 'society', 'culture']
    start_urls = ["http://news.joins.com/politics",
                  "http://news.joins.com/society",
                  "http://news.joins.com/money"]

    def parse(self, response):
        sel = scrapy.Selector(response)
        urls = sel.xpath('//div[@class="bd"]/ul/li/strong[@class="headline mg"]')
        for url in urls:
            item = ArticleItem()
            item['url'] = "http://news.joins.com" + "".join(url.xpath('a/@href').extract())
            # request each article page and carry the item along in meta
            yield scrapy.Request(item['url'], callback=self.parse2, meta={'item': item})

    def parse2(self, response):
        item = response.meta['item']
        sel = scrapy.Selector(response)
        articles = sel.xpath('//div[@id="article_body"]')
        for article in articles:
            item['article'] = article.xpath('text()').extract()
            yield item
The problem here is that you restrict the domains: allowed_domains = ["http://joongang.joins.com"]
If I change this to allowed_domains = ["joins.com"], I get results in parse2 and the article text is extracted -- as unicode, but this is OK since the site is not written in Latin characters.
And by the way: you can use response.xpath() instead of creating a Selector over the response object. It needs less code and makes it easier to write.
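For example, parse2 could be written in that style (a sketch of the suggestion above):

def parse2(self, response):
    item = response.meta['item']
    # response.xpath() works directly on the response; no explicit Selector needed
    item['article'] = response.xpath('//div[@id="article_body"]/text()').extract()
    yield item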