APScheduler update database only once - postgresql

I'm trying to get APScheduler to update the PostgreSQL database in my Flask app every 5 minutes, but the database is only updated the first time; on all subsequent runs the changes are not saved. APScheduler itself works correctly: if the database-update function is replaced with a function that just prints text, it runs correctly every time.
In my app I'm using Flask-SQLAlchemy:
SQLALCHEMY_DATABASE_URI = 'postgresql+psycopg2://postgres:name@localhost/name'
The APScheduler code looks like this:
from apscheduler.schedulers.blocking import BlockingScheduler
sched = BlockingScheduler(daemon=True)
sched.add_job(func=update, trigger='interval', minutes=5)
sched.start()
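A side note on the scheduler choice: BlockingScheduler blocks the calling thread, so inside a Flask app a BackgroundScheduler is the more usual fit. A minimal sketch, assuming update is importable at this point:

from apscheduler.schedulers.background import BackgroundScheduler

sched = BackgroundScheduler(daemon=True)
sched.add_job(func=update, trigger='interval', minutes=5)
sched.start()  # returns immediately; jobs run in a background thread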
The database update function looks like this:
def update():
    for i in data:
        for row in Names.query:
            if row.id == i['id']:
                row.name = i['name']
                row.gender = i['gender']
                row.age = i['age']
    db.session.commit()
In the logs, APScheduler always reports the job as successful. I also looked at the PostgreSQL logs, where I found this phrase: 'An existing connection was forcibly closed by the remote host.'
I suspect the problem lies with the database engine and sessions, but I haven't found the instructions I need to implement this within the Flask-SQLAlchemy package.
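For the 'connection was forcibly closed' message specifically, a commonly suggested mitigation (an assumption here, not a confirmed fix for this exact setup) is to have SQLAlchemy validate pooled connections before use and recycle them periodically; Flask-SQLAlchemy 2.4 passes these options through to create_engine():

app.config['SQLALCHEMY_ENGINE_OPTIONS'] = {
    'pool_pre_ping': True,  # test each pooled connection with a ping before using it
    'pool_recycle': 280,    # recycle connections before the server can time them out (value is a guess)
}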
Versions of packages:
Flask-SQLAlchemy==2.4.1
SQLAlchemy==1.3.17
APScheduler==3.6.3
db Model:
class Names(db.Model):
    __searchable__ = ['name', 'age']

    id = db.Column(db.Integer, primary_key=True)
    name = db.Column(db.String(40))
    gender = db.Column(db.String(40))
    age = db.Column(db.Numeric)

    def __repr__(self):
        return '<Names %r>' % self.id

I think I figured out what the problem is: the data variable is assigned only once, when the module is imported, so the scheduled function keeps reusing that first value instead of fetching a fresh one.
Before the function, I have the following code:
request = requests.get('https://privateapi')
data = request.json()
Then the function takes data from data:
def update():
    for i in data:
        for row in Names.query:
            if row.id == i['id']:
                row.name = i['name']
                row.gender = i['gender']
                row.age = i['age']
    db.session.commit()
According to the Flask-SQLAlchemy logs, the data is written to the database successfully. I tried adding print(data) to the function so that every 5 minutes it would show me the contents of the data variable, and I saw that its contents were never updated.
It turns out the data is written to the database, but with the same values each time, so I don't see any update.
Then I tried shortening the request path and not saving its content to a variable:
def update():
    for i in requests.get('https://privateapi').json():
        for row in Names.query:
            if row.id == i['id']:
                row.name = i['name']
                row.gender = i['gender']
                row.age = i['age']
    db.session.commit()
But here again nothing has changed.
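One thing worth ruling out at this point, offered as an assumption rather than an observed cause: a job that reuses one long-lived Flask-SQLAlchemy session across runs can keep serving rows from the session's identity map. Expiring the session at the start of each run forces fresh state on the next query:

def update():
    db.session.expire_all()  # discard row state cached from previous runs
    # ... rest of the job as above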
UPDATE:
I solved this problem by deleting the fetched-data variable at the end of the function:
def update():
    name = requests.get('https://privateapi').json()
    for i in name:
        for row in Names.query:
            if row.id == i['id']:
                row.name = i['name']
                row.gender = i['gender']
                row.age = i['age']
    del name
    db.session.commit()
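For anyone adapting this, the two details that seem to matter are fetching the payload inside the job on every run and, in many Flask-SQLAlchemy setups, giving the job an application context. A minimal sketch of that shape (app, db, and Names as in the snippets above; an outline, not a drop-in fix):

import requests
from apscheduler.schedulers.background import BackgroundScheduler

def update():
    # fetch fresh data on every run, inside the job itself
    payload = requests.get('https://privateapi').json()
    with app.app_context():  # scheduler threads don't have Flask's app context by default
        for i in payload:
            row = Names.query.get(i['id'])  # primary-key lookup instead of scanning every row
            if row is not None:
                row.name = i['name']
                row.gender = i['gender']
                row.age = i['age']
        db.session.commit()

sched = BackgroundScheduler(daemon=True)
sched.add_job(func=update, trigger='interval', minutes=5)
sched.start()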

Related

Checking to see if record exists in MongoDB before Scrapy inserts

As the title implies, I'm running a Scrapy spider and storing results in MongoDB. Everything is running smoothly, except when I re-run the spider, it adds everything again, and I don't want the duplicates. My pipelines.py file looks like this:
import logging

import pymongo
from pymongo import MongoClient
from scrapy.conf import settings
from scrapy import log


class MongoPipeline(object):
    collection_name = 'openings'

    def __init__(self, mongo_uri, mongo_db):
        self.mongo_uri = mongo_uri
        self.mongo_db = mongo_db

    @classmethod
    def from_crawler(cls, crawler):
        ## pull in information from settings.py
        return cls(
            mongo_uri=crawler.settings.get('MONGO_URI'),
            mongo_db=crawler.settings.get('MONGO_DATABASE')
        )

    def open_spider(self, spider):
        ## initializing spider
        ## opening db connection
        self.client = pymongo.MongoClient(self.mongo_uri)
        self.db = self.client[self.mongo_db]

    def close_spider(self, spider):
        ## clean up when spider is closed
        self.client.close()

    def process_item(self, item, spider):
        ## how to handle each post
        if self.db.openings.find({' quote_text': item['quote_text']}) == True:
            pass
        else:
            self.db[self.collection_name].insert(dict(item))
            logging.debug("Post added to MongoDB")
        return item
My spider looks like this:
import scrapy

from ..items import QuotesItem


class QuoteSpider(scrapy.Spider):
    name = 'quote'
    allowed_domains = ['quotes.toscrape.com']
    start_urls = ['http://quotes.toscrape.com/']

    def parse(self, response):
        items = QuotesItem()
        quotes = response.xpath('//*[@class="quote"]')
        for quote in quotes:
            author = quote.xpath('.//*[@class="author"]//text()').extract_first()
            quote_text = quote.xpath('.//span[@class="text"]//text()').extract_first()
            items['author'] = author
            items['quote_text'] = quote_text
            yield items
The current syntax is obviously wrong, but is there a small fix to the for loop that would correct it? Should I be running this loop in the spider instead? I was also looking at upsert but was having trouble understanding how to use it effectively. Any help would be great.
Looks like you have a leading space here: self.db.openings.find({' quote_text': item['quote_text']}). I suppose it should just be 'quote_text'?
The comparison == True is the reason it adds everything again: find() returns a cursor object, which never compares equal to True, so the else branch always runs. Check whether a matching document actually exists instead of comparing the cursor to a boolean.
I would suggest using find_one instead of find; it will be more efficient.
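A minimal sketch of process_item along those lines (it assumes the leading-space typo in the key is also fixed):

def process_item(self, item, spider):
    ## insert only when no document with this quote_text exists yet
    existing = self.db[self.collection_name].find_one({'quote_text': item['quote_text']})
    if existing is None:
        self.db[self.collection_name].insert_one(dict(item))
        logging.debug("Post added to MongoDB")
    return item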
Using upsert instead is indeed a good idea, but the logic will be slightly different: you will update the data if the item already exists, and insert it when it doesn't exist (instead of doing nothing when the item already exists). The syntax should look something like this: self.db[self.collection_name].update({'quote_text': item['quote_text']}, dict(item), upsert=True)
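On current pymongo versions update() is deprecated; the equivalent upsert with update_one would look like this (same assumptions as the sketch above):

self.db[self.collection_name].update_one(
    {'quote_text': item['quote_text']},  # match on the quote text
    {'$set': dict(item)},                # refresh the stored fields
    upsert=True                          # insert when no match exists
)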
steps:
if the collection is empty: write the item to the collection
if not empty and the item already exists: pass
else (collection not empty + item doesn't exist): write the item to the collection
code:
def process_item(self, item, spider):
    ## how to handle each post
    # empty collection: just insert
    if len(list(self.db[self.collection_name].find({}))) == 0:
        self.db[self.collection_name].insert_one(dict(item))
    # not empty and item already stored: skip
    elif item in list(self.db[self.collection_name].find(item, {"_id": 0})):
        print("item exist")
    # not empty and item is new: insert
    else:
        print("new item")
        # print("here is item", item)
        self.db[self.collection_name].insert_one(dict(item))
        logging.debug("Post added to MongoDB")
    return item

need help in understanding if the way I am testing a function is correct

I have written this function, which is called when a user clicks a link. The function creates a copy of the user data with one field altered (thus keeping the original value unchanged, i.e. not mutated) and then updates the database with the new value:
def confirmSignupforUser(user: User): Future[Option[User]] = {
  println("confirming user: " + user)
  // new data which should be added in the database
  val newInternalProfile = user.profile.internalProfileDetails.get.copy(confirmed = true)
  println("old internal profile: " + user.profile.internalProfileDetails.get)
  println("new internal profile: " + newInternalProfile)
  val newProfile = UserProfile(Some(newInternalProfile), user.profile.externalProfileDetails)
  println("old profile: " + user.profile)
  println("new profile: " + newProfile)
  val confirmedUser = user.copy(profile = newProfile)
  for (userOption <- userRepo.update(confirmedUser)) yield { // database operation
    println("returning modified user:" + userOption)
    userOption
  }
}
To test the code, I have written the following spec
"confirmSignupforUser" should {
"change confirmed status to True" in {
val testEnv = new TestEnv(components.configuration)
val externalProfile = testEnv.externalUserProfile
val internalUnconfirmedProfile = InternalUserProfile(testEnv.loginInfo,1,false,None)
val internalConfirmedProfile = internalUnconfirmedProfile.copy(confirmed=true)
val unconfirmedProfile = UserProfile(Some(internalUnconfirmedProfile),externalProfile)
val confirmedProfile = UserProfile(Some(internalConfirmedProfile),externalProfile)
val origUser = User(testEnv.mockHelperMethods.getUniqueID(),unconfirmedProfile)
val confirmedUser = origUser.copy(profile = confirmedProfile)
//the argument passed to update is part of test. The function confirmSignupforUser should pass a confirmed profile
when(testEnv.mockUserRepository.update(confirmedUser)).thenReturn(Future{Some(confirmedUser)})
//// await is from play.api.test.FutureAwaits
val updatedUserOption:Option[User] = await[Option[User]](testEnv.controller.confirmSignupforUser(origUser))
println(s"received updated user option ${updatedUserOption}")
updatedUserOption mustBe Some(confirmedUser)
}
}
I am not confident that I am testing the method correctly. The only way I can check that the confirmed field got changed is by looking at the return value of confirmSignupforUser. But I am actually mocking that value, and I have already set the confirmed field to true in the mocked value (when(testEnv.mockUserRepository.update(confirmedUser)).thenReturn(Future{Some(confirmedUser)})).
I know the code works because in the above mock the update method expects confirmedUser, in other words a user whose confirmed field is set to true. So if my code weren't working, update would have been called with a user whose confirmed field was false, and Mockito would have failed the test.
Is this the right way to test the method or is there a better way?
You don't need to initialize internalConfirmedProfile in your test. The whole point is to start with confirmed=false, run the confirmSignupforUser method, and make sure that the output has confirmed=true.
You should check 2 things:
check that the return value has confirmed=true (which you do)
check that the repository has that user saved with confirmed=true (which you don't check). To check that you would need to load the user back from the repository at the end.

Querying average with a function of column in Rails

I'm using Rails 4 in a web app, with a PostgreSQL database and the squeel gem for queries.
I have this function in my model statistic.rb
def properties_mean_ppm(mode, rooms, type, output_currency_id)
  sql_result = properties(mode, rooms, type).select{
    avg(price_dolar / property_area).as(prom)
  }
  avg = sql_result[0].prom
  final_avg = change_currency(avg, DOLAR_ID, output_currency_id)
  return final_avg.to_f
end
price_dolar and property_area are columns in the properties table.
It works fine in the Rails console and displays the result, but when I call it from the controller it gives an error:
ActiveModel::MissingAttributeError (missing attribute: id)
And indicates the line
avg = sql_result.to_a[0].prom
I also tried using sql_result[0].prom or sql_result.take or sql_result.first, they all have the same error.
The sql_result is this:
#<ActiveRecord::Relation [#<Property >]>
This is the action called in the controller
def properties_mean_ppm
  @statistic = Statistic.find(params[:id])
  mode = params[:mode] ? params[:mode] : ANY_MODE
  type = params[:type] ? params[:type] : ANY_TYPE
  one_room = @statistic.properties_mean_ppm(mode, 1, type, UF)
end
I know how to get the result using plain SQL without ActiveRecord, but that would be very inefficient for me because of the many filters applied earlier in the properties() function.
Seems like calling that on the properties relation made the controller expect a Property with an id as a result. So I made it work without squeel, giving the result a fixed id:
sql_result = properties(mode, rooms, type).select(
  "avg(price_dolar / property_area) as prom, 1 as id"
)

SQLAlchemy "after_insert" doesn't update target object fields

I have a model (see code below) on which I want to execute a function after an object is inserted that will update one of the object's fields. I'm using the after_insert Mapper Event to do this.
I've confirmed that after_insert properly calls the event_extract_audio_text() handler, and that the target is updated with the correct audio_text value. However, once the event handler finishes executing, the audio_text value is not persisted for the object in the database.
Code
# Event handler
def event_extract_audio_text(mapper, connect, target):
    # Extract text from audio file
    audio_text = compute_text_from_audio_file(target.filename)
    # Update the 'audio_text' field with extracted text
    target.audio_text = audio_text

# Model
class SoundsRaw(db.Model):
    __tablename__ = 'soundsraw'

    id = db.Column(db.BigInteger(), primary_key=True, autoincrement=True)
    filename = db.Column(db.String(255))
    audio_text = db.Column(db.Text())

# Event listener
event.listen(SoundsRaw, 'after_insert', event_extract_audio_text)
I've also tried calling db.session.commit() to try to update the object with the text value, but then I get the following stack trace:
File "/Users/alexmarse/.virtualenvs/techmuseum/lib/python2.7/site-packages/sqlalchemy/orm/session.py", line 219, in _assert_active
raise sa_exc.ResourceClosedError(closed_msg)
ResourceClosedError: This transaction is closed
Any ideas?
Software versions
SQLAlchemy 0.9.4
Flask 0.10.1
Flask-SQLAlchemy 1.0
The trick with after_insert-style handlers is to use the connection passed to the handler directly. Here's how I did it:
class Link(db.Model):
    "News link data."
    __tablename__ = 'news_links'

    id = db.Column(db.BigInteger, primary_key=True)
    slug = db.Column(db.String, unique=True)  # , nullable=False
    url = db.Column(db.String, nullable=False, unique=True)
    title = db.Column(db.String)
    image_url = db.Column(db.String)
    description = db.Column(db.String)

@db.event.listens_for(Link, "after_insert")
def after_insert(mapper, connection, target):
    link_table = Link.__table__
    if target.slug is None:
        connection.execute(
            link_table.update().
            where(link_table.c.id == target.id).
            values(slug=slugify(target.id))
        )
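Applied to the SoundsRaw model from the question, the same pattern would look roughly like this (a sketch; it assumes compute_text_from_audio_file is importable where the listener lives):

@db.event.listens_for(SoundsRaw, 'after_insert')
def event_extract_audio_text(mapper, connection, target):
    # write through the flush's own connection instead of touching the session
    soundsraw_table = SoundsRaw.__table__
    audio_text = compute_text_from_audio_file(target.filename)
    connection.execute(
        soundsraw_table.update().
        where(soundsraw_table.c.id == target.id).
        values(audio_text=audio_text)
    )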
I ended up solving this by ditching the Mapper Event approach and using Flask's Signalling Support instead.
Basically, you can register "signals" on your model, which are essentially callback functions that are called whenever a specific kind of event happens. In my case, the event is an "update" on my model.
To configure the signals, I added this method to my app.py file:
def on_models_committed(sender, changes):
    """Handler for model change signals"""
    for model, change in changes:
        if change == 'insert' and hasattr(model, '__commit_insert__'):
            model.__commit_insert__()
        if change == 'update' and hasattr(model, '__commit_update__'):
            model.__commit_update__()
        if change == 'delete' and hasattr(model, '__commit_delete__'):
            model.__commit_delete__()
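For completeness: this handler still has to be connected to Flask-SQLAlchemy's models_committed signal somewhere at startup; a sketch of the wiring, assuming the app object is in scope:

from flask_sqlalchemy import models_committed

models_committed.connect(on_models_committed, sender=app)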
Then, on my model, I added this function to handle the update event:
# Event methods
def __commit_update__(self):
    # create a new db session, which avoids the ResourceClosedError
    session = create_db_session()
    from techmuseum.modules.sensors.models import SoundsRaw
    # Get the SoundsRaw record by uuid (self contains the object being updated,
    # but we can't just update/commit self -- we'd get a ResourceClosedError)
    sound = session.query(SoundsRaw).filter_by(uuid=self.uuid).first()
    # Extract text from audio file
    audio_text = compute_text_from_audio_file(sound)
    # Update the 'text' field of the sound
    sound.text = audio_text
    # Commit the update to the sound
    session.add(sound)
    session.commit()

def create_db_session():
    # create a new engine and a configured "Session" class,
    # then return a session instance bound to it
    from sqlalchemy import create_engine
    from sqlalchemy.orm import sessionmaker
    psql_url = app.config['SQLALCHEMY_DATABASE_URI']
    some_engine = create_engine(psql_url)
    Session = sessionmaker(bind=some_engine)
    return Session()

How to obtain the image path/URL from a file uploaded to an OpenShift data dir, to be stored in a PostgreSQL database?

I am a self-taught programming newbie here, so please bear with me. I am trying to create a site which allows users to upload their images. With the patience of another user, I was able to get some answers on how to let users upload their images to the data drive on OpenShift. However, now I need to store the image path or URL in a PostgreSQL database (so it can be called on later), so that each user can keep track of the images they have uploaded. I am currently stymied by this.
Here are the fragments of code which I feel plays a big role in answering this question:
class Todo(db.Model):
    __tablename__ = 'todos'

    id = db.Column('todo_id', db.Integer, primary_key=True)
    title = db.Column(db.String(60))
    text = db.Column(db.String)
    done = db.Column(db.Boolean)
    pub_date = db.Column(db.DateTime)
    user_id = db.Column(db.Integer, db.ForeignKey('users.user_id'))
    image_url = db.Column(db.String)

    def __init__(self, title, text, image_url):
        self.title = title
        self.text = text
        self.image_url = image_url
        self.done = False
        self.pub_date = datetime.utcnow()
def allowed_file(filename):
    return '.' in filename and \
        filename.rsplit('.', 1)[1] in app.config['ALLOWED_EXTENSIONS']

@app.route('/upload', methods=['POST'])
def upload():
    # Get the name of the uploaded file
    file = request.files['file']
    # Check if the file is one of the allowed types/extensions
    if file and allowed_file(file.filename):
        # Make the filename safe, remove unsupported chars
        filename = secure_filename(file.filename)
        # Move the file from the temporary folder to
        # the upload folder we set up
        file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
        # Redirect the user to the uploaded_file route, which
        # will basically show the uploaded file in the browser
        return redirect(url_for('uploaded_file', filename=filename))

@app.route('/uploads/<filename>')
def uploaded_file(filename):
    return send_from_directory(app.config['UPLOAD_FOLDER'], filename)
@app.route('/new', methods=['GET', 'POST'])
@login_required
def new():
    if request.method == 'POST':
        if not request.form['title']:
            flash('Title is required', 'error')
        elif not request.form['text']:
            flash('Text is required', 'error')
        else:
            todo = Todo(request.form['title'], request.form['text'])
            todo.user = g.user
            db.session.add(todo)
            db.session.commit()
            flash('Todo item was successfully created')
            return redirect(url_for('index'))
    return render_template('new.html')
The current code I have is pieced together from various tutorials and examples.
Currently, I am trying to merge the Todo model, the upload function, and the new function, and am having very little success. Using the little knowledge I have, I have merely added the image_url portions, which form a column intended to house the image path. I would greatly appreciate it if somebody could shed some light on this conundrum. Thanks a million.
Respectfully,
Max
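One possible way to merge the pieces above (a sketch only, reusing the imports and helpers from the question's snippets; it assumes the form posts title, text, and the file together to /new, and that uploaded_file serves the saved files):

@app.route('/new', methods=['GET', 'POST'])
@login_required
def new():
    if request.method == 'POST':
        file = request.files.get('file')
        if not request.form['title']:
            flash('Title is required', 'error')
        elif not request.form['text']:
            flash('Text is required', 'error')
        elif not (file and allowed_file(file.filename)):
            flash('A valid image is required', 'error')
        else:
            # save the image exactly as the standalone upload() view does
            filename = secure_filename(file.filename)
            file.save(os.path.join(app.config['UPLOAD_FOLDER'], filename))
            # store the URL that will serve this file, so it can be shown later
            image_url = url_for('uploaded_file', filename=filename)
            todo = Todo(request.form['title'], request.form['text'], image_url)
            todo.user = g.user
            db.session.add(todo)
            db.session.commit()
            flash('Todo item was successfully created')
            return redirect(url_for('index'))
    return render_template('new.html')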