I need to periodically import some data into my rails app on Heroku.
The task to execute is split into the following parts:
* download a big zip file (e.g. ~100 MB) from a website
* unzip the file (the unzipped data is ~1.5 GB)
* run a rake script that reads those files and creates or updates records using my ActiveRecord models
* cleanup
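Roughly, the kind of rake task I have in mind is sketched below (just a sketch; the URL, file names and model are placeholders, and I'm assuming the archive contains CSV files):

# lib/tasks/import.rake -- rough sketch only
require 'csv'
require 'fileutils'

namespace :import do
  desc "Nightly: download, unzip, import, clean up"
  task :nightly => :environment do
    work_dir = Rails.root.join('tmp', 'import').to_s
    FileUtils.mkdir_p(work_dir)

    # download the big zip (~100 MB)
    system %(curl -s -o #{work_dir}/data.zip https://example.com/data.zip)

    # unzip it (~1.5 GB unzipped)
    system %(unzip -o #{work_dir}/data.zip -d #{work_dir})

    # read the files and create or update records, row by row
    Dir["#{work_dir}/*.csv"].each do |path|
      CSV.foreach(path, :headers => true) do |row|
        # e.g. MyModel.find_or_create_by_code(row['code']).update_attributes(row.to_hash)
      end
    end

    # cleanup
    FileUtils.rm_rf(work_dir)
  end
end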
How can I do this on Heroku? Is it better to use some external storage (e.g. S3)?
How would you approach such a thing?
Ideally this needs to run every night.
I tried the exact same thing a couple of days back, and the conclusion I came to was that this can't be done because of the memory limits Heroku imposes on each process. (I was building a data structure from the files I read from the internet and trying to push it to the DB.)
I was using a rake task that would pull and parse a couple of big files and then populate the database.
As a workaround, I now run that rake task on my local machine, push the resulting database dump to S3, and issue a heroku command from my local machine to restore the Heroku DB instance:
"heroku pgbackups:restore 'http://s3.amazonaws.com/#{yourfilepath}' --app #{APP_NAME} --confirm #{APP_NAME}"
You could push the dump to S3 using the fog library:
require 'rubygems'
require 'fog'
# create a connection to S3
connection = Fog::Storage.new(
  :provider              => 'AWS',
  :aws_secret_access_key => "#{YOUR_SECRET}",
  :aws_access_key_id     => "#{YOUR_ACCESS_KEY}"
)
directory = connection.directories.get("#{YOUR_BACKUP_DIRECTORY}")

# upload the file
file = directory.files.create(
  :key    => "#{REMOTE_FILE_NAME}",
  :body   => File.open("#{LOCAL_BACKUP_FILE_PATH}"),
  :public => true
)
The command that I use to make the backup on my local machine is:
system "PGPASSWORD=#{YOUR_DB_PASSWORD} pg_dump -Fc --no-acl --no-owner -h localhost -U #{YOUR_DB_USER_NAME} #{YOUR_DB_DATABASE_NAME} > #{LOCAL_BACKUP_FILE_PATH}"
I have put together a rake task that automates all these steps.
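In outline, that rake task looks something like this (a sketch; the env vars, bucket and file names are placeholders, and the upload is the same fog code as above):

require 'fog'

desc "Dump the local DB, push the dump to S3 and restore it on Heroku"
task :push_db_to_heroku do
  dump_path = "/tmp/backup.dump"
  bucket    = ENV['BACKUP_BUCKET']
  app       = ENV['APP_NAME']

  # 1. dump the freshly populated local database
  system %(PGPASSWORD=#{ENV['DB_PASSWORD']} pg_dump -Fc --no-acl --no-owner -h localhost -U #{ENV['DB_USER']} #{ENV['DB_NAME']} > #{dump_path})

  # 2. upload the dump to S3 (same as the fog snippet above)
  connection = Fog::Storage.new(
    :provider              => 'AWS',
    :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
    :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
  )
  connection.directories.get(bucket).files.create(
    :key    => 'backup.dump',
    :body   => File.open(dump_path),
    :public => true
  )

  # 3. restore the Heroku database from the (public) S3 URL
  system %(heroku pgbackups:restore 'http://s3.amazonaws.com/#{bucket}/backup.dump' --app #{app} --confirm #{app})
end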
Another thing you might try is using a worker (Delayed Job). I guess you can configure your workers to run every 24 hours. I think workers don't have the 30-second request timeout, but I am not sure about the memory usage.
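Something like this is what I have in mind (a very rough sketch, assuming the delayed_job gem; the job class is made up and I haven't tested the memory behaviour):

# a job that does the import and re-enqueues itself for the next night
class NightlyImportJob
  def perform
    # ... download / parse / import here ...
  ensure
    Delayed::Job.enqueue(NightlyImportJob.new, :run_at => 24.hours.from_now)
  end
end

# kick it off once, e.g. from the console:
Delayed::Job.enqueue(NightlyImportJob.new, :run_at => Date.tomorrow.midnight)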
Related
I have my Gitlab CI running with Symfony, with fixtures loaded, and I want to load them into a buffer database and then move them to the real database.
I've seen this thread: Docker - How can run the psql command in the postgres container?, but I would like to have an automatic script which:
delete my real database
rename my buffer database to the real database's name
Is it possible to automate such commands using Docker & Gitlab CI? I am using pg_dump for now, but it's slow and not easy to use; I just want to replace one DB with another DB.
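For illustration, the two operations I want to automate boil down to something like this (sketched as a tiny Ruby wrapper around psql just to show the commands; the host 'db', the user, and the database names app_real / app_buffer are placeholders):

#!/usr/bin/env ruby
# Run against a maintenance database (e.g. postgres); neither database
# may have open connections while these statements execute.
psql = %(psql -h db -U postgres -d postgres -c)

system %(#{psql} "DROP DATABASE IF EXISTS app_real;")
system %(#{psql} "ALTER DATABASE app_buffer RENAME TO app_real;")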
I’ve currently been using these two commands to copy the db from production to staging:
URL=$(heroku config:get DATABASE_URL -a myapp-production)
heroku pg:copy "$URL" DATABASE_URL -a myapp-staging --confirm myapp-staging
Due to a swelling database, I'm wondering how, for testing purposes, I can replicate only a limited number of rows of each table to another database.
A naive solution could be:
1. Download the production db -> locally
2. Run a script that wipes out most of the data
3. Upload the local db -> staging
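For illustration, those three steps could look roughly like this (a sketch; the app names, the local scratch database, the table name and the row limit are placeholders):

# 1. download the production db into a local scratch database
prod_url    = `heroku config:get DATABASE_URL -a myapp-production`.strip
staging_url = `heroku config:get DATABASE_URL -a myapp-staging`.strip
system %(pg_dump -Fc --no-acl --no-owner "#{prod_url}" > /tmp/prod.dump)
system %(createdb scratch_db && pg_restore --no-acl --no-owner -d scratch_db /tmp/prod.dump)

# 2. wipe out most of the data, keeping a sample of each big table
system %(psql scratch_db -c "DELETE FROM events WHERE id NOT IN (SELECT id FROM events ORDER BY id DESC LIMIT 1000);")

# 3. dump the trimmed db and restore it over staging
system %(pg_dump -Fc --no-acl --no-owner scratch_db > /tmp/trimmed.dump)
system %(pg_restore --clean --no-acl --no-owner -d "#{staging_url}" /tmp/trimmed.dump)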
This question answers how to download an entire table of a database.
I'm trying to do a simple postgres backup with crontab. Here is the command I use:
# m h dom mon dow user command
49 13 * * * postgres /usr/bin/pg_dump store | bzip2 > /home/backups/postgres/$(date +"\%Y-\%m-\%d")_store.sq.bz2
A backup file is created but it is very small (looks like 14 bytes).
I can run this command just fine in the terminal (with a filesize that matches my db).
The log files don't mention any errors (grep CRON /var/log/syslog). Any idea what might be off?
The key to solving this is to realize that running the 'same' command in bash and running it via cron aren't the same thing!
For example, when run via cron the usual defaults (.bash_profile, .pgpass, default binary paths) aren't the same, so what works in bash may not work under cron.
As a checklist:
Ensure bzip2 is replaced with its complete path (e.g. /usr/bin/bzip2 on CentOS / RHEL).
Ensure the 'store' database is readable by the cron command (adding -U postgres, for example, would be a good addition). If the DB login depends on a .pgpass file, it won't work under cron; in that scenario you'd need to ensure pg_hba.conf is configured for this purpose (for example, you could allow 'trust' authentication for a specific, known DB / machine / user combination).
Ensure that /home/backups/postgres/... is writable by the cron user, for obvious reasons.
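One way to apply all of the above at once is to move the whole pipeline into a small script that uses absolute paths and an explicit user, and point cron at that script instead. A rough sketch in Ruby (the paths, user and database name are examples):

#!/usr/bin/env ruby
# /home/backups/backup_store.rb -- backup wrapper that cron can call.
# Absolute paths and an explicit -U user mean the cron environment
# (no .bash_profile, different PATH) no longer matters; it still assumes
# pg_hba.conf (or a readable .pgpass) lets this user connect.
date      = Time.now.strftime("%Y-%m-%d")
dump_path = "/home/backups/postgres/#{date}_store.sql.bz2"

ok = system(%(/usr/bin/pg_dump -U postgres store | /usr/bin/bzip2 > #{dump_path}))
abort "pg_dump failed" unless ok

# crontab entry (example):
# 49 13 * * * postgres /usr/bin/ruby /home/backups/backup_store.rb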
We're having trouble with the heroku fork command, so we're manually trying to create a staging environment. I tried creating a new database from a backup of our prod DB, but the created DB has no rows and is only 6.4 MB; the actual backup is 15.7 GB.
I did this via the web console clicking "restore".
What's the right way to do this?
From the command line, you want to do:
heroku pgbackups:restore DATABASE -a example-staging `heroku pgbackups:url -a example`
We use this command every few days, whenever we want the staging database to be replaced with the production database. This comes from: https://devcenter.heroku.com/articles/migrate-heroku-postgres-with-pgbackups#transfer-to-target-database
Hi, I take backups of my Heroku database using PGBackups, but it stores them in its own predefined place. Now I want to copy those backups to a folder in my own remote location, on S3 or some other remote storage.
How can I do that periodically, e.g. every week or month, automatically?
My app is built with Ruby on Rails; please help me achieve this.
The latest backup is always available to you by entering heroku pgbackups:url. So, set up a cron job or the equivalent that fetches that URL once a week or once a day.
You could write a rake or Ruby script and call it with Heroku Scheduler (a free add-on), use a different remote machine to pull the backup, or use a shell script and run:
curl -O `heroku pgbackups:url`
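A rough sketch of such a Ruby script, reusing the fog approach from the earlier answer (the app name, bucket and key names are placeholders, and it assumes a machine that has the heroku CLI plus your AWS credentials in the environment):

require 'rubygems'
require 'fog'

# fetch the URL of the latest Heroku backup and download it locally
backup_url = `heroku pgbackups:url --app my-app`.strip
local_path = "/tmp/latest.dump"
system %(curl -s -o #{local_path} "#{backup_url}")

# copy the dump into your own S3 bucket
storage = Fog::Storage.new(
  :provider              => 'AWS',
  :aws_access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :aws_secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
)
bucket = storage.directories.get('my-backup-bucket')
bucket.files.create(
  :key    => "heroku/#{Time.now.strftime('%Y-%m-%d')}.dump",
  :body   => File.open(local_path),
  :public => false
)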
Here is a Gem that appears to do what you want: https://coderwall.com/p/w4wpvw