This might be a simple question, but I would like some clarification.
Based off the docs, Heroku has an ephemeral file system. How I interpret it is that anytime you upload a file to Heroku and there is a change in the configuration or the app is restarted, the files are gone.
However, I was wondering if this is the case if you upload data to Heroku Postgres through a dumps file.
For development, I am using a local Postgres server. From there, I would create a dumps file and then upload that file using commands found here:
https://stackoverflow.com/a/71206831/3100570
Now suppose my application makes a POST request to Heroku Postgres, would that data be persisted along with the initial data from the dumps file in the event that the application is restarted or crashed?
Ingesting data into your PostgreSQL database this way doesn't touch your dyno's filesystem. You are simply connecting to PostgreSQL and running the SQL commands contained in that file:
-f, --file=file
SQL file to run
The data will be stored in PostgreSQL in exactly the same way it would if you did a bunch of INSERTs yourself. You should have no problem ingesting data this way and then continuing to interact with your application as normal.
Related
I'm working on trying to setup my local database with some mock data to work with. We have a development AWS account with a postgres database. I would like to create a backup of it, export it to my local computer, and restore to my local postgres database.
I've been trying to find how to do this online, but everything I'm finding is on how to backup to AWS and to restore back to AWS. I tried creating a snapshot and exporting it via S3 - but the snapshot doesn't produce a sql file to restore from like I was expecting.
If anyone can point me in the right direction I would very much appreciate it :)
I am afraid that the only chance you have is pg_dump/pg_restore.
Even if Amazon lets you get your hands on its file system backups, which I doubt, they may be of little use to you, since Amazon runs modified versions of PostgreSQL and you cannot be sure that the physical file format is identical to PostgreSQL.
I have a need to load data from S3 to Postgres RDS (around 50-100 GB) I don't have the option to use AWS Data Pipeline and I am looking for something similar to using the COPY command to load data in S3 into Amazon Redshift.
I would appreciate any suggestions on how I can accomplish this.
Originally, this answer was trying to use the S3 to Postgres RDS Functionality. That whole enterprise failed (see below).
The way I have finally been able to do this is:
Set-up an EC2 instance with psql installed (see below near end of post)
Copy the relevant CSVs to import from S3 to the local instance
Use the psql /copy command to import the files up
This last part is really, really important. If you use the SQL COPY command the entire RDS Postgres role structure will frustrate you to no end. It has a wonky SUPERRDSADMIN role which is not very super at all. However, if you use the psql /copy commany you apparently can do anything. I have confirmed this be the case and have started my uploads succesfully. I will come back and re-edit this post (time permitting) to add relevant documentation steps for the above.
Caveat Emptor: The post below was all the original work I had done trying to get this implemented. I don't want to bury the lead despite multiple efforts (including what can only be described as pathetic tech support from AWS) I don't believe that this feature is ready for prime time. Despite a very simple test environment, easy to replicate, AWS has not provided an effective way to not get the copy statement to crap out as follows:
The actual call to aws_s3.table_import_from_s3(...) is reporting a permission problem between RDS and S3. From my research work with psql this appears to be a C library, probably installed by AWS.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
S3 to Postgres RDS Functionality Now Added
On 2019-04-24 AWS released functionality allowing a Postgres RDS to load directly from S3. You can read the announcement here, and see the documentation page here.
I am sharing with the OP because this appears to be the AWS supported way of solving the question posed.
Key summary points:
Requires Postgres 11.1 or greater
Need access to psql and the ability to connect it to the RDS instance
Need to install the aws_s3 extension which pulls in aws_commons.
You can get to the S3 bucket by specifying credentials or by assigning IAM roles to RDS
It advertises supporting all of the same data formats as the postgres COPY command
It currently only appears to support a single file at a time (ie no regex)
The instructions are fairly detailed and provide a variety of paths to configuring (AWS CLI scripts, Console instructions, etc). Additionally, the option to use your IAM keys rather than have to set-up roles is nice.
I did not find a way to download just psql, so I had to bring down a full postgres install down to my mac, but that was no big deal with brew:
brew install postgres
and since the DB service does not get activated it is the quickest way to get psql.
Update: Decided that having psql on my mac was a security hole, port forwarding, etc. I found that there is a simple Postgres install available for AMI Linux 2 under the AMI Extras rubric. The install command is fairly simple on your ami instance type.
sudo amazon-linux-extras install postgresql10
psql is fairly easy to use, however, important to keep in mind that any instructions to psql itself are escaped by a \. Documentation on psql can be found here. Recommend going through it at least once before executing the AWS recommended scripts.
To the extent you run tight security and have access to your RDS instances seriously restricted (which I do) don't forget to open up the ports from your AMI instance running Postgres to your RDS instance.
If your preference is a GUI then you can try to use PGAdmin4. It is the AWS recommended way of connecting to RDS Postgres instances according to the docs. I was unable to get any of the SSH tunneling features to work (which is why I ended up doing the localhost SSH mapping that I used for psql). I also found it to be rather buggy in other ways. Reading reviews of the product it seems that version 4 may not be the stablest of releases.
http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
Use the COPY command to load a table in parallel from data files on
Amazon S3. You can specify the files to be loaded by using an Amazon
S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as
follows:
copy <table_name> from 's3://<bucket_name>/<object_prefix>'
authorization;
update
Another option is to mount s3 and use direct path to the csv with COPY command. I'm not sure If it will hold 100GB effectively, but worth of trying. Here is some list of options on software.
Yet another option would be "parsing" s3 file part by part with something described here to a file and COPY from named pipe, described here
And the most obvious option to just download file to local storage and use COPY I don't cover at all
Also worth of mentioning would be s3_fdw (status unstable). Readme is very laconic, but I assume you could create a foreign table leading to s3 file. Which itself means you can load data to other relation...
I have deployed my SpringBoot app to Heroku. Now I would like to copy my local PostgreSQL to Heroku.
I have found some information on devcenter.heroku.com.
However I don't understand enough about the using of file db.changelog-master.yaml.
Could anyone give me details about the simplest solutions to copy the database?
Create a valid dump of your local postgres database and host it somewhere publicly available. Now you will be able to restore this entire dataset (schema and records) with pg:backups:restore as shown here. The sole caveat here is that the target database must be completely empty for this to work. You can empty a Heroku postgres database with heroku pg:reset.
If you cannot take the approach listed above then you can run pg_restore directly from your local instance, provided your local version of Postgres is >= the target version of Postgres. This also applies to creating the dumpfile and is a requirement because pg utilities are not guaranteed to be forward compatible. Documentation for pg_restore is here.
I want to have my postgresql database on disk (Ubuntu 14.04); how can I set this up and access the data? I have to use ftp to get files; I don't know where to search.
If you want a complete backup of the database via FTP, you'll have to follow the documentation. Basically, first you call pg_start_backup, then you get the data directory, then you call pg_stop_backup, then you get all WAL archives.
You have to run recovery on the backup before you can access it.
we are using Heroku Postgres with Ruby on Rails 3.2.
A few days before, we deleted important data by mistake using 'heroku run db:load' with misconfigured data.yml, that is, drop tables and the recreate tables with almost no data.
Backup data is only available 2 weeeks before, so we lost data of 2 weeks.
So We need to recover not by PG Backup/pg_dump but by postgresql's system data files.
I think, the only way to recover data is to restore data from xlog or archive file, but of course we don't have permission to be Super User/Replication Role to copy postgres database on heroku (or Amazon EC2) to local server.
Is there anyone who confronted such a case and resolved the problem?
Your only option is the backups provided by the PgBackups service (if you had that running). If not, Heroku support might have more options available.
At a minimum, you will have some data loss, but you can guarantee you won't do it again ;)