Choosing the correct Amazon Machine Image (AMI) for bash script upload to Postgres - postgresql

I have a bash script written for OSX that downloads many large .zip files, unpacks them, and writes the contents to a Postgres database.
I want to do this from an EC2 instance because the operation takes a long time.
I don't know which AMI to choose, given that OSX is not an option.
Should I be doing this on Ubuntu?

You don't need a custom AMI. You can pass your bash script as cloud-init user data so that it runs once the instance has launched. As for the operating system, I think any of them can do what you describe in the question; however, I would recommend Amazon Linux because it is better optimized for EC2.
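For illustration, here is a minimal sketch of what the Linux version of such a script might look like. It assumes the archives contain CSV files, that curl, unzip, and psql are installed on the instance, and that the connection is configured through the standard PGHOST, PGUSER, PGDATABASE, and PGPASSWORD environment variables. The function name load_archive, the URL, and the table name are hypothetical.

```sh
# Hypothetical helper: download one archive, unpack it, and load every CSV
# it contains into a table. The CSV layout and header row are assumptions.
load_archive() {
    url=$1
    table=$2
    workdir=$(mktemp -d) || return 1
    curl -fsSL -o "$workdir/archive.zip" "$url" || return 1
    unzip -q "$workdir/archive.zip" -d "$workdir/unpacked" || return 1
    for csv in "$workdir"/unpacked/*.csv; do
        # Skip the unexpanded pattern when the archive held no CSV files.
        [ -e "$csv" ] || continue
        # \copy reads the file on the client, so no server-side file access is needed.
        psql -v ON_ERROR_STOP=1 \
            -c "\\copy $table FROM '$csv' WITH (FORMAT csv, HEADER true)" || return 1
    done
    rm -rf "$workdir"
}
```

A call such as load_archive https://example.com/data.zip staging_table could then be placed in the script passed as EC2 user data, which runs at first boot.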

Related

I have a question about a PostgreSQL cluster

I am new to programming. I used PostgreSQL because I needed a database program.
I made a data file using initdb.exe
initdb.exe -U postgres -A password -E utf8 -W -D D:\Develop\postgresql-10.17-2-windows-x64-binaries\data
This method is called a data cluster.
I have put a lot of information into this data file.
Now I want to transfer the data to another computer and use it.
How do I import and use files created using a cluster?
I want to register and use it in pgAdmin4.
What should I do?
I am using a Windows 10 operating system. A solution similar to loading a cluster is required.
As long as you want to transfer to another 64-bit Windows system running the same PostgreSQL major version, you can just shut down the server and copy the data directory. Register a service with pg_ctl register if you want.
To copy the data to a different operating system, you have to use pg_dumpall and restore with psql. pgAdmin won't help you there (despite what its name suggests, it is not an administration tool).
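A minimal sketch of the dump-and-restore route, assuming the superuser is named postgres (as in the initdb call from the question) and that pg_dumpall and psql are on the PATH. The function names and the dump file name are hypothetical; the same pg_dumpall and psql invocations can be typed directly at a Windows command prompt.

```sh
# Run on the source machine: write every role and database to one SQL script.
dump_cluster() {
    pg_dumpall -U postgres -f "$1"
}

# Run on the target machine against a freshly initialized, running cluster.
# "postgres" is the maintenance database that psql connects to first.
restore_cluster() {
    psql -U postgres -f "$1" postgres
}
```

For example, dump_cluster all.sql on the old computer, copy all.sql across, then restore_cluster all.sql on the new one.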

Storage Manager in pgAdmin

I am trying to backup one of my databases in PostgreSQL pgAdmin tool. I used this tutorial:
backup database with pgAdmin
After finishing that I want to have the file. In that tutorial it says that we can use the Storage Manager to download the backup file on the client machine. After that from this link I wanted to access the Storage Manager. It says that "You can access Storage Manager from the Tools Menu", but from my system there is not any option with that name:
What is the problem and how could I obtain the backup database file?
If you are not running pgAdmin4 in server mode, then there is no storage manager. The storage manager is only relevant when the computer from which you run the pgAdmin4 GUI is different from the computer where the pgAdmin4 app-server is running.
When you took the backup, you told it where to save the file, although not in a very user-friendly way. It asks for a filename, and there are three dots you can click to browse for a directory into which to put the file. But if you don't avail yourself of the three dots, you don't know where it is going to put the file; it just uses an apparently OS-dependent default and doesn't tell you what it is. I usually find it in my "Documents" folder. (Well, I usually don't use pgAdmin4 in the first place, as it makes everything harder than just using the command line, but when I do use it...)

How to load data from S3 to PostgreSQL RDS

I need to load data from S3 to Postgres RDS (around 50-100 GB). I don't have the option to use AWS Data Pipeline, and I am looking for something similar to using the COPY command to load data in S3 into Amazon Redshift.
I would appreciate any suggestions on how I can accomplish this.
Originally, this answer was trying to use the S3 to Postgres RDS Functionality. That whole enterprise failed (see below).
The way I have finally been able to do this is:
1) Set up an EC2 instance with psql installed (see below, near the end of this post).
2) Copy the relevant CSVs to import from S3 to the local instance.
3) Use the psql \copy command to import the files.
This last part is really, really important. If you use the SQL COPY command, the entire RDS Postgres role structure will frustrate you to no end. It has a wonky rds_superuser role which is not very super at all. However, if you use the psql \copy command, which reads the file on the client and streams it over the connection, you apparently can do anything. I have confirmed this to be the case and have started my uploads successfully. I will come back and re-edit this post (time permitting) to add relevant documentation steps for the above.
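The steps above can be sketched roughly as follows. This is an illustration under assumptions, not the exact commands used: it assumes the AWS CLI and psql are installed on the EC2 instance, that the files are CSVs with a header row, and that the RDS connection is configured through the standard PGHOST, PGUSER, PGDATABASE, and PGPASSWORD environment variables. The function name import_from_s3 is hypothetical.

```sh
# Hypothetical helper: fetch one CSV from S3 to local disk, then load it
# through psql's client-side \copy.
import_from_s3() {
    s3_uri=$1
    table=$2
    local_file=$(mktemp) || return 1
    aws s3 cp "$s3_uri" "$local_file" || return 1
    # \copy is a psql meta-command, so the file is read on this machine and
    # no server-side file privileges are required on RDS.
    psql -v ON_ERROR_STOP=1 \
        -c "\\copy $table FROM '$local_file' WITH (FORMAT csv, HEADER true)"
    status=$?
    rm -f "$local_file"
    return $status
}
```

A call would look like import_from_s3 s3://my-bucket/data/file.csv my_table, with the bucket, key, and table replaced by your own.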
Caveat emptor: the rest of this post is the original work I did trying to get this implemented. I don't want to bury the lede: despite multiple efforts (including what can only be described as pathetic tech support from AWS), I don't believe that this feature is ready for prime time. Despite a very simple, easy-to-replicate test environment, AWS has not provided an effective way to keep the copy statement from failing as follows:
The actual call to aws_s3.table_import_from_s3(...) is reporting a permission problem between RDS and S3. From my research work with psql this appears to be a C library, probably installed by AWS.
NOTICE: CURL error code: 28 when attempting to validate pre-signed URL, 1 attempt(s) remaining
NOTICE: HINT: make sure your instance is able to connect with S3.
S3 to Postgres RDS Functionality Now Added
On 2019-04-24 AWS released functionality allowing a Postgres RDS to load directly from S3. You can read the announcement here, and see the documentation page here.
I am sharing with the OP because this appears to be the AWS supported way of solving the question posed.
Key summary points:
Requires Postgres 11.1 or greater
Need access to psql and the ability to connect it to the RDS instance
Need to install the aws_s3 extension which pulls in aws_commons.
You can get to the S3 bucket by specifying credentials or by assigning IAM roles to RDS
It advertises supporting all of the same data formats as the postgres COPY command
It currently only appears to support a single file at a time (i.e. no regex)
The instructions are fairly detailed and provide a variety of paths to configuring (AWS CLI scripts, Console instructions, etc). Additionally, the option to use your IAM keys rather than have to set-up roles is nice.
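For reference, a minimal sketch of what the AWS-documented import call looks like when run through psql. The wrapper function import_via_aws_s3 and its arguments are hypothetical, the CSV format option is an assumption, and the values are interpolated into the SQL unescaped, so this is only suitable for trusted input. It also assumes that an IAM role with access to the bucket is already attached to the RDS instance and that the connection is configured through the PG* environment variables.

```sh
# Hypothetical wrapper around aws_s3.table_import_from_s3.
# Arguments: table name, bucket, object key, AWS region.
import_via_aws_s3() {
    table=$1
    bucket=$2
    key=$3
    region=$4
    psql -v ON_ERROR_STOP=1 <<SQL
-- CASCADE also installs aws_commons, which aws_s3 depends on.
CREATE EXTENSION IF NOT EXISTS aws_s3 CASCADE;
-- An empty column list means "all columns of the table".
SELECT aws_s3.table_import_from_s3(
    '$table', '', '(format csv)',
    aws_commons.create_s3_uri('$bucket', '$key', '$region')
);
SQL
}
```

For example: import_via_aws_s3 my_table my-bucket data/file.csv us-east-1.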
I did not find a way to download just psql, so I had to bring a full postgres install down to my mac, but that was no big deal with brew:
brew install postgres
and since the DB service does not get activated it is the quickest way to get psql.
Update: I decided that having psql on my mac was a security hole (port forwarding, etc.). I found that there is a simple Postgres install available for Amazon Linux 2 through Amazon Linux Extras. The install command on the instance is fairly simple:
sudo amazon-linux-extras install postgresql10
psql is fairly easy to use; however, it is important to keep in mind that any instructions to psql itself are prefixed with a \. Documentation on psql can be found here. I recommend going through it at least once before executing the AWS recommended scripts.
To the extent you run tight security and have access to your RDS instances seriously restricted (which I do), don't forget to open up the ports from the EC2 instance where you run psql to your RDS instance.
If your preference is a GUI then you can try to use pgAdmin 4. It is the AWS recommended way of connecting to RDS Postgres instances according to the docs. I was unable to get any of the SSH tunneling features to work (which is why I ended up doing the localhost SSH mapping that I used for psql). I also found it to be rather buggy in other ways. Reading reviews of the product, it seems that version 4 may not be the most stable of releases.
http://docs.aws.amazon.com/redshift/latest/dg/t_loading-tables-from-s3.html
Use the COPY command to load a table in parallel from data files on
Amazon S3. You can specify the files to be loaded by using an Amazon
S3 object prefix or by using a manifest file.
The syntax to specify the files to be loaded by using a prefix is as
follows:
copy <table_name> from 's3://<bucket_name>/<object_prefix>'
authorization;
Update:
Another option is to mount S3 as a file system and use the direct path to the CSV with the COPY command. I'm not sure it will handle 100 GB effectively, but it is worth trying. Here is a list of software options.
Yet another option would be to "parse" the S3 file part by part into a file with something like what is described here, and COPY from a named pipe, as described here.
The most obvious option, downloading the file to local storage and using COPY, I don't cover at all.
Also worth mentioning is s3_fdw (status: unstable). The readme is very laconic, but I assume you could create a foreign table pointing to the S3 file, which in turn means you can load the data into another relation...

Postgresql cluster initialization

SQL distributes a pre-initialized catalog cluster, but for PostgreSQL we need to initialize the cluster using initdb and a network service account. This fails in a few cases and causes a bit of misery!
Can we initialize the cluster ourselves and distribute a pre-initialized cluster?
Thanks
The "cluster" (or data directory) depends on the operating system and the architecture. So a data directory that was initialized with initdb on a 32bit Linux will not work on a 64bit Windows.
But you don't need to do that. A service account is only necessary if you want to run PostgreSQL as a service.
You can easily use the ZIP distribution to install and start Postgres without the need for a full-fledged installation or a service account.
The steps to do so are:
1) Unzip the binaries.
2) Run initdb, pointing it to the directory where the database cluster should be created.
3) Run pg_ctl to start the server.
Note that steps 2) and 3) must be run as the same user, otherwise the server will have no privileges to write to the data directory.
These steps can easily be put into a batch file or shell script.
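As a minimal sketch of such a shell script, assuming the bin directory of the unzipped distribution is on the PATH: the function name, the superuser name postgres, the UTF8 encoding, and the log file location are illustrative choices, not part of the steps above.

```sh
# Hypothetical helper: initialize the cluster on first use, then start it.
# Both commands run as the current user, so the server can write to the
# data directory it created.
start_portable_postgres() {
    data_dir=$1
    # PG_VERSION exists only inside an already initialized data directory.
    if [ ! -f "$data_dir/PG_VERSION" ]; then
        initdb -D "$data_dir" -U postgres -E UTF8 || return 1
    fi
    pg_ctl -D "$data_dir" -l "$data_dir/server.log" start
}
```

Running start_portable_postgres /path/to/data a second time skips initdb and only starts the server; pg_ctl -D /path/to/data stop shuts it down again.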
It is hard to understand your question, but I think you are talking about the Windows installer for PostgreSQL, right? Which version, which installer, and what about error messages, logs, etc.?
The installer can be found here.
SQL = database language, SQL Server = Microsoft database product

Portable PostgreSQL for development off a usb drive

In order to take some development work home I have to be able to run a PostgreSQL database.
I don't want to install anything on the machine at home. Everything should run off the usb drive.
What development tools do you carry on your USB drive?
That question covers pretty much everything else, but I have yet to find a guide to getting postgresql portable. It doesn't seem easy if it's even possible.
So how do I get PostgreSQL portable? Is it even possible?
EDIT:
PostgreSQL Portable works. It's very slow on the usb-drive I have, but it works. I can't recommend doing constant development with it but for what I need it's great.
Perhaps if I pick up a full speed external drive I'll try out virtualization. Given the poor performance of just running the database off this drive, a full virtual OS running off of it would be unusable.
Here's how you can do this on your own:
http://www.postgresonline.com/journal/archives/172-Starting-PostgreSQL-in-windows-without-install.html
An alternate route would be to use something like VirtualBox and just install your development environment (database, whatever) on there.
There are 2 projects to try in 2014: http://sourceforge.net/projects/pgsqlportable/ and http://sourceforge.net/projects/postgresqlportable/?source=recommended.
I can't vouch for the second, but I'm using the first and it works right out of the box.
After unzipping using 7-zip (http://www.7-zip.org/download.html):
1) Run "start service without usuario.bat" ( english translation )
2) Then run "pgadmin3.bat"
The only minor problem for me was that it's in Spanish. I was able to change the language to English by following Change language of system and error messages in PostgreSQL. Using Google Translate, the instructions are:
Description
This is a zip to automatically run postgresql 9.1.0.1 for windows. This version already has pgagent and pldebugger. To run it you must:
1) Unzip the zip.
2) Run the "start service without usuario.bat" found in the pgsql directory within the folder you just unzipped.
3) Optional. If you want to run the postgresql agent jobs (pgagent), you only need to run the "start pgagent.bat" found in the pgsql directory inside the folder you just unzipped.
4) Optional. To manage and/or develop the database you can run the pgadmin3.bat file.
5) Optional. To stop and/or restart the server correctly use file "service without stopping usuario.bat" usuario.bat or restart service without depending on the case.
Now with an option for Linux (.tar.gz file): Postgresql portable Linux 9.2
Please use the tickets for your answer bugs.
Username: postgres Password: 123
Just a note: on a new computer, to get pgAdmin III working you may need to add a db. The settings are in the attached screenshot.
Hope it helps.
I agree with the virtualization solution, but you may find this link from the portable freeware collection useful. I have used this locally, though not from USB.
1. Download and extract the zip version.
2. Inside the pgsql folder, create a data folder (any name will do; I used 'data').
3. Initialize the data folder: c:\pgsql\bin\initdb.exe -D c:\pgsql\data -U postgres -W -E UTF8 -A scram-sha-256
4. To start/stop, see the cmd code below that I use (press any key in its window to stop):
c:\pgsql\bin\pg_ctl.exe -D c:\pgsql\data -l logfile start
pause
c:\pgsql\bin\pg_ctl.exe -D c:\pgsql\data stop
more info