Is there a way to upload a deep learning model to a PostgreSQL database? - postgresql

I have deep learning models (tensorflow in hdf5 format), which I want to upload to a PostgreSQL database.
A single model may be up to 500 MBs, and the models need to be updated and uploaded/downloaded from the database frequently (e.g., once every 30 mins).
I'm a beginner with PostgreSQL, so any information/recipes to get started would be appreciated.

A relational DB is the wrong tool for this. Store a link to the model in an S3 bucket, for example, or in your own private cloud if the model is private.

Storing large binary data in a relational DB is possible but unwise. Besides, you eventually have to copy your model from the DB to disk for TensorFlow to load it anyway, so why not store your model directly on disk?
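Here is a minimal sketch of the link-in-S3 approach suggested above, assuming boto3 and psycopg2 are installed; the bucket, table, and column names are placeholders, and the models table needs a unique constraint on name for the upsert to work:

```python
# Keep the 500 MB HDF5 file in S3 and store only a reference row in PostgreSQL.
import boto3
import psycopg2

def save_model(local_path, model_name):
    key = f"models/{model_name}.h5"
    boto3.client("s3").upload_file(local_path, "my-model-bucket", key)

    conn = psycopg2.connect("dbname=mydb user=me")
    with conn, conn.cursor() as cur:
        cur.execute(
            """
            INSERT INTO models (name, s3_key, updated_at)
            VALUES (%s, %s, now())
            ON CONFLICT (name) DO UPDATE
                SET s3_key = EXCLUDED.s3_key, updated_at = now()
            """,
            (model_name, key),
        )
    conn.close()

def load_model(model_name, local_path):
    conn = psycopg2.connect("dbname=mydb user=me")
    with conn, conn.cursor() as cur:
        cur.execute("SELECT s3_key FROM models WHERE name = %s", (model_name,))
        (key,) = cur.fetchone()
    conn.close()
    boto3.client("s3").download_file("my-model-bucket", key, local_path)
```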

Related

InfluxDB and relational metadata?

I am currently working on a project with InfluxDB, where I have to track and store several measurements. My issue: for all my measurements I also need to store some additional relational metadata (for reporting), which only changes once or twice a week. Moreover, I have some config data for my applications.
So my question: do you have any suggestions or experience with storing time-series and relational data like that? Should I try to store everything (even the relational data) in InfluxDB, or should I use an external relational DB for that?
Looking forward to your help!
Marko

How do I efficiently migrate the BigQuery Tables to On-Prem Postgres?

I need to migrate the tables from the BigQuery to the on-prem Postgres database.
How can I efficiently achieve that?
Some thoughts that come to mind:
I will use Google APIs to export the data from the tables
Store it locally
And finally, import it to Postgres
But I am not sure whether that can be done for a huge amount of data in the TBs. Also, how can I automate this process? Can I use Jenkins for that?
Exporting the data from BigQuery, storing it, and importing it into PostgreSQL is a good approach. Here are two other alternatives that you can consider:
1) There's a PostgreSQL wrapper for BigQuery that allows you to query BigQuery directly. Depending on your scenario this might be the easiest way to transfer the data, although for TBs it might not be the best approach. This suggestion was made by @David in this SO question.
2) Using Dataflow. You can create an ETL process using Apache Beam to make the transfer. Take a look at this how-to for transferring data from BigQuery to Cloud SQL. You would need to adapt it for an on-prem PostgreSQL instance, but the idea remains the same.
Here's another SO answer that gives more context on this approach.
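For the basic export-then-import route, a rough sketch with the google-cloud-bigquery, google-cloud-storage, and psycopg2 packages might look like this; the project, bucket, dataset, and table names are placeholders:

```python
from google.cloud import bigquery, storage
import psycopg2

bq = bigquery.Client(project="my-project")

# 1) Export the table to GCS as CSV; the wildcard URI lets BigQuery shard
#    large tables across multiple files.
bq.extract_table(
    "my-project.my_dataset.my_table",
    "gs://my-bucket/exports/my_table-*.csv",
).result()  # wait for the extract job to finish

# 2) Download the shards locally.
bucket = storage.Client(project="my-project").bucket("my-bucket")
local_files = []
for blob in bucket.list_blobs(prefix="exports/my_table-"):
    path = "/tmp/" + blob.name.rsplit("/", 1)[-1]
    blob.download_to_filename(path)
    local_files.append(path)

# 3) Bulk-load into Postgres with COPY, its fastest ingestion path.
conn = psycopg2.connect("dbname=mydb user=me host=localhost")
with conn, conn.cursor() as cur:
    for path in local_files:
        with open(path) as f:
            cur.copy_expert("COPY my_table FROM STDIN WITH CSV HEADER", f)
conn.close()
```

For TB-scale tables you would run this per partition or per day rather than in one shot, which is also a natural unit of work for a Jenkins job to automate.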

MongoDB to DynamoDB

I have a database currently in Mongo running on an EC2 instance and would like to migrate the data to DynamoDB. Is this possible and what is the most cost effective way to achieve this?
When you ask for a "cost effective way" to migrate data, I assume you are looking for existing technologies that can ease your life. If so, you could do the following:
Export your MongoDB data to a text file, say in tsv format, using mongoexport.
Upload that file somewhere in S3.
Import this data, in S3, to DynamoDB using AWS Data Pipeline.
Of course, you should design & finalize your DynamoDB table schema before doing all this.
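A sketch of the first two steps, assuming the mongoexport CLI and boto3 are available (note that mongoexport emits JSON or CSV rather than TSV, so CSV is used here); the database, collection, field, and bucket names are placeholders:

```python
import subprocess
import boto3

# 1) Dump the collection to a flat file. --fields is required for CSV output.
subprocess.run(
    [
        "mongoexport",
        "--db", "mydb",
        "--collection", "users",
        "--type", "csv",
        "--fields", "name,email,created_at",
        "--out", "/tmp/users.csv",
    ],
    check=True,
)

# 2) Upload the dump to S3, where AWS Data Pipeline can pick it up.
boto3.client("s3").upload_file("/tmp/users.csv", "my-migration-bucket", "exports/users.csv")
```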
Whenever you are changing databases, you have to be very careful about the way you migrate data. Certain data formats maintain type consistency, while others do not.
Then there are data formats that simply cannot handle your schema. For example, CSV is great at handling data when it is one row per entry, but how do you render an embedded array in CSV? It really isn't possible. JSON is good at this, but JSON has its own problems.
The easiest example of this is JSON and DateTime. JSON has no specification for storing DateTime values; they can end up as ISO 8601 strings, UNIX epoch timestamps, or really anything a developer can dream up. What about longs, doubles, and ints? JSON doesn't discriminate: it has a single number type, and values that pass through it can lose precision if they are not deserialized carefully.
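Both pitfalls are easy to demonstrate with nothing but the Python standard library:

```python
import json
from datetime import datetime, timezone

# DateTime: JSON has no datetime type, so the developer picks a convention,
# and nothing in the output says which one was used.
now = datetime.now(timezone.utc)
as_iso = json.dumps({"ts": now.isoformat()})    # e.g. "2024-01-01T00:00:00+00:00"
as_epoch = json.dumps({"ts": now.timestamp()})  # e.g. 1704067200.0

# Numbers: a consumer that parses every number as a double (as JavaScript
# does) silently corrupts large 64-bit integers.
big = 2**53 + 1                                    # 9007199254740993
roundtrip = json.loads(json.dumps(big), parse_int=float)
print(int(roundtrip) == big)                       # False - precision lost
```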
This makes it very important that you choose the appropriate translation medium. That generally means you have to roll your own solution: load the drivers for both databases, read an entry from one, translate it, and write it to the other. This is the best way to be absolutely sure that errors are handled properly for your environment, that types are kept consistent, and that the code properly translates the schema from source to destination (if necessary).
What does this all mean for you? It means a lot of legwork. It is possible somebody has already rolled something broad enough for your case, but I have found in the past that it is best to do it yourself.
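For what it's worth, a minimal sketch of "doing it yourself" for this Mongo-to-DynamoDB case could look like the following, assuming pymongo and boto3; the collection, table, and the rules inside translate() are hypothetical and depend entirely on your schema:

```python
from decimal import Decimal
import pymongo
import boto3

def translate(doc):
    """Convert one MongoDB document into a DynamoDB-compatible item."""
    return {
        "id": str(doc["_id"]),                        # ObjectId -> string
        "created_at": doc["created_at"].isoformat(),  # datetime -> ISO 8601
        "score": Decimal(str(doc["score"])),          # DynamoDB needs Decimal, not float
    }

mongo = pymongo.MongoClient("mongodb://localhost:27017")
table = boto3.resource("dynamodb").Table("users")

# batch_writer buffers writes and retries unprocessed items automatically.
with table.batch_writer() as batch:
    for doc in mongo["mydb"]["users"].find():
        batch.put_item(Item=translate(doc))
```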
I know this post is old, but Amazon has since made this possible with AWS DMS; check this document:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MongoDB.html
Some relevant parts:
Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service
You can use AWS DMS to migrate data to an Amazon DynamoDB table.
Amazon DynamoDB is a fully managed NoSQL database service that
provides fast and predictable performance with seamless scalability.
AWS DMS supports using a relational database or MongoDB as a source.
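The same setup can be scripted with boto3. This is only a sketch: the ARNs and identifiers are placeholders, and a DMS replication instance must already exist.

```python
import json
import boto3

dms = boto3.client("dms")

source = dms.create_endpoint(
    EndpointIdentifier="mongo-source",
    EndpointType="source",
    EngineName="mongodb",
    ServerName="my-ec2-host",
    Port=27017,
    DatabaseName="mydb",
)

target = dms.create_endpoint(
    EndpointIdentifier="dynamo-target",
    EndpointType="target",
    EngineName="dynamodb",
    DynamoDbSettings={"ServiceAccessRoleArn": "arn:aws:iam::123456789012:role/dms-dynamo"},
)

dms.create_replication_task(
    ReplicationTaskIdentifier="mongo-to-dynamo",
    SourceEndpointArn=source["Endpoint"]["EndpointArn"],
    TargetEndpointArn=target["Endpoint"]["EndpointArn"],
    ReplicationInstanceArn="arn:aws:dms:us-east-1:123456789012:rep:EXAMPLE",
    MigrationType="full-load",
    TableMappings=json.dumps({
        "rules": [{
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "include-all",
            "object-locator": {"schema-name": "mydb", "table-name": "%"},
            "rule-action": "include",
        }]
    }),
)
```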

tELTPostgresql* usage issue

I'm trying to use tELTPostgresqlOutput with a PostgreSQL 9.3 server, and this is the result:
With a simple tPostgresqlInput and a tLogRow it works perfectly.
This is not how to use the ELT components. These should be used to do in-database transformations, such as creating a star-schema table from multiple tables in the same database. This lets the database do the transformation and avoids reading the data into memory for your job. It's particularly useful when dealing with large datasets that can't be broken down for the transformation.
If you want to transfer data from one database server/vendor to another you will need to use ETL components (pretty much anything not explicitly marked ELT) to read data out of the source database and write it back to the target database.
In this case you should be using a tMSSQLInput component to read in the data you need, a tMap to transform it the way you want, and a tPostgresqlOutput component to write the data out to the Postgres database. The equivalent pattern is sketched below.
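For illustration only, here is the read -> transform -> write pattern that this component flow implements, sketched in plain Python with pyodbc and psycopg2 (Talend generates the equivalent Java for you); the connection strings, tables, and the transformation itself are placeholders:

```python
import pyodbc
import psycopg2

src = pyodbc.connect(
    "DRIVER={ODBC Driver 17 for SQL Server};SERVER=mssql-host;DATABASE=src;UID=me;PWD=secret"
)
dst = psycopg2.connect("dbname=dst user=me host=pg-host")

with dst, dst.cursor() as out_cur:
    # tMSSQLInput: read rows from the source database
    for row in src.cursor().execute("SELECT id, name, amount FROM orders"):
        # tMap: reshape/convert each row in flight
        out_cur.execute(
            "INSERT INTO orders (id, name, amount_cents) VALUES (%s, %s, %s)",
            (row.id, row.name.strip(), int(row.amount * 100)),
        )
# tPostgresqlOutput: the insert batch is committed by the `with dst` block
src.close()
dst.close()
```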

HDFS to PostgreSQL

We need a process in place to pull data from the Hadoop Distributed File System (HDFS) into a relational DB (PostgreSQL) on a regular basis. We will need to transfer several million records per hour, and I am looking for the best industry standards for moving data out of HDFS. Does anyone have any suggestions?
The idea is for a web app to interact with PostgreSQL which will have aggregated data.
Sqoop is built for the purpose of moving data between relational data stores and Hadoop. Specifically, you want sqoop-export.
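A sketch of what that invocation might look like, wrapped in Python so it can be scheduled hourly from existing tooling; the connection string, table, and HDFS paths are placeholders, and the export directory's columns must match the target table:

```python
import subprocess

subprocess.run(
    [
        "sqoop", "export",
        "--connect", "jdbc:postgresql://pg-host:5432/analytics",
        "--username", "loader",
        "--password-file", "/user/loader/.pg_password",  # password kept in HDFS
        "--table", "hourly_aggregates",
        "--export-dir", "/data/aggregates/current",
        "--input-fields-terminated-by", "\\t",
        "--num-mappers", "8",  # parallel export streams
    ],
    check=True,
)
```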