Postgres - Convert bytea PNG column to GeoTIFF

I have loaded a massive set of tiles into a postgres database for use in a tile server. These were all loaded into bytea columns in PNG format.
I have now found out that the tile server code needs these to be in GeoTIFF format.
The following command works perfectly:
gdal_translate -of GTiff -expand rgb -co COMPRESS=DEFLATE -co ZLEVEL=6
However, a massive amount of data is already loaded on a remote server. So is it possible for me to do the conversion within the database instead of retrieving each file and running gdal_translate on it individually? I understand that GDAL is integrated with PostGIS 2.0 through the raster support, which is installed on my server.
If not, are there any suggestions on how to do this efficiently?

Is it possible to do this in the database with an appropriate procedural language? I suppose so. It is also worth noting up front that GDAL's support for PostGIS goes rather one way.
To be honest, the approach is likely to be "retrieve each record, convert it, and store it back using external image processing, as you are doing now." You might get some transaction benefit, but that is likely to be offset by locks.
If you are going to go this route, you may find PL/Java to be the most helpful approach, since you can load any image-processing library Java supports and use that.
I am, however, not convinced that this will be better than retrieve/transform/load.
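For reference, the retrieve/transform/load pass can be scripted fairly compactly. The sketch below is only an illustration, assuming a table named tiles with an integer id column and a bytea tile column (adjust the names to your schema); it shells out to the same gdal_translate command you already have:
import subprocess
import tempfile
import psycopg2

conn = psycopg2.connect("dbname=tiles_db")  # connection string is a placeholder

def png_to_geotiff(png_bytes):
    """Convert one PNG blob to GeoTIFF by shelling out to gdal_translate."""
    with tempfile.NamedTemporaryFile(suffix=".png") as src, \
         tempfile.NamedTemporaryFile(suffix=".tif") as dst:
        src.write(png_bytes)
        src.flush()
        subprocess.run(
            ["gdal_translate", "-of", "GTiff", "-expand", "rgb",
             "-co", "COMPRESS=DEFLATE", "-co", "ZLEVEL=6",
             src.name, dst.name],
            check=True,
        )
        with open(dst.name, "rb") as out:
            return out.read()

with conn, conn.cursor(name="tile_reader") as read_cur, conn.cursor() as write_cur:
    read_cur.execute("SELECT id, tile FROM tiles")
    for tile_id, png in read_cur:
        write_cur.execute(
            "UPDATE tiles SET tile = %s WHERE id = %s",
            (psycopg2.Binary(png_to_geotiff(bytes(png))), tile_id),
        )
The named (server-side) cursor keeps the client from pulling every tile into memory at once; for a really massive table you would also want to commit in batches rather than run one huge transaction.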

Related

How to store an image in Postgres using a Flask model

I am trying to store an image using a Flask model. I don't know how to store the image in Postgres, so I have encoded it to base64 and am trying to store the resulting text. That works, but is there a recommended way to store the encoded text, or the image itself, in Postgres using a Flask model?
class User_tbl(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    mobile = db.Column(db.String(13), unique=True)
    country = db.Column(db.String(30))
    image = db.Column(db.String(256))

    def __init__(self, mobile, country, image):
        self.mobile = mobile
        self.country = country
        self.image = image
I know it may be too late to answer this question, but recently I was trying to solve something similar and none of the proposed solutions seemed to shed light on the main problem.
Of course, any best practice rests on your needs. In general terms, you will find that embedding a file in the database is not good practice. Well, it depends.
Reading "Storing Binary files in the Database" on the PostgreSQL wiki, I discovered that there are some circumstances in which this practice is instead highly recommended, for instance when the files must be ACID.
In those cases, at least in Postgres, the bytea data type is to be preferred over text or BLOB binary, sometimes at the cost of somewhat higher memory requirements on the server.
In this case:
1) You don't need special SQLAlchemy dialects. The LargeBinary data type will suffice, since it is translated as "a large and/or unlengthed binary type for the target platform".
2) You don't need any encode/decode functions in PostgreSQL, at least in this specific case.
3) As I said before, it is not always a good strategy to save the files in the filesystem. In any case, do not use a text data type with base64 encoding: your data will be inflated by roughly 33%, resulting in a significant storage impact, whereas bytea does not have that drawback.
Thus, I propose these changes to your model:
class User_tbl(db.Model):
    id = db.Column(db.Integer, primary_key=True)
    mobile = db.Column(db.String(13), unique=True)
    country = db.Column(db.String(30))
    image = db.Column(db.LargeBinary)
Then you can save files into Postgres simply by passing your FileStorage parameter as a binary:
image = request.files['fileimg'].read()
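For completeness, a view function using that column might look roughly like the sketch below. It assumes it lives in the same module as the model above, that app is the Flask instance db was initialised with, and that the upload field is named fileimg; the route and form field names are assumptions.
from flask import request

@app.route("/signup", methods=["POST"])
def signup():
    raw_image = request.files["fileimg"].read()   # raw bytes for the LargeBinary column
    user = User_tbl(
        mobile=request.form["mobile"],     # form field names are assumptions too
        country=request.form["country"],
        image=raw_image,
    )
    db.session.add(user)
    db.session.commit()
    return "saved"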
It would be far easier to avoid all of this encoding and decoding and simply save it as a binary blob. In which case, use a sqlalchemy.dialects.postgresql.BYTEA column.
I know of the encode and decode functions in PostgreSQL for dealing with base64 data; see https://www.postgresql.org/docs/current/static/functions-string.html (encode/decode).
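If you have already stored base64 text and want to move it into a bytea column, decode() can do the conversion server-side. A hypothetical one-off migration is sketched below; the table name user_tbl, the existing image column, and the new image_bin column are all guesses based on the model above.
import psycopg2

# Hypothetical one-off migration: assumes the old base64 text is in
# user_tbl.image and that a bytea column image_bin is being added for it.
conn = psycopg2.connect("dbname=mydb")   # connection string is a placeholder
with conn, conn.cursor() as cur:
    cur.execute("ALTER TABLE user_tbl ADD COLUMN image_bin bytea")
    cur.execute("UPDATE user_tbl SET image_bin = decode(image, 'base64')")
conn.close()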
The recommended way to store an image in Postgres via Flask is to keep the image in your static folder (where you store JavaScript and CSS files) and serve it via a web server such as nginx, which will do this more efficiently than Flask. You should only store the path to the image in Postgres and keep the actual image on the file system.
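If you go the filesystem route, the Flask side can be as simple as the sketch below; the upload directory, field names, and the reuse of the String image column for a relative path are all assumptions, and app, db, and User_tbl are taken from the question's module.
import os

from flask import request
from werkzeug.utils import secure_filename

UPLOAD_DIR = os.path.join(app.static_folder, "uploads")   # assumes `app` from the application module

@app.route("/signup_fs", methods=["POST"])
def signup_fs():
    upload = request.files["fileimg"]
    filename = secure_filename(upload.filename)
    os.makedirs(UPLOAD_DIR, exist_ok=True)
    upload.save(os.path.join(UPLOAD_DIR, filename))         # the file lives on disk
    user = User_tbl(
        mobile=request.form["mobile"],
        country=request.form["country"],
        image="uploads/" + filename,                         # only the path goes to Postgres
    )
    db.session.add(user)
    db.session.commit()
    return "saved"
nginx (or any static file server) can then serve /static/uploads/<filename> directly without touching Flask at all.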

Do Redshift column encodings affect query execution speed?

When creating data tables in Amazon Redshift, you can specify various encodings such as MOSTLY32 or BYTEDICT or LZO. Those are the compressions used when storing the columnar values on disk.
I am wondering if my choice of encoding is supposed to make a difference in query execution times. For example, if I make a column BYTEDICT would that make a difference over LZO when it comes to SELECTs, GROUP BYs or FILTERs?
Yes. The compression encoding used translates to the amount of disk storage. Generally, the lower the storage, the better the query performance.
But which encoding is more beneficial to you depends on your data type and its distribution. There is no guarantee that LZO will always be better than BYTEDICT or vice versa. In my experience, I usually load some sample data into the intended table, then run ANALYZE COMPRESSION, and go with whatever Redshift suggests. That has worked for me.
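If you want to script that workflow rather than run it by hand, ANALYZE COMPRESSION can be issued through any Postgres driver, since Redshift speaks the Postgres wire protocol. A rough sketch with psycopg2 follows; the cluster endpoint, credentials, and table name are placeholders.
import psycopg2

conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="dev", user="admin", password="...")
conn.autocommit = True   # ANALYZE COMPRESSION cannot run inside an explicit transaction

with conn.cursor() as cur:
    # Assumes my_sample_table has already been loaded with representative sample data.
    cur.execute("ANALYZE COMPRESSION my_sample_table")
    for row in cur.fetchall():
        print(row)   # table, column, suggested encoding, estimated size reduction
conn.close()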
Amazon has actually released a Python script that can apply this automatically to your database. You can find the script here: https://github.com/awslabs/amazon-redshift-utils/blob/master/src/ColumnEncodingUtility/analyze-schema-compression.py
A bit late, but likely useful to anyone looking here:
Amazon can now decide on the best compression to use (see Loading Tables with Automatic Compression) if you are using a COPY command to load your table and no compression is already defined on the table.
You just have to add COMPUPDATE ON to your COPY command.
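For example, a COPY issued through psycopg2 with automatic compression enabled might look like the following sketch; the table, S3 bucket, IAM role, and connection details are placeholders.
import psycopg2

# COMPUPDATE ON asks Redshift to pick encodings when the target table
# has no compression defined yet. All names here are placeholders.
copy_sql = """
    COPY my_table
    FROM 's3://my-bucket/data/'
    IAM_ROLE 'arn:aws:iam::123456789012:role/MyRedshiftRole'
    FORMAT AS CSV
    COMPUPDATE ON
"""

conn = psycopg2.connect(host="my-cluster.example.com", port=5439,
                        dbname="dev", user="admin", password="...")
with conn, conn.cursor() as cur:
    cur.execute(copy_sql)
conn.close()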

Links to files outside a PostgreSQL database

A stupid newbie question: I want to make a PostgreSQL (9.2.2 with PostGIS 2.0.1, on 32-bit Windows XP) database with rasters saved outside the database. I will need the rasters to be accessed from outside the database, and they won't be uploaded or migrated frequently, so consistency is not an issue. My problem is that I don't know how to make the links from the database (which holds the metadata) to the rasters, and I didn't find anything comprehensible enough.
I have found something about foreign data wrappers, but they seem to be intended for data with a table structure, not files like rasters. DATALINK seems better, but I'm afraid it's the same case, plus I'm not sure I understood how to use it. Some discussions mention symbolic links, but those seem to be Unix-based and probably only vaguely related.
I'm sure it must be simple, but I didn't manage to solve it myself.
Databases provide no built-in way to link to outside objects.
I can think of at least two approaches:
Save the full path to each file in a metadata table, as an attribute of type text. Don't use it for joining tables in queries, though; an artificial key of an internal numeric type (like integer or bigint) is a better choice for performance reasons (see the sketch below);
Name your raster files according to their numeric keys in the database. This approach has a drawback: without the database you will not be able to obtain any useful information about your files.
Further steps depend on the complexity of your system and the optimization techniques you choose.
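A rough sketch of the first approach follows; the table definition, file paths, and names are all illustrative, not a fixed convention.
import psycopg2

conn = psycopg2.connect("dbname=rasters_db")   # connection string is a placeholder
with conn, conn.cursor() as cur:
    cur.execute("""
        CREATE TABLE IF NOT EXISTS raster_files (
            id        serial PRIMARY KEY,   -- artificial numeric key, use this for joins
            name      text,
            file_path text NOT NULL         -- e.g. 'C:/rasters/dem_tile_042.tif'
        )
    """)
    cur.execute(
        "INSERT INTO raster_files (name, file_path) VALUES (%s, %s) RETURNING id",
        ("dem_tile_042", "C:/rasters/dem_tile_042.tif"),
    )
    raster_id = cur.fetchone()[0]   # with the second approach you would name the file after this key
conn.close()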

How to verify that large PostgreSQL databases running different versions have the same data, without dumping

How would I verify that the data in an 8.3 PostgreSQL DB is the same as the data in a 9.0 DB?
When I did an SQL dump of an example table there were many differences, but these were due to 9.0 truncating zeros at the beginning and end of date fields. Also, the order of the dump is not fixed; even though that can be handled with sort (no pun intended), sorting does not allow validation, because it loses which table each row belonged to, and the sorted SQL dump would be a meaningless splat of SQL commands with dump settings thrown in for good measure.
count(*) is also not adequate.
I would like to be 100% sure that the data in one is equal to the data in the other, despite the version differences and the way that, at the very least, dates are held in 9.0.
I should add that I have several hundred tables and many hundreds of GB of data, so I need an automated process like diff DUMPa.sql DUMP2.sql. A SHA of the data (not the format) would be ideal, but one cannot diff binary dumps of PostgreSQL for well-known reasons. I am aware that MySQL has a checksum feature, but I'm using PostgreSQL.
First, the bad news: there is really no way to address all of the concerns you raise without loading the data into an intermediary program and comparing it directly. This will take time and it will drag your system down load-wise, so my recommendation is to set up some sort of replication and compare the replicas.
One thing you might be able to do is use something like Slony or Bucardo to replicate, then use triggers to move data into secondary child partitions and replicate those onto a consolidated server for comparison. You could then compare within PostgreSQL. This would reduce the load, and your reporting data would be relatively easy to manage compared to other approaches. But all the data is going to have to be loaded and compared line by line.
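If replication is not an option, here is a minimal sketch of the "intermediary program" approach: stream each table from both servers in primary-key order and compare hashes. The connection strings, table list, and key columns are assumptions about your schema.
import hashlib
import psycopg2

TABLES = [("my_table", "id")]   # (table name, ordering key) pairs to compare

def table_digest(dsn, table, key):
    """Hash every row of one table, streamed in a stable order."""
    digest = hashlib.sha256()
    conn = psycopg2.connect(dsn)
    try:
        with conn.cursor(name="cmp") as cur:   # server-side cursor, avoids loading all rows
            cur.execute("SELECT * FROM {} ORDER BY {}".format(table, key))
            for row in cur:
                digest.update(repr(row).encode("utf-8"))
    finally:
        conn.close()
    return digest.hexdigest()

for table, key in TABLES:
    old = table_digest("host=old83 dbname=prod", table, key)
    new = table_digest("host=new90 dbname=prod", table, key)
    print(table, "OK" if old == new else "MISMATCH")
Because the driver converts dates and numerics to the same Python types on both servers, the text-formatting differences between 8.3 and 9.0 dumps stop mattering; a difference in column order would still break the comparison, so you may want to list columns explicitly.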

How to load data into Core Data?

Thanks for your help.
I'm attempting to add Core Data to my project, and I'm stuck on where and how to add the actual data into the persistent store (I'm assuming this is the place for the raw data).
I will have more than 1000 objects, so I don't want to use a plist approach. From my searches, there seem to be XML and CSV approaches. Is there a way I can use SQL for input?
The data will not be changed by the user, and the data file will be typed in by hand, so I won't need to update these files during runtime. At this point I am not limited to any type of file; whatever is lightest on syntax is preferred.
Thanks again for any help.
You could load your data from an XML/CSV/JSON file and create the DB on the first launch of your application (if the DB is not there, read the data and create it).
A better/faster approach might be to ship the SQLite DB within your application. You can parse the file in whatever format you want on the simulator, create a DB with all your entities, then take it from the application's data directory and add it to your app as a resource.
Although I'm sure there are lighter file types that could be used, I would include a JSON file in the app bundle from which you import the initial dataset.
Update: some folks are recommending XML. NSXMLParser is almost as fast as JSONKit (and much faster than most other parsers), but XML syntax is heavier than JSON, so an XML file holding the initial dataset would weigh more than the same data in JSON.
Considering that Apple treats the format of its persistent stores as an implementation detail, shipping a prefabricated SQLite database is not a very good idea; the names of fields and tables may change between iOS versions, phones, or whatever hidden variable you can think of. You should, in general, not concern yourself with how this serialization of your data is formatted.
There's a brief article about importing data on Apple's developer site: Efficiently Importing Data
You should ship initial data in whatever format you're comfortable with (XML allows you to do incremental parsing efficiently, which reduces memory footprint) and write an import routine to run if you need to import data.
Edit: With EliBud's comment in mind, I still consider the approach a bit "iffy"... The format of the SQLite database used by Core Data is not something you'd want to generate yourself (it's weird, simply put, and still not something you should rely on).
So you'd want to use a mock app running on the Simulator and use Core Data to create the database (as per EliBud's answer). But you'd still have to import the data into that mock app! And while it might make sense to do this once on a "real" computer instead of many times on a mobile device (copying a file is easy, importing data is hard), you're essentially using the Simulator as an administration tool.
But hey, if it works...