Export large dataset (JSON) from PostgreSQL

I have a Postgres database with geospatial data and I want to export certain parts of it as GeoJSON.
So I have an SQL command like the following:
SELECT jsonb_build_object('some_test_data', jsonb_agg(ST_AsGeoJSON(ST_Transform(way, 4326))::jsonb)) AS json
FROM (
  SELECT way, name, highway
  FROM planet_osm_line
  LIMIT 10
) result
and that basically works fine. I can also dump the result directly to a file like so:
psql -qtAX -d my-database -f my_cool_sql_command.sql > result.json
So my data is correct and usable, but as soon as I remove the LIMIT 10 I get ERROR: total size of jsonb array elements exceeds the maximum of 268435455 bytes
I've read that this 256 MB limit in Postgres is not easy to remove... but I guess there are other ways to get my result?

Try shrinking the result by simplifying the geometry with ST_Simplify() and building the document with json_build_object() instead (the json type should be subject to the 1 GB limit that applies to text values, not the 256 MB limit you hit with the binary jsonb representation):
SELECT json_build_object('some_test_data', json_agg(ST_AsGeoJSON(ST_Transform(way, 4326))::json)) AS json
FROM (
  SELECT ST_Simplify(way, 1) AS way, name, highway
  FROM planet_osm_line
) result
You can simplify by more than 1 meter, but with 1 meter you usually don't lose any important vertices in your geometries.
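If even the 1 GB text limit turns out to be too small, another way out (just a rough sketch, reusing the psql \copy approach shown further down on this page) is to skip the aggregation entirely and export one GeoJSON geometry per row, then wrap the lines into a FeatureCollection outside the database:
\copy (SELECT ST_AsGeoJSON(ST_Transform(way, 4326)) FROM planet_osm_line) TO 'features.json'
Each output line is a self-contained JSON value, so no single value on the server ever has to hold the whole result.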

Related

See all rows and download the result to csv in pgAdmin4

I have 66596470 rows of data which come from a calculation in a query, not from a table, but in pgAdmin4 I can only see 1000 of the 66596470. How can I see all rows or download them to CSV? Thank you...
This is not PostgreSQL you are talking about, but a GUI client that limits the number of result rows. You can probably configure the unnamed tool to display more than 1000 rows, but if you tried to actually see all 66 million rows, that would take a lot of time, both for the client software and for you as a reader, and the client would probably run out of memory.
If you want to retrieve (rather than see) the complete result set, use psql's \copy command to write the result set to a file. For example:
\copy (SELECT ...) TO 'filename' (FORMAT 'csv')
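The same export can be run non-interactively from the shell; a sketch, with my-database and result.csv as placeholder names:
psql -d my-database -c "\copy (SELECT ...) TO 'result.csv' (FORMAT 'csv', HEADER)"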

Increase Filter Limit in Apache Superset

I am trying to create a filter for a field that contains over 5000 unique values. However, the filter's query is automatically setting a limit of 1000 rows, meaning that the majority of the values do not get displayed in the filter dropdown.
I updated the config.py file inside the 'anaconda3/lib/python3.7/site-packages' directory by increasing the DEFAULT_SQLLAB_LIMIT and QUERY_SEARCH_LIMIT to 6000; however, this did not work.
Is there any other config that I need to update?
P.S - The code snippet below shows the json representation of the filter where the issue seems to be coming from.
"query": "SELECT casenumber AS casenumber\nFROM pa_permits_2019\nGROUP BY casenumber\nORDER BY COUNT(*) DESC\nLIMIT 1000\nOFFSET 0"
After using the grep command to find all files containing the text '1000', I found out that the filter limit can be configured through filter_row_limit in viz.py.
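For reference, a sketch of that kind of search, assuming Superset is installed under the site-packages path mentioned in the question:
grep -rn "1000" anaconda3/lib/python3.7/site-packages/superset/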

Returning JSON from Postgres is slow

I have a table in Postgres with a JSONB column; each row of the table contains a large JSONB object (~4500 keys, around 110 KB as a JSON string in a text file). I want to query these rows and get the entire JSONB object.
The query is fast -- when I run EXPLAIN ANALYZE, or omit the JSONB column, it returns in 100-300 ms. But when I execute the full query, it takes on the order of minutes. The exact same query on a previous version of the data was also fast (each JSONB was about half as large).
Some notes:
This ends up in Python (via SQLAlchemy/psycopg2). I'm worried that the query executor is converting JSONB to JSON, which then gets encoded to text for transfer over the wire and then parsed from JSON again on the Python end.
Is this correct? If so, how could I mitigate this issue? When I select the JSONB column as ::text, the query is roughly twice as fast.
I only need a small subset of the JSON (around 300 keys or 6% of keys). I tried methods of filtering the JSON output in the query but they caused a substantial further performance hit -- it ended up being faster to return the entire object.
This is not necessarily a solution, but here is an update:
By casting the JSONB column to text in the Postgres query, I was able to substantially cut down both query execution and result fetching on the Python end.
On the Python end, doing json.loads for every single row in the result set brings me back to exactly the same timing as using the regular query. However, with the ujson library I was able to obtain a significant speedup. Casting to text in the query and then calling ujson.loads on the Python end is roughly 3x faster than simply returning JSON in the query.
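To make the cast concrete, here is a minimal sketch with made-up table and column names (my_big_table, payload):
SELECT id, payload::text AS payload_text
FROM my_big_table;
The returned string is then parsed with ujson.loads on the Python side instead of letting the driver deserialize the JSONB value row by row.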

apache cassandra - Inconsistency between number of records returned and count(*) result

I am importing some data into a table in Apache Cassandra using the COPY command. I have 7 rows in my CSV file, but after importing I have just 1 row instead of 7. What could cause this inconsistency?
attached is the image of my cqlsh screen
Possible issue:
the rows share the same clustering key, so each imported row overwrites the previous one.
Solution:
try adding another column to the clustering key (something domain specific) that gives the rows uniqueness, as sketched below.
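A sketch of what that can look like in CQL (all names here are made up; adapt them to your schema). Adding an extra column such as id to the clustering key keeps rows that would otherwise collide:
CREATE TABLE my_keyspace.events (
    domain text,          -- partition key
    created_at timestamp, -- original clustering column
    id uuid,              -- extra clustering column that makes each row unique
    payload text,
    PRIMARY KEY ((domain), created_at, id)
);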

osm2pgsql data converting: lost columns

I converted OSM data from the *.bz2 format into a PostgreSQL database using osm2pgsql. But after the conversion I don't see columns such as lanes and maxspeed in the planet_osm_roads table.
Can someone explain where these columns are? Thanks.
Add the option -k when using osm2pgsql
osm2pgsql -d geodatabase -k planet.osm.bz2
-k|--hstore Add tags without column to an additional hstore (key/value) column to postgresql tables
Explanation: osm2pgsql normally imports the data into a static database schema. Tags without a corresponding column are ignored. By adding the option -k or --hstore, osm2pgsql adds a new hstore column named tags to each table and stores all tags that have no dedicated column there.
Depending on your needs, you can use -j instead, which makes osm2pgsql save ALL tags in the tags column, that is, including the tags that also have a database column.
-j|--hstore-all Add all tags to an additional hstore (key/value) column in postgresql tables
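For example, the earlier import command with -j in place of -k:
osm2pgsql -d geodatabase -j planet.osm.bz2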
After the import, to extract all maxspeed tags from the database, you can use a query like this, for example:
SELECT osm_id, name, tags -> 'maxspeed' FROM planet_osm_roads;
where tags is the hstore column and -> is a hstore operator.
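If you only want the rows that actually carry the tag, the hstore key-exists operator ? can be added; a small variation of the query above:
SELECT osm_id, name, tags -> 'maxspeed' FROM planet_osm_roads WHERE tags ? 'maxspeed';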
See the PostgreSQL documentation for more information about the hstore type and its operators: http://www.postgresql.org/docs/9.3/static/hstore.html
This would be better as a comment, but I don't have enough reputation to post one: instead of .bz2, I strongly recommend using .pbf, the "Protocolbuffer Binary Format", because: "It is about half of the size of a gzipped planet and about 30% smaller than a bzipped planet. It is also about 5x faster to write than a gzipped planet and 6x faster to read than a gzipped planet. The format was designed to support future extensibility and flexibility." More info: http://wiki.openstreetmap.org/wiki/PBF_Format