Get PostGIS geometry field on Drill - postgresql

I have a table with a geometry column, and if I query it using PostGIS the records are shown correctly:
[screenshot of the PostGIS query result]
The problem comes when I execute the query using Apache Drill: all the columns come back fine except the geometry one, which shows as null.
[screenshot of the Drill query result]
Reviewing the logs, it shows the following error:
WARN o.a.d.e.store.jdbc.JdbcRecordReader - Ignoring column that is
unsupported. org.apache.drill.common.exceptions.UserException:
UNSUPPORTED_OPERATION ERROR: A column you queried has a data type that
is not currently supported by the JDBC storage plugin. The column's
name was geom_multipolygon and its JDBC data type was OTHER
I tested creating the Drill storage plugin with postgis-jdbc-2.2.1.jar and postgresql-42.1.4.jar but the same error is shown.
If I use:
cast(geom_multipolygon as varchar(255))
it shows the varchar representation of the geometry. Another option is getting the MULTIPOLYGON text and transforming it to Drill binary using ST_GeomFromText(geom), but we need the binary format directly from PostGIS, so those approaches won't work for us.
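For reference, the discarded cast workaround looks roughly like this on the Drill side (a hedged sketch; the storage plugin alias pg and the table name geo_table are hypothetical):
SELECT cast(geom_multipolygon as varchar(255)) AS geom_text
FROM pg.public.geo_table;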
We have seen this page: https://github.com/k255/drill-gis/issues/1 but the proposed solution doesn't work for us, so I hope there is still a way to achieve this.
UPDATE: I finally found a way to make Drill show the geo fields: change the data type in PostGIS from geometry to bytea. It seems to be a compatibility issue. This way we can perform geospatial queries in Drill, but in PostGIS those fields are no longer geometries, so they cannot be indexed and treated as such.
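A minimal sketch of that conversion, assuming a hypothetical table geo_table holding the geom_multipolygon column from the question (after this the column is plain bytea, so PostGIS spatial indexes and functions no longer apply to it):
ALTER TABLE geo_table
  ALTER COLUMN geom_multipolygon TYPE bytea
  USING ST_AsBinary(geom_multipolygon);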

Related

Redshift Spectrum table doesn't recognize array

I have run a crawler on a JSON S3 file to update an existing external table.
Once it finished, I checked SVL_S3LOG to see the structure of the external table and saw it was updated and I have a new column with the Array<int> type, as expected.
When I tried to execute select * on the external table I got this error: "Invalid operation: Nested tables do not support '*' in the SELECT clause.;"
So I tried spelling out the select statement with all the column names:
select name, date, books.... (books is the Array<int> type)
from external_table_a1
and got this error:
Invalid operation: column "books" does not exist in external_table_a1;"
I have also checked the table external_table_a1 under "AWS Glue" and saw that the column "books" is recognized and has the type Array<int>.
Can someone explain why my simple query is wrong?
What am I missing?
Querying JSON data is a bit of a hassle with Redshift: when parsing is enabled (e.g. using the appropriate SerDe configuration) the JSON is stored as a SUPER type. In your case that's the Array<int>.
The AWS documentation on Querying semistructured data seems pretty straightforward, mentioning that PartiQL uses "dotted notation and array subscript for path navigation when accessing nested data". This doesn't work for me, although I couldn't find any reason for it in their SUPER Limitations documentation.
Solution 1
What I had to do is set the flags set json_serialization_enable to true; and set json_serialization_parse_nested_strings to true;, which serializes the SUPER type back to JSON. I can then use JSON functions to query the data. Unnesting the data gets even crazier, because on SUPER types you can only use the unnest syntax select item from table as t, t.items as item. I genuinely don't think that this is the intended way to query and unnest SUPER objects, but it's the only approach that worked for me.
They describe this in some older "Amazon Redshift Developer Guide".
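A minimal sketch of Solution 1, assuming the table and column names from the question (and that json_serialize / json_extract_array_element_text fit your data; adjust as needed):
set json_serialization_enable to true;
set json_serialization_parse_nested_strings to true;
-- json_serialize turns the SUPER value into JSON text, so ordinary JSON functions apply
select name, json_extract_array_element_text(json_serialize(books), 0) as first_book
from external_table_a1;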
Solution 2
When you write a query, Redshift will try to fit the output into one of the basic column data types. If the result of your query does not match any of those types, Redshift will not process the query. Hence, in order to convert a SUPER to a compatible type you will have to unnest it (using the rather peculiar Redshift unnest syntax).
For me this works in certain cases, but I'm not always able to properly index arrays, nor can I access the array index (using the my_table.array_column as array_entry at array_index syntax).
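A minimal sketch of that unnest syntax against the question's table (names from the question; the at clause exposes the array index):
select t.name, book, idx
from external_table_a1 as t, t.books as book at idx;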

Converting data types of Foreign Keys to use Joiner in Google Cloud Data Fusion Pipeline

I am building a pipeline that connects to an on-prem Oracle DB using the Database Plugin, queries two tables (table_a, table_b), and joins those tables using Joiner Plugin, before uploading to a BigQuery table.
The problem I have now is that the Foreign Keys to join table_a and table_b have different data types when I use Get Schema in the Database Plugin. In Joiner, I am joining the tables on table_a.customer_id = table_b.customer_id.
The dtype of table_a.customer_id is LONG but table_b.customer_id is DOUBLE. In the source Oracle DB, both columns are actually integers. For some reason, though, Get Schema reports them as LONG and DOUBLE.
I am obviously getting an error in Joiner when trying to join on foreign keys with different data types.
Is there a way to cast/convert the columns from the tables to match so that I can use Joiner?
I've seen some examples using Wrangler Transform to parse dates, but I don't see anything to convert to any other data types. I couldn't find any directive examples either: https://github.com/data-integrations/wrangler.
You can transform your data before joining by using any of the transform plugins that Cloud Data Fusion offers. As @muscat mentioned, you can use the Wrangler transform and its Set type directive, or you can use the Projection transform and configure its Convert field.
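As a hedged illustration only (the set-type directive comes from the data-integrations/wrangler repository, but check the exact syntax and supported types against its docs; customer_id is the column from the question), a Wrangler recipe to align both keys could look like:
set-type :customer_id long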

Returning count of updated rows when upserting to a Postgres table using jOOQ

I am upserting some data to a Postgres table using jOOQ's insertInto() and onDuplicateKeyUpdate() methods. I want to know afterwards how many duplicates were in my data, and hence need to return whether each row was inserted or updated.
From my Postgres-specific research so far, I found RETURNING (not MY_TABLE.xmax = 0) AS updated to be a valid option. However, the auto-generated Java table classes from jOOQ don't seem to give me access to Postgres system columns like xmax.
Here is my query so far:
dsl.insertInto(MY_TABLE)
    .columns(
        // pkey columns
        MY_TABLE.SHIFT,
        MY_TABLE.DATE_UTC,
        MY_TABLE.TIME_UTC,
        MY_TABLE.DURATION,
    )
    .values(
        shiftId,
        utcDateId,
        utcTime,
        duration
    )
    .onDuplicateKeyUpdate()
    .set(MY_TABLE.DURATION, newDuration)
    .returning((MY_TABLE.xmax = 0).`as`("inserted"))
    .execute()
This causes the following compile time error:
Error: Kotlin: Unresolved reference: XMAX
I have rechecked my Maven jOOQ table generation configuration and I am not excluding any columns. I have also read through everything I could find on jOOQ's own website but found no useful information for this specific use-case.
Any tips on what I could do here?
In this case you should use jOOQ's SQL templating. Specifically look at the DSL.field() method. Something like this: field("my_table.xmax", int.class).eq(0).
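For orientation, here is a hedged sketch of the plain SQL this upsert is meant to produce, with the conflict target guessed from the pkey columns listed in the question and placeholder bind values:
insert into my_table (shift, date_utc, time_utc, duration)
values (:shift_id, :utc_date_id, :utc_time, :duration)
on conflict (shift, date_utc, time_utc) do update
  set duration = excluded.duration
returning (xmax = 0) as inserted;
On the jOOQ side, the field() template from the answer is what lets you express that (xmax = 0) part, since the generated table classes do not expose system columns.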

PostGIS geography query returns a string value

I have a strange issue. The lonlat column in my app works well on the development server: its output is in the form POINT(X Y). But when I move the data to the production server, the output is strange!
ActionView::Template::Error (undefined method `lon' for "0101000020E6100000541B9C887E7A52C02920ED7F80614440":String):
The lonlat value, which is encoded with SRID 4326, is being read as a string. I am almost certain that the data was corrupted while migrating it from development to production, because this was not a problem before the migration.
Does anyone know what about the database schema or column may cause this issue?
A geometry field stores its data as WKB. To see the WKT representation you need to change your query to something like
select ST_AsText(the_geom) as geometry from table
However, I don't know why your development environment does some kind of implicit conversion between WKB binary data and WKT strings. What versions of Postgres and PostGIS are you using?
What language is your app server written in?
Is that ActiveRecord you're using?
I suggest you try something like
float ST_X(geometry a_point);
To make sure you can read the data properly and determine whether the problem is in the data field or somewhere else.
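For example, a hedged check against the question's column (lonlat is a geography here, so it is cast to geometry first; the table name your_table is hypothetical):
select ST_X(lonlat::geometry) as lon, ST_Y(lonlat::geometry) as lat
from your_table
limit 5;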
I also would try doing the pg_dump in a single step if you determine the problem is with the geometry column.
You can use pg_dump with the option
--exclude-table-data=pattern
--exclude-table-data=schema.pattern
This will bring all of the schema definition but exclude the data of the matching tables, so you only bring the data from the tables you need.
Turns out that when I killed the connection to the server to migrate the data, Rails did not set the schema search path (meaning it didn't discover the postgis extension) upon reconnecting. I had to restart the server to solve this problem.

PostGIS Conversion Issues

I am having an issue using PostGIS (1.5.4) data. It may be that I'm just not familiar enough with this technology to see the obvious (I'm a regular expert with nearly 4 hours of experience), but I am running into an error that I have been unable to solve with Google.
I have a table which includes Polygon data (and yes, I checked; the column type is geometry, not polygon, the Postgres native type). The problem arises when I try to run a query on the table to find which shape contains a particular point.
I am using the following query:
SELECT *
FROM geo_shape
WHERE ST_Contains(geoshp_polygon, POINT(-97.4388046000, 38.1112251000));
The error I receive is 'ERROR: function st_contains(geometry, point) does not exist'. I tried using a CAST() function, but got 'ERROR: cannot cast type geometry to polygon'. I'm guessing the issue has to do with the way the data is stored; pgAdmin shows it as hex data. I tried using ST_GeomFromHEXEWKB() just on a hunch, but received 'ERROR: function st_geomfromhexewkb(geometry) does not exist'.
I'm fairly confused as to what the issue is here, so any ideas at all would be much appreciated.
ST_Contains needs (geom, geom) as arguments...
Give this a try...
SELECT * FROM geo_shape
WHERE ST_Contains(geoshp_polygon,
GeomFromText('POINT(-97.4388046000 38.1112251000)'));
Edited to correct the comma issue in the point data (WKT uses a space between coordinates, not a comma). ST_GeomFromText will also work; kinda curious what the difference is there.
You cannot mix PostgreSQL's geometric data types with PostGIS's geometry type, which is why you see that error. I suggest using one of PostGIS's geometry constructors to help out:
SELECT *
FROM geo_shape
WHERE ST_Contains(geoshp_polygon,
                  ST_SetSRID(ST_MakePoint(-97.4388046000, 38.1112251000), 4326));
Or a really quick text way is to piece together the well-known text:
SELECT 'SRID=4326;POINT(-97.4388046000 38.1112251000)'::geometry AS geom;
(this will output the WKB for the geometry type).
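As a hedged follow-up, the same containment query can also be written with that well-known-text cast:
SELECT *
FROM geo_shape
WHERE ST_Contains(geoshp_polygon,
                  'SRID=4326;POINT(-97.4388046000 38.1112251000)'::geometry);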