Meltano: tap-postgres to target-postgres: data types such as uuid are converted to varchar - postgresql

I'm currently working on a Meltano project where I have to extract data from one Postgres database and load it into the final warehouse (also a Postgres database) using "Key-based Incremental Replication".
After Meltano's load step, every table with columns of type uuid in the tap-postgres source ends up with those columns as varchar in the target.
Could someone help me solve this issue?

Singer taps rarely capture "rich" string types such as uuid; they output a plain {"type": "string"} without any additional metadata. Furthermore, Singer generally only supports JSON Schema Draft 4, so {"type": "string", "format": "uuid"} would not even be understood by most taps and targets.
Since Singer is an ELT framework aimed at analytics, and in that context a uuid column is not meaningfully different from a varchar, this is rarely an issue in practice.
If you need to replicate a database with that level of fidelity, you may be better served by a dedicated database backup/replication solution.
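If the warehouse copy really does need uuid columns, one workaround is to cast the affected columns back after each load, for example in a post-load transformation step. A minimal sketch, assuming a loaded table analytics.my_table whose id column holds uuid values as text (the names are placeholders):

ALTER TABLE analytics.my_table
    ALTER COLUMN id TYPE uuid USING id::uuid;

The USING clause tells PostgreSQL how to convert the existing varchar values in place.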

postgresql json_build_object property LIKE

I have a table with a column named "data" that contains a JSON object of telemetry data.
The telemetry data is received from devices, some of which have received a firmware upgrade. After the upgrade, some of the properties have gained a "namespace" (e.g. "propertyname" is now "dsns:propertyname"), and a few of the properties are suddenly camelCased (e.g. "propertyname" is now "propertyName").
Neither the meaning of the properties nor the number of properties has changed.
When querying the table, I do not want to fetch the whole "data" object, as the JSON is quite large. I only want the properties that are needed.
For now I have simply used a json_build_object such as:
select json_build_object(
         'requestedproperty', "data" -> 'requestedproperty',
         'anotherrequestedproperty', "data" -> 'anotherrequestedproperty'
       ) as "data"
from device_data
where id = 'f4ddf01fcb6f322f5f118100ea9a81432'
  and timestamp >= '2020-01-01 08:36:59.698' and timestamp <= '2022-02-16 08:36:59.698'
order by timestamp desc
But it does not work for fetching data from after the firmware upgrade for the devices that have received it.
It still works for querying data from before the upgrade, and I would like to have only one query for this if possible.
Is there an easy way to "ignore" namespaces and be case-insensitive through json_build_object?
I'm obviously no PostgreSQL expert, and would love all the help I can get.
This kind of insanity is exactly why device developers should not ever be allowed anywhere near a keyboard :-)
This is not an easy fix, but so long as you know the variations on the key names, you can use coalesce() to accomplish your goal and update your query with each new release:
json_build_object(
    'requestedproperty', coalesce(
        data -> 'requestedproperty',
        data -> 'dsns:requestedproperty',
        data -> 'requestedProperty'),
    'anotherrequestedproperty', coalesce(
        data -> 'anotherrequestedproperty',
        data -> 'dsns:anotherrequestedproperty',
        data -> 'anotherRequestedProperty')
) as "data"
Edit to add: You can also use jsonb_each_text(), treat key-value pairs as rows, and then use lower() and a regex split to doctor key names more generally, but I bet that the kinds of scatterbrains behind the inconsistencies you already see will eventually lead to a misspelled key name someday.
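For reference, a rough sketch of that more general approach, assuming "data" can be cast to jsonb; the key-normalisation rule (strip any "prefix:" and lowercase) and the property names are only illustrative:

select (
    select json_object_agg(lower(regexp_replace(key, '^[^:]*:', '')), value)
    from jsonb_each_text(d."data"::jsonb)
    where lower(regexp_replace(key, '^[^:]*:', ''))
          in ('requestedproperty', 'anotherrequestedproperty')
) as "data"
from device_data d
where id = 'f4ddf01fcb6f322f5f118100ea9a81432'
  and timestamp >= '2020-01-01 08:36:59.698' and timestamp <= '2022-02-16 08:36:59.698'
order by timestamp desc

Note that jsonb_each_text() yields text values, so numbers and booleans come back as JSON strings; use jsonb_each() with jsonb_object_agg() if the original value types must be preserved.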

Cassandra Alter Column type from Timestamp to Date

Is there any way to alter a Cassandra column from timestamp to date without data loss? For example, '2021-02-25 20:30:00+0000' to '2021-02-25'.
If not, what is the easiest way to migrate this column (timestamp) to a new column (date)?
It's impossible to change the type of an existing column, so you need to add a new column with the correct data type and perform a migration. The migration can be done via Spark + the Spark Cassandra Connector - this is the most flexible solution, and it can even be done on a single machine with Spark running in local master mode (the default). The code could look something like this (try it on test data first):
import pyspark.sql.functions as F

# the target column ("new_name" here) must already exist in the Cassandra table with type date
options = {"table": "tbl", "keyspace": "ks"}
spark.read.format("org.apache.spark.sql.cassandra").options(**options).load()\
    .select("pk_col1", "pk_col2", F.col("timestamp_col").cast("date").alias("new_name"))\
    .write.format("org.apache.spark.sql.cassandra").options(**options)\
    .mode("append").save()  # "append" is needed when writing into an existing table
P.S. You could also use DSBulk, for example, but you need enough disk space to offload the data (although you only need the primary key columns plus your timestamp column).
To add to Alex Ott's answer, there are validations in Cassandra that prevent changing the data type of a column. The reason is that SSTables (Cassandra data files) are immutable -- once they are written to disk, they are never modified/edited/updated. They can only be compacted into new SSTables.
Some try to get around this by dropping the column from the table and then adding it back with a new data type. Unlike a traditional RDBMS, the existing data in the SSTables doesn't get updated, so if you try to read the old data, you'll get a CorruptSSTableException because the CQL type of the data on disk won't match that of the schema.
For this reason, it is no longer possible to drop and recreate a column with the same name (CASSANDRA-14948). If you're interested, I've explained it in a bit more detail in this post -- https://community.datastax.com/questions/8018/. Cheers!
You can also use toDate() to convert it at query time. For example, table Email has a column date with values like 2001-08-29 13:03:35.000000+0000.
select date, toDate(date) as convert from keyspace.Email;

 date                             | convert
----------------------------------+------------
 2001-08-29 13:03:35.000000+0000 | 2001-08-29

Using an array type in the schema for an Ada interface to a postgresql database using gnatcoll_db2ada

I've created a PostgreSQL database with a few tables and am fairly content with how they work. I've also written some Ada code to interface with it and perform simple queries. This is all running on Slackware 14.2 using GNAT 2020.
One of my table columns is of an array type, an array of BIGINT.
The problem I have is when I try to create the schema for my Ada code using gnatcoll_db2ada.
The schema file ("all-schema.txt") includes the following line:
item_list | BIGINT[] | | | |
When I do
gnatcoll_db2ada -dbmodel all-schema.txt
I get
Error: unknown field type "BIGINT[]"
all-schema.txt:33 gnatcoll-sql-inspect.adb:1420
gnatcoll-sql-inspect.adb:1420
Is what I'm trying to do actually possible?
The documentation suggests that database fields of array types are not supported (i.e. they are not mentioned as being supported). From the document SQL: Database interface:
The type of the field is the SQL type ("INTEGER", "TEXT", "TIMESTAMP", "DATE", "DOUBLE PRECISION", "MONEY", "BOOLEAN", "TIME", "CHARACTER(1)"). Any maximal length can be specified for strings, not just 1 as in this example. The tool will automatically convert these to Ada when generating Ada code. A special type ("AUTOINCREMENT") is an integer that is automatically incremented according to available ids in the table. The exact type used will depend on the specific DBMS.
Note that while the scalar field type "BIGINT" is not mentioned in the documentation, it is mentioned in the source code (see gnatcoll-sql.ads).
If you really need support for the "BIGINT" array type, then a quick glance at the source code suggests that you can extend the GNATCOLL DB interface with new field types by
instantiating the generic package GNATCOLL.SQL_Impl.Field_Types (see here) and
creating a new field mapping (i.e. a new concrete type based on GNATCOLL.SQL.Inspect.Field_Mapping, see here).
It seems that new field types are typically placed in the package GNATCOLL.SQL_Fields (see here).
Note that I have never done this myself, so I cannot tell how much effort it will take or whether this is really all that is needed; the exact requirements for implementing a new field type are (at the time of writing) not documented.
I suspected as much, having briefly looked at the source.
What I'll do is spin off the array into another table. This has at least helped clarify what I need to do, and the array, to be fair, always felt a bit clunky. Thanks for the comments.
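For what it's worth, that split could look something like the sketch below, assuming the original table has an integer surrogate key; the table and column names are purely illustrative, and every type involved is one gnatcoll_db2ada already understands:

CREATE TABLE item_list_entry (
    owner_id INTEGER NOT NULL REFERENCES items_owner (id),
    item     BIGINT  NOT NULL,
    position INTEGER NOT NULL,            -- preserves the original array order
    PRIMARY KEY (owner_id, position)
);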

PostgreSQL Data Type

Can someone advise me on the SQL data type that should be used for a DICOM UID, e.g. 1.2.840.113986.3.2702661254.20150220.144310.372.4424? I would like to use it as a primary key as well.
There are two options available here: either use a less-than-ideal data type which already exists, of which "text" is almost certainly the best option, or implement a custom data type for this particular kind of data.
While the best built-in option is "text", looking at the example provided, you would likely get significant performance and space benefits from a custom data type, though it would require writing code to implement it.
A final option to consider is a surrogate key for that data. To do this, you would build a table containing a "bigserial" column and a "text" column. The "text" column would hold the long form of the value as shown above, and the "bigserial" column would provide an integer (64-bit with "bigserial", 32-bit if you use "serial" instead) which you would then use in all of your other tables instead of the long form.
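A minimal sketch of that surrogate-key table, with illustrative names:

CREATE TABLE dicom_uid (
    id  bigserial PRIMARY KEY,  -- use serial instead for a 32-bit key
    uid text NOT NULL UNIQUE    -- the full DICOM UID, e.g. '1.2.840.113986...'
);

Other tables then store and join on dicom_uid.id rather than the long text value.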

Metadata information in Oracle NoSql

I want to view the schema of the data stored in a kvstore: what the keys are and their types, and also the values and their types (as Oracle NoSQL is a key-value store). As far as I know, we can use the "show schema" command, but it only works if an Avro schema has been added to that particular store, and it only gives information about the value field names and their types; the key names and their types are still a blind spot.
So is there any utility I can use to view the structure of the data, like the "describe" command in Oracle SQL?
You are right that 'kv-> show schema' will show you the field names (columns) and their types when you have an Avro schema. When you don't register a schema, the database has no knowledge of what your value object looks like; in that case the client application maintains the schema of the value field (instead of the database).
About the keys: a) keys are always of string type, and b) you can view them from the datashell prompt with something like "kv-> get kv -keyonly -all".
I would also like to mention that in the upcoming R3 release we will be introducing a table data model, which will give you a much closer experience to a relational database (in terms of table definitions). You can take a look at a webinar we did on this subject: http://bit.ly/1lPazSZ.
Hope that helps,
Anuj