Metadata information in Oracle NoSQL

I want to view the schema of the data stored in a kvstore: what the keys are and their types, and also the values and their types (since Oracle NoSQL is a key-value store). As far as I know, we can use the "show schema" command, but it only works if an Avro schema has been added to that particular store, and it only gives the names and types of the value fields; the key names and their types are still a problem.
So is there any utility I can use to view the structure of the data, like the "describe" command in Oracle SQL?

You are right that 'kv->show schema' will show you the field names (columns) and their types when you have an Avro schema. When you don't register a schema, the database has no knowledge of what your value object looks like; in that case the client application maintains the schema of the value field (instead of the database).
About the keys: a) keys are always of string type, and b) you can view them from the datashell prompt with something like "kv-> get kv -keyonly -all".
I would also like to mention that in the upcoming R3 release we will be introducing a table data model, which will give you a much closer experience to a relational database (in terms of table definitions). You can take a look at a webinar we did on this subject: http://bit.ly/1lPazSZ.
Hope that helps,
Anuj

Cassandra Alter Column type from Timestamp to Date

Is there any way to alter a Cassandra column from timestamp to date without data loss? For example, '2021-02-25 20:30:00+0000' to '2021-02-25'.
If not, what is the easiest way to migrate this column (timestamp) to a new column (date)?
It's impossible to change the type of an existing column, so you need to add a new column with the correct data type and perform a migration. The migration could be done via Spark + the Spark Cassandra Connector - it's the most flexible solution, and it can even be done on a single machine with Spark running in local master mode (the default). The code could look something like this (try it on test data first):
import pyspark.sql.functions as F
# table/keyspace to read from and write back to
options = {"table": "tbl", "keyspace": "ks"}
# read the existing rows, cast the timestamp column to a date under a new name,
# and write the result back (the target column must already exist in the table)
spark.read.format("org.apache.spark.sql.cassandra").options(**options).load()\
    .select("pk_col1", "pk_col2", F.col("timestamp_col").cast("date").alias("new_name"))\
    .write.format("org.apache.spark.sql.cassandra").options(**options).save()
P.S. You can use DSBulk, for example, but you need to have enough space to offload the data (although you only need the primary key columns plus your timestamp column).
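For the "add a new column with the correct data type" step mentioned at the start, here is a minimal sketch using the Python cassandra-driver; the keyspace, table, and column names are the same placeholders used in the Spark snippet above:

from cassandra.cluster import Cluster

# connect to the cluster (adjust the contact points for your environment)
cluster = Cluster(["127.0.0.1"])
session = cluster.connect()

# add the date column that the Spark job writes into; the old
# timestamp column is left untouched and can be dropped later
session.execute("ALTER TABLE ks.tbl ADD new_name date")

cluster.shutdown()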
To add to Alex Ott's answer, there are validations in Cassandra that prevent changing the data type of a column. The reason is that SSTables (Cassandra data files) are immutable -- once they are written to disk, they are never modified/edited/updated. They can only be compacted into new SSTables.
Some try to get around this by dropping the column from the table and then adding it back with a new data type. Unlike a traditional RDBMS, the existing data in the SSTables doesn't get updated, so if you try to read the old data you'll get a CorruptSSTableException because the CQL type of the data on disk won't match that of the schema.
For this reason, it is no longer possible to drop/recreate columns with the same name (CASSANDRA-14948). If you're interested, I've explained it in a bit more detail in this post -- https://community.datastax.com/questions/8018/. Cheers!
You can use ToDate to convert it. For example, the table Email has a column Date in the format 2001-08-29 13:03:35.000000+0000.
Select Date, ToDate(Date) as Convert from keyspace.Email;

 date                             | convert
----------------------------------+------------
 2001-08-29 13:03:35.000000+0000  | 2001-08-29

SQL: how to get global overview of database

I am new to SQL and I was wondering if there is any quick way of getting a global "view" of a new database (if, for example, you are starting to use a database you know nothing about and you want to get a general idea of what the entire database looks like).
In other words, is there a way to:
Maybe get some graphical representation of the database? A sort of diagram that shows the relations between all the tables.
Maybe run some sort of query that could return the number of rows and the number of columns (and ideally the column names) of each table in the database? (A sketch of one such query is below.)
Apologies if this is a really basic question, I am very new to SQL. I am currently using PostgreSQL and pgAdmin 4. Thanks
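For the second point, a minimal sketch of what such a query could look like, assuming PostgreSQL and the psycopg2 driver (connection details are placeholders; the row counts come from the statistics views, so they are approximate):

import psycopg2

# connection details are placeholders
conn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
cur = conn.cursor()

# number of columns (and their names) for each table in the public schema
cur.execute("""
    SELECT table_name, count(*) AS num_columns, array_agg(column_name::text) AS columns
    FROM information_schema.columns
    WHERE table_schema = 'public'
    GROUP BY table_name
    ORDER BY table_name
""")
for table, num_columns, columns in cur.fetchall():
    print(table, num_columns, columns)

# approximate row counts per table, as tracked by the statistics collector
cur.execute("SELECT relname, n_live_tup FROM pg_stat_user_tables ORDER BY relname")
for table, approx_rows in cur.fetchall():
    print(table, approx_rows)

cur.close()
conn.close()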

Mybatis dynamic select query with prevention of 'no column exits' errors

What's the best approach to have a dynamic query like
select $dynamic_columns from table
But also prevent errors like "column not found" and get results with the available columns, considering that $dynamic_columns is given by end users.
One approach would be to store the schema in a Java object and filter against it. But then, if the schema is updated in the DB, we would need to update the cached schema object. Is there any better way to handle this?
Be careful with this as it is more vulnerable to SQL injection. Never let the user type something into a text field; instead, build a list for them to select from.
For building the list, I think the best approach is to use the JDBC method DatabaseMetaData.getColumns(...) to retrieve a list of columns for a table. I don't think there's a need to cache anything.
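DatabaseMetaData.getColumns(...) is the Java/JDBC route. As a language-neutral illustration of the same whitelist idea -- swapped here to Python with information_schema, and with hypothetical table and connection details -- the point is simply to intersect the user-supplied column list with the real columns before building the SELECT:

import psycopg2

# hypothetical connection; any driver that can list a table's columns works here
conn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
cur = conn.cursor()

# fetch the real column names of the target table
cur.execute(
    "SELECT column_name FROM information_schema.columns WHERE table_name = %s",
    ("my_table",),
)
valid_columns = {row[0] for row in cur.fetchall()}

# keep only the requested columns that actually exist
requested = ["col_a", "col_b", "does_not_exist"]
safe_columns = [c for c in requested if c in valid_columns]

# build the query from the whitelisted names only, never from raw user input
query = "SELECT {} FROM my_table".format(", ".join(safe_columns)) if safe_columns else None
print(query)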

How to populate a table via Pentaho Data Integration's table_output step?

I am performing an ETL job via Pentaho 7.1.
The job is to populate a table 'PRO_T_TICKETS' in PostgreSQL 9.2 via Pentaho jobs and transformations.
I have mapped the table fields to the stream fields.
Mapped Fields
My Table PRO_T_TICKETS contains the Schema (Column Names) in UPPERCASE.
Is this the reason I can't populate the table PRO_T_TICKETS with my ETL Job?
I duplicated the TABLE_OUTPUT step for PRO_T_TICKETS and changed the Target table field to 'PRO_T_TICKETS2'. Pentaho created a new table with a lowercase schema and populated the data into it.
But I want this data to be loaded into the table PRO_T_TICKETS only, and with the UPPERCASE schema if possible.
I am attaching the whole job here along with the error thrown by Pentaho (Pentaho Error). I have also tried my query with double quotes added to the column names, as you can see in the error, but it didn't help.
What do you think I should do?
When you create (or modify) the connection, select Advanced on the left panel and click Force to upper case, Force to lower case, or, even better, Preserve case of reserved words.
To know which option to choose, copy the 4th line of your error log (the line starting with INSERT INTO "public"."PRO_T_TICKETS("OID"...) into your SQL development tool and change the connection's advanced parameters until it works.
Also, at debug time, don't use batch updates, don't use lazy conversion on previous steps, and try with one (1) field rather than all (25).
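The underlying behaviour those options work around is PostgreSQL's identifier folding: unquoted identifiers are folded to lowercase, while quoted identifiers keep their case. A small illustration (table, column, and connection details are made up), using psycopg2:

import psycopg2

conn = psycopg2.connect(host="localhost", dbname="mydb", user="me", password="secret")
cur = conn.cursor()

# created with quotes, so the table and column keep their uppercase names
cur.execute('CREATE TABLE "PRO_T_TEST" ("OID" integer)')

# works: quoted identifiers match the stored uppercase names
cur.execute('INSERT INTO "PRO_T_TEST" ("OID") VALUES (1)')

# fails: unquoted names are folded to lowercase, so PostgreSQL looks for pro_t_test
try:
    cur.execute("INSERT INTO PRO_T_TEST (OID) VALUES (2)")
except psycopg2.Error as e:
    print(e)

conn.rollback()
conn.close()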
Just as a complement: it worked for me following the tips from AlainD and using some specific configurations that I'd like to share with you. I have a transformation streaming data from MySQL to PostgreSQL using a Table Input and a Table Output. In both DBs I have uppercase objects.
I did the following steps to work in the right way:
In the Table Input (MySQL) the objects are uppercase too, but I typed them in lowercase and it worked; I didn't set any special option in the DB connection.
In the table output (PostgreSQL) I typed everything in uppercase (schema, table name and columns) and I also set "specify the database fields" (clicking on "Get fields").
In the target DB Connection (PostgreSQL) I put the options (in "Advanced" section): "Quote all in database" and "Preserve case of reserved words".
PS: Ah, the last option is because I found out that there was one more problem with my fields: there was a column called "Admin" (yes guys, they created a camel-case column using a reserved word!), and for that reason I had to set "Preserve case of reserved words" and type it as "Admin" (without quotes and in camel case) in the Table Output.

Suggest a database for keys with multiple values, highly scalable

We have data where each key maps to multiple values. Each key can have around 500 values (each value around 200-300 chars), and the number of such keys will be around 10 million. The major operation is to check for a value given a key.
I've been using MySQL for a long time, where I've got two options: one row for each key-value pair, or one row for each key with all values in a text field. But neither seems efficient to me: the first model has lots of rows and redundancy, and in the second model the text field becomes very large.
I am considering using a NoSQL database for this purpose. I've used MongoDB before and I don't think it is suitable for my current case; a key-value or column-family NoSQL DB would be better. It need not be distributed. If you have used Riak, Redis, Cassandra, etc., please share your thoughts.
Thanks
From your description, it seems some sort of key-value store will suit you better than a relational DB.
The data itself seems to be non-relational, so why store it in relational storage? It seems valid to use something like Cassandra.
I think a typical structure for storing this data would be a column family, with the key as the row key and the values as columns.
MyDATA (ColumnFamily):
  RowKey  => Key
  Column1 => val1
  Column2 => val2
  ...
  ColumnN => valN
The data would look like (JSON notation):
MyDATA (CF) {
  [
    {key1: [{val1-1: '', timestamp}, {val1-2: '', timestamp}, ..., {val1-500: '', timestamp}]},
    {key2: [{val2-1: '', timestamp}, {val2-2: '', timestamp}, ..., {val2-500: '', timestamp}]},
    ...
  ]
}
Hopefully this helps.
Try the direct, normalized approach: One table with this schema:
id (primary key)
key
value
You have one row for every key->value relation.
Add an index on each column, and lookups should be reasonably efficient. Have you profiled any of this to identify a bottleneck?
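A minimal sketch of that normalized table and the "check for a value given a key" lookup, assuming MySQL and the mysql-connector-python driver (all names and connection details are placeholders):

import mysql.connector

conn = mysql.connector.connect(host="localhost", user="me", password="secret", database="mydb")
cur = conn.cursor()

# one row per (key, value) pair, with an index on each column as suggested above
cur.execute("""
    CREATE TABLE kv (
        id BIGINT AUTO_INCREMENT PRIMARY KEY,
        k VARCHAR(255) NOT NULL,
        v VARCHAR(300) NOT NULL,
        KEY idx_k (k),
        KEY idx_v (v)
    )
""")

# the major operation: is value V present for key K?
cur.execute("SELECT 1 FROM kv WHERE k = %s AND v = %s LIMIT 1", ("some-key", "some-value"))
print(cur.fetchone() is not None)

conn.close()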
This does map straightforwardly to Cassandra. Row key will be your model key, and your model values will be column names (yes, names) in Cassandra. You can leave the Cassandra column value empty, or add metadata there such as timestamp if that would be useful.
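A sketch of that mapping in CQL terms (a clustering column plays the role of the dynamic column names; the keyspace and names are placeholders), using the Python cassandra-driver:

from cassandra.cluster import Cluster

cluster = Cluster(["127.0.0.1"])
session = cluster.connect("ks")  # assumes the keyspace "ks" already exists

# one partition per key; each value becomes a clustering column entry,
# which is the CQL equivalent of "your model values as column names"
session.execute("""
    CREATE TABLE IF NOT EXISTS kv (
        k text,
        v text,
        PRIMARY KEY (k, v)
    )
""")

# check whether a given value exists for a given key
row = session.execute("SELECT v FROM kv WHERE k = %s AND v = %s", ("some-key", "some-value")).one()
print(row is not None)

cluster.shutdown()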
I don't think this is beyond the scale of MySQL on a single machine. You'll need to tune inserts or it'll take forever to load. You might also consider compressing your values using COMPRESS() or in your app directly. Might save you 50% or so.
Redis is basically an in-memory database, so it's probably out. Riak, HBase, or Cassandra might be decent choices.