Cassandra Super Column Family Schema Creation

I am trying to create a Super Column Family that will replicate a structure like this.
{ 'hd': {
    '2008/12/12 10:03': { metric1: 'blah', metric2: 'blah' },
    '2008/12/2 9:03':   { metric1: 'blah', metric2: 'blah' }
  },
  'cpu': {
    '2008/12/12 10:03': { metric1: 'blah', metric2: 'blah' },
    '2008/12/2 9:03':   { metric1: 'blah', metric2: 'blah' }
  }
}
My current try looks like this:
create column family Timestep
  with column_type = 'Super'
  and comparator = 'AsciiType'
  and subcomparator = 'DateType'
  and default_validation_class = 'DoubleType'
  and key_validation_class = 'AsciiType'
  and column_metadata = [
    {column_name : metric1, validation_class : DoubleType},
    {column_name : metric2, validation_class : DoubleType}
  ];
But if I try to run the above in the cassandra-cli, I get:
java.lang.RuntimeException: org.apache.cassandra.db.marshal.MarshalException: unable to coerce 'open' to a formatted date (long)
Maybe I am not understanding what a super column family is properly, but any help would be awesome.
Thanks.

It is very strongly recommended that you not use supercolumns, especially in new designs. They have never been problem-free, and now they are deprecated and much more capably replaced by composite keys.
Your data could be nicely represented like this in CQL 3, for example:
CREATE TABLE Timestep (
hardware ascii,
when timestamp,
metric1 double,
metric2 double,
PRIMARY KEY (hardware, when)
);
Or, depending on exactly what you expect to have, it may make more sense to use:
CREATE TABLE Timestep (
hardware ascii,
metricname ascii,
when timestamp,
value double,
PRIMARY KEY (hardware, metricname, when)
) WITH COMPACT STORAGE;
See this article for more information on how these translate to storage engine wide rows in Cassandra.
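For example, writing and reading the nested structure from the question against the first table above would look roughly like this (the values and timestamps are illustrative):
INSERT INTO Timestep (hardware, when, metric1, metric2)
VALUES ('hd', '2008-12-12 10:03:00', 1.0, 2.0);
-- all metrics for one piece of hardware within a time range, newest first
SELECT when, metric1, metric2
FROM Timestep
WHERE hardware = 'hd'
AND when >= '2008-12-02 00:00:00'
AND when <= '2008-12-12 23:59:59'
ORDER BY when DESC;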

May I know which API you are using?
If it is Hector, then I might be able to help you. But I personally recommend that you not use super columns, because getting the sub-column list out of a super column is a headache, and there are lots of performance-related issues.
Moreover, super columns are being deprecated, so it is better to go with composite keys.

Related

How to get correct type and nullability information for enum fields using jOOQ's metadata API?

I'm trying to use jOOQ's metadata API, and most columns behave the way I'd expect, but enum columns seem to be missing type and nullability information somehow.
For example, if I have a schema defined as:
CREATE TYPE public.my_enum AS ENUM (
'foo',
'bar',
'baz'
);
CREATE TABLE public.my_table (
    id bigint NOT NULL,
    created_at timestamp with time zone DEFAULT now() NOT NULL,
    name text,
    my_enum_column public.my_enum NOT NULL
);
The following test passes:
// this is Kotlin, but hopefully pretty easy to decipher
test("something fishy going on here") {
val jooq = DSL.using(myDataSource, SQLDialect.POSTGRES)
val myTable = jooq.meta().tables.find { it.name == "my_table" }!!
// This looks right...
val createdAt = myTable.field("created_at")!!
createdAt.dataType.nullability() shouldBe Nullability.NOT_NULL
createdAt.dataType.typeName shouldBe "timestamp with time zone"
// ...but none of this seems right
val myEnumField = myTable.field("my_enum_column")!!
myEnumField.dataType.typeName shouldBe "other"
myEnumField.dataType.nullability() shouldBe Nullability.DEFAULT
myEnumField.dataType.castTypeName shouldBe "other"
myEnumField.type shouldBe Any::class.java
}
It's telling me that enum columns have Nullability.DEFAULT regardless of whether they are null or not null. For other types, Field.dataType.nullability will vary depending on whether the column is null or not null, as expected.
For any enum column, the type is Object (Any in Kotlin), and the dataType.typeName is "other". For non-enum columns, dataType.typeName gives me the correct SQL for the type.
I'm also using the jOOQ code generator, and it generates the correct types for enum columns. That is, it creates an enum class and uses that as the type for the corresponding fields, which are marked as not-nullable. The generated code for this field looks something like (reformatted to avoid long lines):
public final TableField<MyTableRecord, MyEnum> MY_ENUM_COLUMN =
    createField(
        DSL.name("my_enum_column"),
        SQLDataType.VARCHAR
            .nullable(false)
            .asEnumDataType(com.example.schema.enums.MyEnum.class),
        this,
        ""
    )
So it appears that jOOQ's code generator has the type information, but how can I access the type information via the metadata API?
I'm using postgres:11-alpine and org.jooq:jooq:3.14.11.
Update 1
I tried testing this with org.jooq:jooq:3.16.10 and org.jooq:jooq:3.17.4. They seem to fix the nullability issue, but the datatype is still "other", and the type is still Object. So it appears the nullability issue was a bug in jOOQ. I'll file an issue about the type+datatype.
Update 2
This is looking like it may be a bug, so I've filed an issue.
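For reference, the declared enum type name and nullability are visible in the Postgres catalogs, so a plain information_schema query can serve as a cross-check while the metadata API reports "other" and DEFAULT:
SELECT column_name, udt_name, is_nullable
FROM information_schema.columns
WHERE table_schema = 'public'
AND table_name = 'my_table';
-- udt_name is 'my_enum' for the enum column, and is_nullable is 'NO'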

Custom types not found in Postgresql with psycopg2

Introduction:
I'm trying to insert data with Python/psycopg2 into Postgres in the following format:
(integer, date, integer, customtype[], customtype[], customtype[], customtype[])
However, as I try to insert them, I always get this error:
'"customtype[]" does not exist'
Here is my setup:
I have a dict with the data I need, like so:
data_dict = {'integer1':1, 'date': datetime(),
'integer2': 2, 'custom1':[(str, double, double),(str, double, double)],
'custom2':[(str, double, double),(str, double, double),(str, double, double)],
'custom3':[(str, double, double),(str, double, double),(str, double, double)],
'custom4':[(str, double, double)]}
Each custom array can have as many custom tuples as needed.
I've already created a type for these custom tuples, as such:
"CREATE TYPE customtype AS (text, double precision, double precision)"
And I've created a table with columns of customtype[].
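(For reference, a composite type is declared with named attributes, and the array columns then use that type; the attribute, column, and table names in this sketch are assumed for illustration:)
-- names below are assumed for illustration
CREATE TYPE customtype AS (
    label text,
    value1 double precision,
    value2 double precision
);
CREATE TABLE mytable (
    integer1 integer,
    date1 date,
    integer2 integer,
    custom1 customtype[]
    -- ...remaining customtype[] columns
);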
What I've tried so far:
query = """INSERT INTO table (column names...) VALUES
(%(integer1)s, %(date)s, %(integer2)s,
%(custom1)s::customtype[], [...]);"""
And:
query = """INSERT INTO table (column names...) VALUES
(%(integer1)s, %(date)s, %(integer2)s,
CAST(%(custom1)s AS customtype[]), [...]);"""
But both options render the same results.
The final question:
How to insert these record-type arrays in Postgresql with Psycopg2?
Maybe I'm misunderstanding completely how Postgresql works. I'm coming from a BigQuery Record/Repeated type background.
P.S.: This is how I'm executing the query:
cursor.execute(query,data_dict)
The problem was that I created the type inside a particular schema of the database.
When referencing custom types in Postgresql, you need to qualify the type with the schema in which it was created as well as the type name.
Like so:
(%(dict_name)s)::"schema_name".type
#or
CAST(%(dict_name)s as "schema_name".type)
Be careful with the quoting!
This is a bit of an old question, and it set me on the right track to finding the answer but didn't quite get me all the way.
I had a function which needed to return a custom type, and I needed to prefix the custom type with the public schema.
RETURNS TABLE (
"status" "public"."my_custom_status_type",
"ID" VARCHAR(255)
)
So directly answering OP's question, try adding public before the type.
query = """INSERT INTO table (column names...) VALUES
(%(integer1)s, %(date)s, %(integer2)s,
CAST(%(custom1)s AS public.customtype[]), [...]);"""

Query with casting in WHERE

I'm learning this wonderful library; however, while simple queries work, I'm confused about how to write something that is not in the library FAQ.
For example,
create table if not exists ticks
(id bigserial not null constraint ticks_pkey primary key,
timestamp timestamp not null
);
Is it possible to write something like
select coalesce(max(id), 0) from ticks where timestamp::date = ?
Actually, I have two issues here:
1. column.max() doesn't have any suitable modifiers; for example, function() accepts no parameters. Probably I can emulate this in code after I fetch the row.
2. I have no idea how to perform the cast in the where clause, or how to write an arbitrary where condition.
If it's possible to map an object to your existing table, then you could try something like:
object Ticks : LongIdTable() {
    val timestamp = datetime("timestamp")
}

fun Expression<DateTime>.pgDate() = object : org.jetbrains.exposed.sql.Function<DateTime>(DateColumnType(false)) {
    override fun toQueryBuilder(queryBuilder: QueryBuilder) = queryBuilder {
        append(this@pgDate, "::date")
    }
}

val expr = Coalesce(Ticks.id.max(), longLiteral(0))
Ticks.slice(expr).select {
    Ticks.timestamp.pgDate() eq DateTime.parse("2019-01-01")
}
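For reference, the expression built above should render to SQL roughly like the query in the question, assuming the table ends up named ticks:
-- approximate SQL generated by the Exposed expression above
SELECT COALESCE(MAX(ticks.id), 0)
FROM ticks
WHERE ticks."timestamp"::date = '2019-01-01'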

Anorm Scala insert list of objects with nested list

I find myself in need of inserting a sequence of elements with a sequence of nested elements into a PostgreSQL database, preferably with a single statement, because I am returning a Future. I am using Scala Play with Anorm.
My data looks something like below.
case class Question(id: Long, titel: String)
case class Answer(questionId: Long, text: String)
In db it looks like this:
CREATE TABLE questions (
    question_id SERIAL PRIMARY KEY NOT NULL,
    titel TEXT NOT NULL
);
CREATE TABLE answers (
    answer_id SERIAL PRIMARY KEY NOT NULL,
    question_id INT NOT NULL,
    text TEXT NOT NULL,
    FOREIGN KEY (question_id) REFERENCES questions(question_id) ON DELETE CASCADE
);
My function would look something like this:
def saveFormQuestions(questions: Seq[Question], answers: Seq[Answer]): Future[Long] = {
  Future {
    db.withConnection { implicit c =>
      SQL(
        // sql
      ).executeInsert()
    }
  }
}
Somehow, in Anorm, SQL or both, I have to do the following, preferably in a single transaction:
foreach question in questions
  insert question into questions
  foreach answer in answers, where answer.questionId == old question.id
    insert answer into answers with new question id gained from question insert
I am new with Scala Play, so I might have made some assumptions I shouldn't have. Any ideas to get me started would be appreciated.
I solved it with logic inside the db.withConnection block. Somehow I assumed that you had to have a single SQL statement inside db.withConnection, which turned out not to be true. So like this:
val idMap = scala.collection.mutable.Map[Long, Long]() // structure to hold map of old ids to new
db.withConnection { implicit conn =>
  // save all questions and gather map of the new ids to the old
  for (q <- questions) {
    val id: Long = SQL("INSERT INTO questions (titel) VALUES ({titel})")
      .on('titel -> q.titel)
      .executeInsert(scalar[Long].single)
    idMap(q.id) = id
  }
  // save answers with new question ids
  if (answers.nonEmpty) {
    for (a <- answers) {
      SQL("INSERT INTO answers (question_id, text) VALUES ({qid}, {text});")
        .on('qid -> idMap(a.questionId), 'text -> a.text)
        .execute()
    }
  }
}
As indicated by its name, Anorm is not an ORM, and won't generate the statements for you.
You will have to determine the statements appropriate to represent the data and relationships (e.g. my Acolyte tutorial).
As for transactions, Anorm is a thin/smart wrapper around JDBC, so the JDBC transaction semantics are kept. BTW, Play provides .withTransaction on its DB resolution utility.

Concatenating databases with Squeryl

I'm trying to use Squeryl to take the contents of a table from one database, and append it to the equivalent table in another database. The primary key will have to be reassigned in the process, but I'm getting the error NULL not allowed for column "SIMID". Why is this?
object Concatenator {
  def main(args: Array[String]) {
    Class.forName("org.h2.Driver");

    val seshA = Session.create(
      java.sql.DriverManager.getConnection("jdbc:h2:file:data/resultsA", "sa", "password"),
      new H2Adapter
    )
    val seshB = Session.create(
      java.sql.DriverManager.getConnection("jdbc:h2:file:data/resultsB", "sa", "password"),
      new H2Adapter
    )

    using(seshA) {
      import Library._
      from(sims){s => select(s)}.foreach { item =>
        using(seshB) {
          sims.insert(item);
        }
      }
    }
  }

  case class Simulation(
    @Column("SIMID")
    var id: Long,
    val date: Date
  ) extends KeyedEntity[Long]

  object Library extends Schema {
    val sims = table[Simulation]
    on(sims)(s => declare(
      s.id is(unique, indexed, autoIncremented)
    ))
  }
}
Update:
I think it might be something to do with the DBs. They were created in a Java project using JPA/EclipseLink, and in addition to generating tables for my entities it also created a table called SEQUENCE, presumably for primary key generation.
I've found that I can create a brand new table in Squeryl and manually put the contents of both databases into it, thus achieving the same effect. Interestingly, no SEQUENCE table was auto-generated for this new table. So I'm guessing it comes down to how JPA/EclipseLink was generating my primary keys?
Update 2:
As requested, I appended trace_level_file=3 to the url and the files are here: resultsA.trace.db and resultsB.trace.db. B is the more interesting one I think. Also, I've put a simplified version of the database here which has had unnecessary tables removed (the same database is used for resultsA and resultsB).
Just got a moment to look at this more closely. It turns out you were on the right track. While I guess that EclipseLink uses sequences (hence the SEQUENCE table) to generate the PK value, Squeryl expects the column to be defined as something like:
simid bigint not null primary key auto_increment
Since the column created by EclipseLink lacks the auto_increment flag, a value is never placed in it and you end up with the constraint violation you mentioned. It sounds like you've already worked around the issue, but hopefully this will help you or someone else in the future.
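If you would rather keep the EclipseLink-created tables, another option might be to add the missing flag to the existing column. This is only a sketch: it assumes your H2 version accepts AUTO_INCREMENT in ALTER COLUMN, and that the table and column are named SIMULATION and SIMID as in the mapping above:
-- assumption: this ALTER COLUMN form is supported by the H2 version in use
ALTER TABLE simulation ALTER COLUMN simid BIGINT NOT NULL AUTO_INCREMENT;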
Not really a solution, but my workaround is to create a new database
val seshNew = Session.create(java.sql.DriverManager.getConnection("jdbc:h2:file:data/resultsNew", "sa","password"),new H2Adapter)
and then just write all the data from the other databases into it
using(seshNew) {
  sims.insert(new Simulation(0, item.date))
}
The primary key of 0 gets overwritten as appropriate.