Partial inserts with Cassandra and Phantom DSL

I'm building a simple Scala Play app which stores data in a Cassandra DB using the Phantom DSL driver for Scala. One of the nice features of Cassandra is that you can do partial updates, i.e. as long as you provide the key columns, you do not have to provide values for all the other columns in the table. Cassandra will merge the data into your existing record based on the key.
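For example, in plain CQL both of the following merge into the same row, leaving its other columns untouched (the songs table here is hypothetical):

INSERT INTO songs (id, title) VALUES (123, 'foo');
UPDATE songs SET body = 'bar' WHERE id = 123;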
Unfortunately, it seems this doesn't work with Phantom DSL. I have a table with several columns, and I want to be able to do an update, specifying values just for the key and one of the data columns, and let Cassandra merge this into the record as usual, while leaving all the other data columns for that record unchanged.
But Phantom DSL overwrites existing columns with null if you don't specify values in your insert/update statement.
Does anybody know of a work-around for this? I don't want to have to read/write all the data columns every time, as eventually the data columns will be quite large.
FYI I'm using the same approach to my Phantom coding as in these examples:
https://github.com/thiagoandrade6/cassandra-phantom/blob/master/src/main/scala/com/cassandra/phantom/modeling/model/GenericSongsModel.scala

It would be great to see some code, but partial updates are possible with phantom. Phantom uses an immutable builder; it will not override anything with null by default. If you don't specify a value for a column, it won't do anything with that column.
database.table.update.where(_.id eqs id).modify(_.bla setTo "newValue")
will produce a query that sets only the values you've explicitly set; everything else is left untouched. Please provide some code examples; your problem seems really strange, as queries don't keep track of table columns to automatically add in what's missing.
Update
If you would like to delete column values, i.e. effectively set them to null inside Cassandra, phantom offers a different syntax which does the same thing:
database.table.delete(_.col1, _.col2).where(_.id eqs id)
Furthermore, you can even delete map entries in the same fashion:
database.table.delete(_.props("test"), _.props("test2")).where(_.id eqs id)
This assumes props is a MapColumn[Table, Record, String, _]; props.apply(key: T) is typesafe, so it will respect the key type you define for the map column.
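For completeness, here is a minimal sketch of the whole pattern in the style of the linked example (phantom 1.x / com.websudos imports; the Song model and column names are hypothetical):

import java.util.UUID
import scala.concurrent.Future
import com.websudos.phantom.dsl._

case class Song(id: UUID, title: String, body: String)

class SongsModel extends CassandraTable[ConcreteSongsModel, Song] {
  object id extends UUIDColumn(this) with PartitionKey[UUID]
  object title extends StringColumn(this)
  object body extends StringColumn(this)

  override def fromRow(row: Row): Song =
    Song(id(row), title(row), body(row))
}

abstract class ConcreteSongsModel extends SongsModel with RootConnector {
  // Partial update: only `title` appears in the generated CQL,
  // so `body` (and any other unmentioned data column) is left unchanged.
  def updateTitle(songId: UUID, newTitle: String): Future[ResultSet] =
    update.where(_.id eqs songId).modify(_.title setTo newTitle).future()
}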

Related

Selecting identical named columns in jOOQ

I'm currently using jOOQ to build my SQL (with code generation via the mvn plugin).
Executing the created query is not done by jOOQ, though (I'm using Vert.x SqlClient for that).
Let's say I want to select all columns of two tables which share some identical column names, e.g. UserAccount(id, name, ...) and Product(id, name, ...). When executing the following code
val userTable = USER_ACCOUNT.`as`("u")
val productTable = PRODUCT.`as`("p")
create().select().from(userTable).join(productTable).on(userTable.ID.eq(productTable.AUTHOR_ID))
calling query.getSQL(ParamType.NAMED) returns a query like
SELECT "u"."id", "u"."name", ..., "p"."id", "p"."name", ... FROM ...
The problem is that the result set will contain the columns id and name twice, without the prefix "u." or "p.", so I can't map/parse it correctly.
Is there a way to tell jOOQ to alias these columns like the following, without any further manual effort?
SELECT "u"."id" AS "u.id", "u"."name" AS "u.name", ..., "p"."id" AS "p.id", "p"."name" AS "p.name" ...
I'm using the holy Postgres database :)
EDIT: My current approach is something like
val productFields = productTable.fields().map { it.`as`(name("p.${it.name}")) }
val userFields = userTable.fields().map { it.`as`(name("u.${it.name}")) }
create().select(productFields,userFields,...)...
This feels really hacky though
How to correctly dereference tables from records
You should always use the column references that you passed to the query to dereference values from records in your result. If you didn't pass column references explicitly, then the ones from your generated table via Table.fields() are used.
In your code, that would correspond to:
userTable.NAME
productTable.NAME
So, in a resulting record, do this:
val rec = ...
rec[userTable.NAME]
rec[productTable.NAME]
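If it helps, the same dereferencing works from Scala too; a complete sketch (the jOOQ calls are identical to the Kotlin ones above, only the lambda syntax differs):

// Fetch the join, then dereference each record by the aliased tables'
// column references, which are unambiguous even though the raw result
// set contains "id" and "name" twice.
val result = create().select()
  .from(userTable)
  .join(productTable).on(userTable.ID.eq(productTable.AUTHOR_ID))
  .fetch()

result.forEach { rec =>
  val userName: String = rec.get(userTable.NAME)
  val productName: String = rec.get(productTable.NAME)
}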
Using Record.into(Table)
Since you seem to be projecting all the columns (do you really need all of them?) to the generated POJO classes, you can still do this intermediary step if you want:
val rec = ...
val userAccount: UserAccount = rec.into(userTable).into(UserAccount::class.java)
val product: Product = rec.into(productTable).into(Product::class.java)
Because the generated table has all the necessary meta data, it can decide which columns belong to it, and which ones don't. The POJO doesn't have this meta information, which is why it can't disambiguate the duplicate column names.
Using nested records
You can always use nested records directly in SQL as well in order to produce one of these 2 types:
Record2<Record[N], Record[N]> (e.g. using DSL.row(table.fields()))
Record2<UserAccountRecord, ProductRecord> (e.g. using DSL.row(table.fields()).mapping(...), or starting from jOOQ 3.17, directly using a Table<R> as a SelectField<R>)
The second jOOQ 3.17 solution would look like this:
// Using an implicit join here, for convenience
create().select(productTable.userAccount(), productTable)
.from(productTable)
.fetch();
The above uses jOOQ's implicit join feature (productTable.userAccount() navigates the foreign key), for additional convenience.
Auto aliasing all columns
There are a ton of flavours that users might like when "auto-aliasing" columns in SQL. Any solution offered by jOOQ would be no better than the one you've already found, so if you still want to auto-alias all columns, just do what you did.
But usually, the desire to auto-alias is a feature request derived from a misunderstanding of the best approach in jOOQ (see the options above), so ideally you don't go down the auto-aliasing road.

Fetching only one field using the SORM framework

Is it possible to fetch only one field from the database using the SORM Framework?
What I want in plain SQL would be:
SELECT node_id FROM messages
I can't seem to reproduce this in SORM. I know this might be against how SORM is supposed to work, but right now I have two huge tables with different kinds of messages, and I was asked to get all the unique node_ids from both tables.
I know I could just query both tables using SORM and parse through all the data, but I would like to put the database to work. Obviously, it would be even better if one could get only the unique node_ids in a single DB call.
Right now, just querying everything and parsing it takes way too long.
There doesn't seem to be ORM support for what you want to do, unless node_id happens to be the primary key of your Message object:
val messages = Db.query[Message].fetchIds()
In this case you shouldn't need to worry about it being UNIQUE, since primary keys are by definition unique. Alternatively, you can run a custom SQL query:
Db.fetchWithSql[Message]("SELECT DISTINCT node_id FROM messages")
Note that the latter might be typed wrong: you'd have to try it against your database. You might need fetchWithSql[Int], or some other variation; it is unclear what SORM does when the primary key hasn't been queried.
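And if you need the distinct node_ids across both of your tables in a single call, a UNION could let the database do the de-duplication for you; a sketch, under the same caveat about fetchWithSql's typing (messages_a and messages_b are hypothetical table names):

// UNION (unlike UNION ALL) removes duplicates, so the database
// returns each node_id exactly once across both tables.
val nodeIds = Db.fetchWithSql[Message](
  "SELECT node_id FROM messages_a UNION SELECT node_id FROM messages_b"
)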

Talend Data Integration: Avoid nulls coming out of tExtractXMLField?

I have this simple flow in Talend DI 6 (simplified for posting on SO):
The last step crashes with a NullPointerException, because missing XML attributes are returned as null.
Is there a way to get empty string values instead of nulls?
For now I'm using a tReplace step to remove nulls as a work-around, but it's tedious and adds to the cost of maintenance by creating one more place where the list of attributes needs to be maintained.
In Talend DI 5.6.2 it is possible to add default data values to the schema. The column in the schema is called "Default". If you expect strings, you can set an empty string, which will be used whenever the column value is null:
[Screenshot: Talend schema view with the Default column]
This works for other data types as well. Talend DI 6 should still be able to do this, although the field might have been renamed.

Discard values while inserting and updating data using slick

I am using Slick with Play 2.
I have multiple fields in the database which are managed by the database itself. I don't want to set them on create or update; however, I do want to get them when reading the values.
For example, suppose I have
case class MappedDummyTable(id: Int, .. 20 other fields, modified_time: Option[Timestamp])
which maps Dummy in the database. modified_time is managed by the database.
The problem is that during insert or update, I create an instance of MappedDummyTable without the modified time attribute and pass it to Slick for create/update, like
TableQuery[MappedDummyTable].insert(instanceOfMappedDummyTable)
For this, Slick creates a query like
INSERT INTO MappedDummyTable(id, ..., modified_time) VALUES (1, ..., null)
which sets modified_time to NULL. I don't want that; I want Slick to ignore these fields when creating and updating.
For updating, I can do
TableQuery[MappedDummyTable].map(fieldsToBeUpdated).update(values)
but this leads to 20-odd fields in the map method, which looks ugly.
Is there any better way?
Update:
The best solution that I found was using multiple projections: I created one projection to read the values and another to update and insert the data.
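For illustration, a minimal sketch of that multiple-projection idea (assuming Slick 3.x lifted embedding; the table is hypothetical and the 20 fields are collapsed to one):

import java.sql.Timestamp
import slick.jdbc.PostgresProfile.api._

case class Dummy(id: Int, name: String, modifiedTime: Option[Timestamp])

class DummyTable(tag: Tag) extends Table[Dummy](tag, "Dummy") {
  def id = column[Int]("id", O.PrimaryKey)
  def name = column[String]("name")
  def modifiedTime = column[Option[Timestamp]]("modified_time")

  // Full projection, used for reading: includes the db-managed column.
  def * = (id, name, modifiedTime) <> (Dummy.tupled, Dummy.unapply)

  // Write projection, used for inserts/updates: omits modified_time,
  // so the database default/trigger stays in charge of it.
  def writeProjection = (id, name)
}

val dummies = TableQuery[DummyTable]

// Insert without touching modified_time:
val insertAction = dummies.map(_.writeProjection) += ((1, "hello"))

// Update without touching modified_time:
val updateAction =
  dummies.filter(_.id === 1).map(_.writeProjection).update((1, "world"))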
Maybe you need to write some triggers in the table if you don't want to write code like row => (row.id, ...other 20 fields).
Or try using None instead of null?
I believe that the solution of mapping the non-default fields is the only way to do it with Slick. To make it less ugly, you can define a function ignoreDefaults on MappedDummyTable that returns only the non-default values, and a function in the companion object of the MappedDummyTable case class that returns the projection:
TableQuery[MappedDummyTable].map(MappedDummyTable.ignoreDefaults).insert(instanceOfMappedDummyTable.ignoreDefaults)

Column name of LongType instead of UTF8Type in Cassandra CLI column family

I am new to the Cassandra CLI, and I want to know whether it is good practice to define column names as LongType instead of UTF8Type. Also, please tell me if there is anything wrong in my code or coding style.
I am doing it in Scala, in the Play framework, with Hector.
val mutator = HFactory.createMutator(Group, le)
mutator.addInsertion(groupId, "groupRefrence", HFactory.createColumn(userId, userId, le, le))
mutator.execute()

def getMembersRefrence(groupId: Long) = {
  val sliceQuery = HFactory.createSliceQuery(Group, le, le, le)
  sliceQuery.setColumnFamily("groupRefrence")
  sliceQuery.setKey(groupId)
  // Slice over the full range of Long column names, unreversed, with no count limit
  sliceQuery.setRange(Long.MinValue, Long.MaxValue, false, Integer.MAX_VALUE)
  val result = sliceQuery.execute()
  val res = result.get()
  val columns = res.getColumns()
  columns.toList
}
good practice to define column names as LongType instead of UTF8Type
You should define your column name datatype to whatever makes sense for your data model. As far as best practices go, eBay posted a tech blog on this a couple of years ago, and it is definitely a good read. Part 2 covers column names:
Storing values in column names is perfectly OK. Leaving column values empty (“valueless” columns) is also OK. It’s a common practice with Cassandra to store a value (actual data) in the column name (a.k.a. column key), and even to leave the column value field empty if there is nothing else to store. One motivation for this practice is that column names are stored physically sorted, but column values are not.
Notes:
- The maximum column key (and row key) size is 64KB. However, don’t store something like ‘item description’ as the column key!
- Don’t use a timestamp alone as a column key. You might get colliding timestamps from two or more app servers writing to Cassandra. Prefer timeuuid (type-1 uuid) instead.
- The maximum column value size is 2 GB. But because there is no streaming and the whole value is fetched in heap memory when requested, limit the size to only a few MBs. (Large objects are not likely to be supported in the near future – Cassandra-265. However, the Astyanax client library supports large objects by chunking them.)
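Following the timeuuid advice above, a sketch of how you might generate such a column key with Hector (reusing Group, groupId and userId from the question; TimeUUIDUtils is Hector's helper for type-1 UUIDs):

import java.util.UUID
import me.prettyprint.cassandra.serializers.{LongSerializer, UUIDSerializer}
import me.prettyprint.cassandra.utils.TimeUUIDUtils
import me.prettyprint.hector.api.factory.HFactory

// A type-1 (time-based) UUID column key avoids the colliding-timestamp
// problem when several app servers write in the same millisecond.
val columnKey: UUID = TimeUUIDUtils.getUniqueTimeUUIDinMillis()

val mutator = HFactory.createMutator(Group, LongSerializer.get())
mutator.addInsertion(
  groupId,
  "groupRefrence",
  HFactory.createColumn(columnKey, userId, UUIDSerializer.get(), LongSerializer.get())
)
mutator.execute()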
I also feel compelled to mention that newer versions of Cassandra are moving away from the original column family and CLI interaction. I'm not sure whether the newer CQL3 drivers support storing values in column names or not (I've had to do it via Thrift with Hector, but not CQL3). In any case, here is a good article (A Thrift to CQL3 upgrade guide) that describes these differences, and it is something you should read through for future endeavors.
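For a rough idea of where the above model lands after such an upgrade: in CQL3, the Thrift-era "values in column names" pattern becomes a clustering column (a hedged sketch; the table mirrors the question's groupRefrence column family):

-- The Thrift column keys (user ids) become rows of the clustering
-- column user_id, still physically sorted within each group_id.
CREATE TABLE group_refrence (
  group_id bigint,
  user_id bigint,
  PRIMARY KEY (group_id, user_id)
);

-- A "valueless" column becomes a plain insert of the key pair:
INSERT INTO group_refrence (group_id, user_id) VALUES (42, 1001);

-- The slice query becomes:
SELECT user_id FROM group_refrence WHERE group_id = 42;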