I'm new to Cassandra and Scala, and I'm working on a Kafka consumer (written in Scala) that has to update a field of a row in Cassandra from data it receives.
So far, no problem.
One field in this row is a String list, and the update must not change it, so I have to assign the same String list to itself.
UPDATE keyspaceName.tableName
SET fieldToChange = newValue
WHERE id = idValue
AND fieldA = '${currentRow.getString("fieldA")}'
AND fieldB = ${currentRow.getInt("fieldB")}
...
AND fieldX = ${currentRow.getList("fieldX", classOf[String]).toString}
...
But I receive the following exception:
com.datastax.driver.core.exceptions.SyntaxError: line 19:49 no viable alternative at input ']' (... 482 AND fieldX = [[listStringItem1]]...)
So far I haven't found anything on the web that could help me.
The problem is that Scala's string representation of the list doesn't match Cassandra's representation of the list, so it generates errors.
Instead of constructing the CQL statement directly in your code, it's better to use a PreparedStatement and bind variables to it:
first, it will speed up execution, as Cassandra won't parse every statement separately;
second, it will be easier to bind variables, as you won't need to care about their string representations.
But be very careful with Scala: the Java driver expects Java lists, sets, maps, and base types like int, etc. You may look at the java-driver-scala-extras package, but you'll need to compile it yourself, as it's not available on Maven Central.
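For example, a minimal sketch of that approach could look like this with the Datastax Java driver 3.x (session, newValue, idValue and currentRow are assumed to exist as in your code, and the WHERE clause just mirrors your statement's structure):

import com.datastax.driver.core.{BoundStatement, PreparedStatement}

// Prepare once (e.g. when the consumer starts) and reuse it for every Kafka record.
val prepared: PreparedStatement = session.prepare(
  """UPDATE keyspaceName.tableName
    |SET fieldToChange = ?
    |WHERE id = ? AND fieldA = ? AND fieldB = ? AND fieldX = ?""".stripMargin)

// getList already returns a java.util.List[String], which is exactly what the
// driver expects for a list<text> column, so no string conversion is needed.
val bound: BoundStatement = prepared.bind(
  newValue,
  idValue,
  currentRow.getString("fieldA"),
  Int.box(currentRow.getInt("fieldB")),   // pass a java.lang.Integer, not a Scala Int
  currentRow.getList("fieldX", classOf[String]))

session.execute(bound)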
I'm currently using jOOQ to build my SQL (with code generation via the mvn plugin).
The created query is not executed by jOOQ though (I'm using Vert.x SqlClient for that).
Let's say I want to select all columns of two tables which share some identical column names, e.g. UserAccount(id, name, ...) and Product(id, name, ...). When executing the following code
val userTable = USER_ACCOUNT.`as`("u")
val productTable = PRODUCT.`as`("p")
create().select().from(userTable).join(productTable).on(userTable.ID.eq(productTable.AUTHOR_ID))
the build method query.getSQL(ParamType.NAMED) returns a query like
SELECT "u"."id", "u"."name", ..., "p"."id", "p"."name", ... FROM ...
The problem here is that the result set will contain the columns id and name twice, without the prefix "u." or "p.", so I can't map/parse it correctly.
Is there a way to tell jOOQ to alias these columns like the following, without any further manual effort?
SELECT "u"."id" AS "u.id", "u"."name" AS "u.name", ..., "p"."id" AS "p.id", "p"."name" AS "p.name" ...
I'm using the holy Postgres database :)
EDIT: My current approach would be something like
val productFields = productTable.fields().map { it.`as`(name("p.${it.name}")) }
val userFields = userTable.fields().map { it.`as`(name("u.${it.name}")) }
create().select(productFields,userFields,...)...
This feels really hacky though
How to correctly dereference tables from records
You should always use the column references that you passed to the query to dereference values from records in your result. If you didn't pass column references explicitly, then the ones from your generated table via Table.fields() are used.
In your code, that would correspond to:
userTable.NAME
productTable.NAME
So, in a resulting record, do this:
val rec = ...
rec[userTable.NAME]
rec[productTable.NAME]
Using Record.into(Table)
Since you seem to be projecting all the columns (do you really need all of them?) to the generated POJO classes, you can still do this intermediary step if you want:
val rec = ...
val userAccount: UserAccount = rec.into(userTable).into(UserAccount::class.java)
val product: Product = rec.into(productTable).into(Product::class.java)
Because the generated table has all the necessary meta data, it can decide which columns belong to it, and which ones don't. The POJO doesn't have this meta information, which is why it can't disambiguate the duplicate column names.
Using nested records
You can always use nested records directly in SQL as well in order to produce one of these 2 types:
Record2<Record[N], Record[N]> (e.g. using DSL.row(table.fields()))
Record2<UserAccountRecord, ProductRecord> (e.g. using DSL.row(table.fields()).mapping(...), or starting from jOOQ 3.17 directly using a Table<R> as a SelectField<R>)
The second jOOQ 3.17 solution would look like this:
// Using an implicit join here, for convenience
create().select(productTable.userAccount(), productTable)
.from(productTable)
.fetch();
The above is using implicit joins, for additional convenience
Auto aliasing all columns
There are a ton of flavours that users might like when "auto-aliasing" columns in SQL. Any solution offered by jOOQ would be no better than the one you've already found, so if you still want to auto-alias all columns, then just do what you did.
But usually, the desire to auto-alias is a feature request derived from a misunderstanding of the best approach to doing something in jOOQ (see the above options), so ideally, you don't go down the auto-aliasing road.
quoteEntitiesPage = quoteRepository.findAllByQuoteIds(quoteIds, pageRequest);
The above query gives me the error "Tried to send an out-of-range integer as a 2-byte value" if the quoteIds parameter contains more than Short.MAX_VALUE elements.
What is the best approach to get all quote entities here? My Quote class has id(long) and quoteId(UUID) fields.
When using a query of the type "select ... where x in (list)", such as yours, Spring adds a bind parameter for each list element. PostgreSQL limits the number of bind parameters in a query to Short.MAX_VALUE, so when the list is longer than that, you get that exception.
A simple solution for this problem would be to partition the list in blocks, query for each one of them, and combine the results.
Something like this, using Guava:
List<QuoteEntity> result = new ArrayList<>();
List<List<UUID>> partitionedQuoteIds = Lists.partition(quoteIds, 10000);
for (List<UUID> partitionQuoteIds : partitionedQuoteIds) {
    result.addAll(quoteRepository.findAllByQuoteIds(partitionQuoteIds));
}
This is very wasteful when paginating, but it might be enough for your use case.
Since I'm kinda really stumped right now with this issue I thought I'd ask here.
So here's the problem. I'm currently trying to write a library to represent Avro schemas in a typesafe manner, which should later allow structurally querying a given runtime value of a schema. E.g. does my schema contain a field of a given name within a certain path? Is the schema flat (contains no nestable types except at the top level)? Etc.
You can find the complete specification of Avro schemas here: https://avro.apache.org/docs/1.8.2/spec.html
Now I'm having some trouble deciding on a representation of the schema within my code. Right now I'm using an ADT like this, because it makes decoding the Avro schema (which is JSON) really easy with Circe, so you can somewhat ignore things like the refined types for this issue.
https://gist.github.com/GrafBlutwurst/681e365ecbb0ecad2acf4044142503a9 Please note that this is not the exact implementation. I have one that is able to decode schemas correctly, but it is a pain to query afterwards.
Anyhow I was wondering:
1) Does anyone have a good idea how to encode the type restriction on Avro unions? Avro unions cannot directly contain other unions, but they can, for example, contain records which in turn contain unions. So union -> union is not allowed, but union -> record -> union is OK. (One possible encoding is sketched after the PS below.)
2) Would using fixpoint recursion in the form of Fix, Free and Cofree make the later querying easier? I'm somewhat on the fence, since I have no experience using these yet.
Thanks!
PS: Here's some more elaboration on why Refined is in there. In the end I want to enable some very specific uses, e.g. this pseudocode (I'm not quite sure if it is possible at all yet):
refine[Schema Refined IsFlat](schema) //because it's flat I know it can only be a Recordtype with Fields of Primitives or Optionals (encoded as Union [null, primitive])
.folder { //wonky name
case AvroInt(k, i) => k + " -> " + i.toString
case AvroString(k, s) => k + " -> " + s
//etc...
} // should result in a function List[Vector[Byte]] => Either[Error,List[String]]
Basically, given a schema, and assuming it satisfies the IsFlat constraint, provide a function that decodes records and converts them into string lists.
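PPS: Regarding 1), one encoding I've been toying with (the names are placeholders and I'm not sure it's the right direction) is to split the ADT so that a union's members come from a narrower type that excludes unions:

// Rough sketch: separate "what may appear inside a union" from "any schema",
// so that union -> union is simply unrepresentable.
sealed trait AvroType
sealed trait UnionBranch extends AvroType   // anything that may appear inside a union

case object NullType   extends UnionBranch
case object IntType    extends UnionBranch
case object StringType extends UnionBranch
final case class RecordType(
  name: String,
  fields: List[(String, AvroType)]          // a record field may again be a union
) extends UnionBranch
final case class UnionType(branches: List[UnionBranch]) extends AvroType

A record field can be any AvroType (including a union), but a union's branches can only be UnionBranch values, so nesting a union directly inside a union doesn't typecheck.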
I have grouped all my customers in a JavaPairRDD<Long, Iterable<ProductBean>> by their customerId (of Long type), meaning every customerId has a list of ProductBean.
Now I want to save all ProductBean objects to the DB irrespective of customerId. I got all the values by using the method
JavaRDD<Iterable<ProductBean>> values = custGroupRDD.values();
Now I want to convert this JavaRDD<Iterable<ProductBean>> to a JavaPairRDD<Object, BSONObject> so that I can save it to Mongo. Remember, every BSONObject is made of a single ProductBean.
I have no idea how to do this in Spark, I mean which Spark transformation is used to do that job. I think this task is some kind of "separating all values from the Iterables". Please let me know how this is possible.
Any hint in Scala or Python is also OK.
You can use the flatMapValues function:
JavaPairRDD<Long, ProductBean> result = custGroupRDD.flatMapValues(v -> v);
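Since you said a hint in Scala is also OK, here's a rough sketch of the whole conversion (the getProductId getter, the "productId" field name, and the use of BasicDBObject are assumptions about your ProductBean and Mongo setup):

import org.apache.spark.rdd.RDD
import org.bson.BSONObject
import com.mongodb.BasicDBObject

// custGroupRDD is the Scala equivalent of your grouped RDD: RDD[(Long, Iterable[ProductBean])]
val toSave: RDD[(Object, BSONObject)] = custGroupRDD
  .values                      // drop the customerId keys -> RDD[Iterable[ProductBean]]
  .flatMap(identity)           // flatten the Iterables    -> RDD[ProductBean]
  .map { bean =>
    val doc = new BasicDBObject()
    doc.put("productId", bean.getProductId)  // one document per ProductBean
    (null: Object, doc: BSONObject)          // Mongo generates the _id, so the key can stay null
  }

From there you can write the pairs out with the mongo-hadoop connector (e.g. saveAsNewAPIHadoopFile with MongoOutputFormat), just as you would from Java.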
I am new to the Cassandra CLI. I want to know whether it is good practice to define a column name as LongType instead of UTF8Type, and please also tell me if there is anything wrong in my code or coding style.
I am doing it in Scala, in the Play Framework, with Hector.
val mutator = HFactory.createMutator(Group, le);
mutator.addInsertion(groupId,"groupRefrence",HFactory.createColumn(userId,userId,le,le))
mutator.execute()
def getMembersRefrence(groupId: Long) = {
val sliceQuery = HFactory.createSliceQuery(Group, le, le, le)
sliceQuery.setColumnFamily("groupRefrence")
sliceQuery.setKey(groupId)
sliceQuery.setRange(Long.MIN_VALUE,Long.MAX_VALUE, false, Integer.MAX_VALUE)
val result = sliceQuery.execute()
val res = result.get()
val columns = res.getColumns()
val response = columns.toList
response
}
good practice to define a column name as LongType instead of UTF8Type
You should define your column name datatype to whatever makes sense for your data model. As far as best practices go, eBay posted a tech blog on this a couple of years ago, and it is definitely a good read. Part 2 covers column names:
Storing values in column names is perfectly OK
Leaving column values empty (“valueless” columns) is also OK.
It’s a common practice with Cassandra to store a value (actual data) in the column name (a.k.a. column key), and even to leave the column value field empty if there is nothing else to store. One motivation for this practice is that column names are stored physically sorted, but column values are not.
Notes:
The maximum column key (and row key) size is 64KB. However, don’t store something like ‘item description’ as the column key!
Don’t use a timestamp alone as a column key. You might get colliding timestamps from two or more app servers writing to Cassandra. Prefer timeuuid (type-1 UUID) instead.
The maximum column value size is 2 GB. But because there is no streaming and the whole value is fetched into heap memory when requested, limit the size to only a few MBs. (Large objects are not likely to be supported in the near future – Cassandra-265. However, the Astyanax client library supports large objects by chunking them.)
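For example, applying that "valueless column" advice to the insert from your question might look something like this with Hector (just a sketch; Group and le are the Keyspace and LongSerializer from your code, and the data lives entirely in the column name):

import me.prettyprint.cassandra.serializers.BytesArraySerializer
import me.prettyprint.hector.api.factory.HFactory

val mutator = HFactory.createMutator(Group, le)
mutator.addInsertion(
  groupId,
  "groupRefrence",
  // userId is the column name; the column value is left empty
  HFactory.createColumn(userId, Array.empty[Byte], le, BytesArraySerializer.get()))
mutator.execute()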
I also feel compelled to mention that newer versions of Cassandra are moving away from the original column family and CLI interaction. I'm not sure whether the newer CQL3 drivers support storing values in column names or not (I've only had to do it via Thrift with Hector, not CQL3). In any case, here is a good article (a Thrift-to-CQL3 upgrade guide) that describes these differences, and it is something you should read through for future endeavors.