Hi, I created my own Cassandra connector using the DataStax drivers, but I'm facing some memory leak issues, so I started considering other solutions such as Alpakka from Lightbend, which has a Cassandra connector.
But after checking its rather thin documentation I'm changing my mind, since it only shows the connector being used with raw CQL queries, and in my case I manage DTO objects.
Does anybody know of any documentation showing whether the Alpakka Cassandra connector can save DTOs with a configurable consistency level?
This code is from my current connector. I would like to achieve something similar.
private void updateCreateEntry(DTO originalDto, Mapper cassandraMapper) {
    ConsistencyLevel consistencyLevel = ((DTOCassandra) originalDto).getConsistencyLevel();
    // For writes, use the DTO's own consistency level, falling back to the default when none is set
    cassandraMapper.save(originalDto,
            Option.consistencyLevel(consistencyLevel != null ? consistencyLevel : DEFAULT_CONSISTENCY_LEVEL));
}
As you've noticed, the Cassandra connector within Alpakka is presently quite thin. If you need richer support for your DTOs, you could choose a more full-featured client like Phantom.
There are excellent examples of how to use Phantom - check this one out, for instance. Once you have created your model, Phantom will give you a def store[T](t: T): Future[ResultSet] function to insert data.
You can feed calls to this function into a mapAsync(n) stage to make use of them in your Akka Stream.
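As a rough illustration, here is a minimal sketch of that wiring. The Person case class and the injected store function are hypothetical stand-ins for whatever your Phantom table actually generates:

import scala.concurrent.Future
import akka.NotUsed
import akka.stream.Materializer
import akka.stream.scaladsl.{Sink, Source}
import com.datastax.driver.core.ResultSet

// Hypothetical model: in a real project this mirrors your Phantom table definition.
case class Person(id: java.util.UUID, name: String)

class PersonIngest(store: Person => Future[ResultSet])(implicit mat: Materializer) {

  // Write each element through Phantom's store call, keeping at most 4 inserts in flight.
  def insertAll(people: Source[Person, NotUsed]): Future[akka.Done] =
    people
      .mapAsync(parallelism = 4)(store)
      .runWith(Sink.ignore)
}

The bounded parallelism in mapAsync also caps how many write futures are in flight at once, which helps keep memory usage predictable.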
I am trying to build a Kafka pipeline which will read JSON input data into a Kafka topic.
I am using Avro serialization with a schema registry, as my schema changes on a regular basis.
As of now, GenericRecord is used to parse the schema.
But I recently came to know that avro-tools are available to read a schema and generate Java classes which can be used in the producer code.
I am confused about how to choose between these two options.
Can you please suggest which one is better, given that my schema changes frequently?
avro-tools are available to read schema and generate java classes which can be used to create Producer Code
They create specific Avro classes, not producer code. But regarding the question: both will work.
The way I see it
GenericRecord - Think of it as a HashMap<String, Object>. As a consumer, you need to know which fields to get. If, as a producer or schema creator, you are not able to ship your classes as a library to your consumers, this is essentially the best you can get. I believe you'll always be able to get the latest data, though (all possible fields can be accessed by a get("fieldname") call). See the example here.
SpecificRecord (what avro-tools generates) - It is just a generated class with getter methods and builder objects / setter methods. Any consumer will be able to import your producer classes as dependencies, deserialize the message, and immediately know what fields are available. You are not guaranteed to get the latest schema here - you will be "downgraded" and limited to whatever schema was used to generate those classes.
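To make the contrast concrete, here is a small consumer-side sketch. The field name and the UserCreated class are hypothetical; the generated class only exists once you have run avro-tools or the Maven plugin over your schema:

import org.apache.avro.generic.GenericRecord

// GenericRecord: access is by field name and returns Object, so the consumer
// must know (or probe) which fields exist and handle the casting itself.
def emailFromGeneric(record: GenericRecord): String =
  record.get("email").toString

// SpecificRecord: a generated class (here a hypothetical UserCreated) gives typed
// getters, but only for the fields present in the schema used at generation time.
// def emailFromSpecific(event: UserCreated): String = event.getEmail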
I generally use the avro-maven-plugin to create the classes, just as in this example.
You could also use Avro reflection (ReflectData) to build an Avro schema from a Java class rather than the other way around. Annotations can be used on fields to set @Union or @AvroDefault settings.
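A minimal reflection sketch might look like this; the UserEvent class is a made-up example, and the @AvroDefault / @Union annotations from org.apache.avro.reflect could be added to its fields to tune the derived schema:

import org.apache.avro.reflect.ReflectData

// Hypothetical event class; Avro reflection derives the record schema from its fields.
class UserEvent(var id: String, var count: Int)

object ReflectSchemaDemo extends App {
  val schema = ReflectData.get().getSchema(classOf[UserEvent])
  println(schema.toString(true)) // pretty-printed Avro schema derived from the class
}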
Further Reading about using the Confluent Schema Registry
We are considering a serialization approach for our Scala-based Akka Persistence app. We consider it likely that our persisted events will "evolve" over time, so we want to support schema evolution, and are considering Avro first.
We'd like to avoid including the full schema with every message. However, for the foreseeable future, this Akka Persistence app is the only app that will be serializing and deserializing these messages, so we don't see a need for a separate schema registry.
Checking the docs for Avro and the various Scala libs, I see ways to include the schema with messages, and also how to go "schema-less" by using a schema registry, but what about the in-between case? What's the correct approach for going schema-less, but somehow including an identifier to be able to look up the correct schema (available in the local deployed codebase) for the deserialized object? Would I literally just create a schema that represents my case class, but with an additional "identifier" field for the schema version, and then have some sort of in-memory map of identifier -> schema at runtime?
Also, is the correct approach to have one serializer/deserializer class for each version of the schema, so it knows how to translate every version to/from the most recent version?
Finally, are there recommendations on how to unit-test schema evolutions? For instance, store a message in akka-persistence, then actually change the definition of the case class, and then kill the actor and make sure it properly evolves. (I don't see how to change the definition of the case class at runtime.)
After spending more time on this, here are the answers I came up with.
Using avro4s, you can use the default data output stream to include the schema with every serialized message, or you can use the binary output stream, which simply omits the schema when serializing each message. ('Binary' is a bit of a misnomer here, since all it does is omit the schema; in either case you still end up with an Array[Byte].)
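The same distinction exists in the underlying Avro Java API, which avro4s wraps, and may make the trade-off concrete. This sketch uses plain Avro rather than avro4s, with a made-up one-field schema:

import java.io.ByteArrayOutputStream
import org.apache.avro.Schema
import org.apache.avro.file.DataFileWriter
import org.apache.avro.generic.{GenericData, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.EncoderFactory

object DataVsBinary extends App {
  // Hypothetical one-field schema, parsed inline for the example.
  val schema = new Schema.Parser().parse(
    """{"type":"record","name":"Greeting","fields":[{"name":"msg","type":"string"}]}""")

  val record = new GenericData.Record(schema)
  record.put("msg", "hello")

  // "Data" style: an Avro container file, which embeds the full schema in its header.
  val withSchema = new ByteArrayOutputStream()
  val fileWriter = new DataFileWriter(new GenericDatumWriter[GenericRecord](schema))
  fileWriter.create(schema, withSchema)
  fileWriter.append(record)
  fileWriter.close()

  // "Binary" style: just the encoded datum, with no schema attached.
  val schemaless = new ByteArrayOutputStream()
  val encoder = EncoderFactory.get().binaryEncoder(schemaless, null)
  new GenericDatumWriter[GenericRecord](schema).write(record, encoder)
  encoder.flush()

  println(s"with schema: ${withSchema.size()} bytes, schema-less: ${schemaless.size()} bytes")
}

The size difference per message is essentially the embedded schema, which is exactly what going schema-less saves you.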
Akka itself supplies a Serializer trait and a SerializerWithStringManifest trait; the serializer id (and, with the latter, a manifest string) is stored alongside the serialized bytes, which gives you a natural place to carry a "schema identifier" for whatever you serialize.
So when you create your custom serializer, you can extend the appropriate trait, define your schema identifier, and use the binary output stream. Combining those techniques gives you schema-less serialization that still carries a schema identifier.
One common technique is to "fingerprint" your schema - treat it as a string and then calculate its digest (MD5, SHA-256, whatever). If you construct an in-memory map of fingerprint to schema, that can serve as your application's in-memory schema registry.
Then, when deserializing, your incoming object will carry the identifier of the schema that was used to serialize it (the "writer" schema), and you know the identifier of the schema you want to deserialize it into (the "reader" schema). Avro4s supports a way for you to specify both using a builder pattern, so Avro can translate the object from the old format to the new. That's how you support "schema evolution". Because of how that works, you don't need a separate serializer for each schema version: your single custom serializer knows how to evolve your objects, because that's the part Avro gives you for free.
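Pieced together, a serializer along these lines might look like the following sketch. It uses Avro's SchemaNormalization fingerprint as the manifest and plain GenericDatumReader/GenericDatumWriter for the writer-to-reader resolution; the knownSchemas list and the toRecord/fromRecord hooks are assumptions, since the object-to-record conversion depends on your avro4s version and domain types:

import java.io.ByteArrayOutputStream
import akka.serialization.SerializerWithStringManifest
import org.apache.avro.{Schema, SchemaNormalization}
import org.apache.avro.generic.{GenericDatumReader, GenericDatumWriter, GenericRecord}
import org.apache.avro.io.{DecoderFactory, EncoderFactory}

// Sketch only: currentSchema is the schema of the latest event version, and
// knownSchemas is the in-memory "registry" of every schema version this codebase knows.
abstract class FingerprintedEventSerializer(
    currentSchema: Schema,
    knownSchemas: Seq[Schema]) extends SerializerWithStringManifest {

  override val identifier: Int = 900001 // arbitrary, but must be unique among your serializers

  // fingerprint -> schema, i.e. the local schema registry
  private val byFingerprint: Map[String, Schema] =
    knownSchemas.map(s => SchemaNormalization.parsingFingerprint64(s).toString -> s).toMap

  // Hypothetical hooks: convert between your domain object and a GenericRecord
  // (with avro4s this would go through its RecordFormat machinery).
  protected def toRecord(event: AnyRef): GenericRecord
  protected def fromRecord(record: GenericRecord): AnyRef

  // The manifest stored next to the bytes is the writer schema's fingerprint.
  override def manifest(o: AnyRef): String =
    SchemaNormalization.parsingFingerprint64(currentSchema).toString

  override def toBinary(o: AnyRef): Array[Byte] = {
    val out = new ByteArrayOutputStream()
    val encoder = EncoderFactory.get().binaryEncoder(out, null)
    new GenericDatumWriter[GenericRecord](currentSchema).write(toRecord(o), encoder)
    encoder.flush()
    out.toByteArray
  }

  override def fromBinary(bytes: Array[Byte], manifest: String): AnyRef = {
    val writerSchema = byFingerprint(manifest)  // schema the stored event was written with
    val reader = new GenericDatumReader[GenericRecord](writerSchema, currentSchema) // evolve old -> new
    val decoder = DecoderFactory.get().binaryDecoder(bytes, null)
    fromRecord(reader.read(null, decoder))
  }
}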
As for unit testing, your best bet is exploratory testing. Actually define multiple versions of a case class in your test, and multiple accompanying versions of its schema, and then explore how Avro works by writing tests that will evolve an object between different versions of that schema.
Unfortunately that won't be directly relevant to the code you are writing, because it's hard to simulate actually changing the code you are testing as you test it.
I developed a prototype that demonstrates several of these answers, and it's available on GitHub. It uses Avro, avro4s, and Akka Persistence. For this one, I demonstrated a changing codebase by actually changing it across commits - you check out commit #1, run the code, then move on to commit #2, etc. It runs against Cassandra, so it demonstrates replaying events that need to be evolved using the new schema, all without using an external schema registry.
I'm using Akka Persistence, with LevelDB as the storage plugin, in an application written in Scala. On the query side, the current implementation uses PersistentView, which polls messages from a PersistentActor's journal simply by knowing the identifier of that actor.
Now I've learned that PersistentView is deprecated, and one is encouraged to use Persistence Query instead. However, I haven't found any thorough description of how to adapt code from PersistentView to the preferred Persistence Query approach.
Any help would be appreciated!
From the 2.4.x-to-2.5.x migration guide:
Removal of PersistentView
After being deprecated for a long time, and replaced by Persistence Query, PersistentView has now been removed.
The corresponding query type is EventsByPersistenceId. There are several alternatives for connecting the Source to an actor corresponding to a previous PersistentView actor which are documented in Integration.
The consuming actor may be a plain Actor or a PersistentActor if it needs to store its own state (e.g. fromSequenceNr offset).
Please note that Persistence Query is not experimental/may-change anymore in Akka 2.5.0, so you can safely upgrade to it.
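For the LevelDB journal specifically, the replacement typically looks something like this sketch: obtain the read journal, run eventsByPersistenceId, and feed the envelopes to whatever logic your PersistentView used to contain. The persistence id and the consuming code here are placeholders:

import akka.actor.ActorSystem
import akka.persistence.query.{EventEnvelope, PersistenceQuery}
import akka.persistence.query.journal.leveldb.scaladsl.LeveldbReadJournal
import akka.stream.ActorMaterializer
import akka.stream.scaladsl.Sink

object ViewReplacement extends App {
  implicit val system: ActorSystem = ActorSystem("query-side")
  implicit val mat: ActorMaterializer = ActorMaterializer()

  // The read journal now plays the role the PersistentView used to fill.
  val readJournal =
    PersistenceQuery(system).readJournalFor[LeveldbReadJournal](LeveldbReadJournal.Identifier)

  // Live stream of events for one persistent actor, from the beginning of its journal.
  // "sample-persistence-id" is a placeholder for your actor's persistenceId.
  readJournal
    .eventsByPersistenceId("sample-persistence-id", fromSequenceNr = 0L, toSequenceNr = Long.MaxValue)
    .runWith(Sink.foreach[EventEnvelope] { envelope =>
      // Update the read side here, or forward envelopes to an actor as described in the Integration docs.
      println(s"seqNr=${envelope.sequenceNr} event=${envelope.event}")
    })
}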
I am only able to insert simple objects into the DB using Confluent Kafka Connect, and I'm not sure how to make this support a complex JSON/schema structure. I am not sure whether this feature is available or not. There is a similar question here asked about a year ago, but it has not been answered yet. Please help.
Kafka Connect does support complex structures, including Struct, Map, and Array. Generally only source connectors need to do this, as sink connectors are handed the values and simply need to use them. This documentation describes the basics of building up a Schema object that describes a Struct, and then creating a Struct instance that adheres to that schema. In this case, the example struct is just a flat structure.
However, you can easily add fields of type Struct that are defined with another Schema instance. In effect, it's just layering this simple pattern into multiple levels in your structs:
Schema addressSchema = SchemaBuilder.struct().name(ADDRESS)
    .field("number", Schema.INT16_SCHEMA)
    .field("street", Schema.STRING_SCHEMA)
    .field("city", Schema.STRING_SCHEMA)
    .build();

Schema personSchema = SchemaBuilder.struct().name(NAME)
    .field("name", Schema.STRING_SCHEMA)
    .field("age", Schema.INT8_SCHEMA)
    .field("admin", SchemaBuilder.bool().defaultValue(false).build())
    .field("address", addressSchema)
    .build();

Struct addressStruct = new Struct(addressSchema)
    .put("number", (short) 100)   // INT16 expects a Java short
    .put("street", "Main Street")
    .put("city", "Springfield");

Struct personStruct = new Struct(personSchema)
    .put("name", "Barbara Liskov")
    .put("age", (byte) 75)        // INT8 expects a Java byte
    .put("address", addressStruct);
Because SchemaBuilder is a fluent API, you can actually embed it inline, just like the custom admin boolean schema builder above. That's a bit harder for nested structs, though, since you still need a reference to the nested Schema when you create the addressStruct.
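For example, a sketch of that inline style (shown here in Scala, but against the same Java API) would nest the builder and then look the schema back up through the parent's field:

import org.apache.kafka.connect.data.{Schema, SchemaBuilder, Struct}

// Hypothetical inline-nesting variant of the example above.
val personSchema: Schema = SchemaBuilder.struct().name("person")
  .field("name", Schema.STRING_SCHEMA)
  .field("address", SchemaBuilder.struct().name("address")
    .field("city", Schema.STRING_SCHEMA)
    .build())
  .build()

// The nested schema has to be fetched back from the parent to build the inner Struct.
val addressStruct = new Struct(personSchema.field("address").schema())
  .put("city", "Springfield")

val personStruct = new Struct(personSchema)
  .put("name", "Barbara Liskov")
  .put("address", addressStruct)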
Generally you only have to worry about how to do this when writing a source connector. If you're trying to use an existing source connector, you likely have very little control over the structure of the keys and values. For example, Confluent's JDBC source connector models each table with a separate Schema and each row of that table as a separate Struct that uses that schema. But since rows are flat, the Schema and Struct will only contain fields with primitive types.
Debezium's CDC connectors for MySQL and PostgreSQL also model a relational table with a Schema and corresponding Struct objects for each row, but CDC captures more information about the row, such as the state of the row before and/or after the change. Consequently, these connectors use a more complex Schema for each table, one that involves nested Struct objects.
Note that while each source connector has its own flavor of message structure, Kafka Connect's Single Message Transforms (SMTs) make it pretty easy to filter, rename, and make slight modifications to the messages produced by a source connector before they are written to Kafka, or to the messages read from Kafka before they are sent to a sink connector.
I'm using the Phantom framework to work with Cassandra, and I'm trying to do an eqs on an Option field, e.g.
Address.select.where(_.id eqs Some(uuid)).one()
Then I get "value eqs is not a member of object"
Is there a way to accomplish that? I can't figure it out...
The id field is an Option[UUID], because it must be null when I'm receiving a POST request in Play Framework, but I don't know how to express this condition in Phantom.
I also opened an issue on github.
https://github.com/websudos/phantom/issues/173
Phantom relies on a series of implicit conversions to provide most of its functionality. A very simple way to fix most of the errors you get when compiling Phantom tables is to make sure the relevant import is in scope.
Before phantom 1.7.0
import com.websudos.phantom.Implicits._
After 1.7.0
import com.websudos.phantom.dsl._
Beyond the implicit mechanism, phantom will also help you with aliases to a vast number of useful objects in Cassandra:
Phantom connectors
Cassandra Consistency Levels
Keyspaces
Using a potentially null value as part of a PRIMARY KEY in CQL is also wrong, as no part of the CQL primary key can be null. It's a much better idea to move processing logic outside of Phantom.
Traditionally, a tables -> db service -> api controller -> api approach is the way to build modular applications with better separation of concerns. It's best to keep simple I/O at the table level, app-level consistency at the db service level, and all processing logic at a higher level.
Hope this helps.
Using the
import com.websudos.phantom.Implicits._
works!!!