I am using JSON Schema to define and validate REST API input payloads, with the networknt schema validator.
Internally, I have defined the schemas, put all the common definitions in a common schema, and added local resource $refs to these definitions wherever needed.
Users will rely on these schemas to send proper payloads, so I have defined "get schema" APIs that return them.
However, when I get the schemas from the networknt schema validator, they come back exactly as defined, with the $refs left as is.
This is very inconvenient for users, who have to retrieve multiple schemas and manually look up the definitions in the common schema to see how each one is defined.
Does the networknt schema validator provide any way to retrieve schemas with each $ref replaced by the actual definition?
I am trying to avoid writing my own parser to resolve these $refs and replace them with the actual definitions.
When running within the OpenAPI specification context, it only resolves local references in the same file. Remote references should be handled by the openapi-bundler (https://github.com/networknt/openapi-bundler), as most servers don't have access to the Internet at all, and downloading schemas from the Internet at runtime is risky.
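If you do end up inlining the references yourself before returning a schema, the core of it is fairly small. Below is a rough sketch in TypeScript, not the networknt API: `inlineRefs`, `lookup` and the two sample documents are made up for illustration. It assumes all schemas are already loaded in memory (nothing is fetched remotely), that refs look like `file#/json/pointer` or `#/json/pointer`, and that keywords sitting next to a `$ref` can be dropped.

```typescript
type Json = null | boolean | number | string | Json[] | { [key: string]: Json };

// Follow a JSON Pointer such as "/definitions/id" inside one document.
function lookup(doc: Json, pointer: string): Json {
  let node: Json = doc;
  for (const raw of pointer.split("/").slice(1)) {
    const key = raw.replace(/~1/g, "/").replace(/~0/g, "~");
    if (node === null || typeof node !== "object") {
      throw new Error(`Cannot resolve pointer ${pointer}`);
    }
    const next: Json | undefined = Array.isArray(node) ? node[Number(key)] : node[key];
    if (next === undefined) throw new Error(`Cannot resolve pointer ${pointer}`);
    node = next;
  }
  return node;
}

// Return a copy of `node` in which every $ref is replaced by its target.
// `docName` is the document `node` came from; `seen` guards against cycles.
function inlineRefs(
  node: Json,
  docName: string,
  docs: Record<string, Json>,
  seen: string[] = [],
): Json {
  if (Array.isArray(node)) return node.map((n) => inlineRefs(n, docName, docs, seen));
  if (node === null || typeof node !== "object") return node;

  const ref = node["$ref"];
  if (typeof ref === "string") {
    const [file = "", pointer = ""] = ref.split("#");
    const target = file === "" ? docName : file;
    const id = `${target}#${pointer}`;
    if (seen.includes(id)) throw new Error(`Circular $ref: ${id}`);
    const doc = docs[target];
    if (doc === undefined) throw new Error(`Unknown document: ${target}`);
    // Siblings of $ref are intentionally ignored in this sketch.
    return inlineRefs(lookup(doc, pointer), target, docs, [...seen, id]);
  }

  const out: { [key: string]: Json } = {};
  for (const [k, v] of Object.entries(node)) out[k] = inlineRefs(v, docName, docs, seen);
  return out;
}

// Hypothetical sample documents.
const commonSchema: Json = { definitions: { id: { type: "string", minLength: 1 } } };
const orderSchema: Json = {
  type: "object",
  properties: { orderId: { $ref: "common.json#/definitions/id" } },
};
const docs: Record<string, Json> = { "common.json": commonSchema, "order.json": orderSchema };

const resolved = inlineRefs(orderSchema, "order.json", docs);
console.log(JSON.stringify(resolved, null, 2));
```

Recursive schemas cannot be fully inlined, which is why the sketch throws on a cycle instead of looping forever.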
I've developed a custom source connector for an external REST service.
I get JSON documents, convert them to org.apache.kafka.connect.data.Struct with a manually defined schema (SchemaBuilder), and wrap the result in a SourceRecord.
All of this is for one entity only, but there are a dozen of them.
My new goal is to make this connector universal and parameterize the schema. The idea is to get the schema as a String (JSON) from configs or external files and pass it to SourceRecord, but it only accepts Schema objects.
Is there a simple, reliable way to convert a String/JSON to a Schema, or even to pass a String schema directly?
There is a JSON to Avro converter. However, if you are already building a Struct/Schema combination, you shouldn't need to do anything, as the Converter classes in Kafka Connect can handle the conversion for you.
For our SaaS API we use schema-based multitenancy, which means every customer (~tenant) has its own separate schema within the same (postgres) database, without interfering with other customers. Each schema consists of the same underlying entity-model.
Every time a new customer is registered in the system, a new isolated schema is automatically created within the db. This means the schema is created at runtime and not known in advance. The customer's schema is named according to the customer's domain.
For every request that arrives at our API, we extract the user's tenancy-affiliation from the JWT and determine which db-schema to use to perform the requested db-operations for this tenant.
After having established a connection to a (postgres) database via TypeORM (e.g. using createConnection), our only chance to set the schema for a db-operation is to resort to the createQueryBuilder:
const orders = await this.entityManager
  .createQueryBuilder()
  .select()
  .from(`${tenantId}.orders`, 'order') // <--- setting schema-prefix here
  .where("order.priority = 4")
  .getRawMany()
This means, we are forced to use the QueryBuilder as it does not seem to be possible to set the schema when working with the EntityManager API (or the Repository API).
However, we want/need to use these APIs, because they are much simpler to write, require less code and are also less error-prone, since they do not rely on writing queries "manually" employing a string-based syntax.
In case of TypeORM, is it possible to somehow set the db-schema when working with the EntityManager or repositories?
Something like this?
// set schema when instantiating manager
const manager = connection.createEntityManager({ schema: tenantDomain });
// should find all matching "order" entities within schema
const orders = await manager.find(Order, { priority: 4 })
// should find a matching "item" entity within schema using same manager
const item = await manager.findOne(Item, { id: 321 })
The db-schema needs to be set in a request-scoped way to avoid setting the schema for other requests, which may belong to other customers. Setting the schema for the whole connection is not an option.
We are aware that one could create a whole new connection and set the schema for this connection, but we want to reuse the existing connection. So simply creating a new connection to set the schema is not an option.
To answer my own question:
At the moment there is no way to instantiate TypeORM repositories with different schemas at runtime without creating new connections.
So the only two options that a developer is left with for schema-based multi tenancy are:
Setting up new connections to connect with different schemas within the same db at runtime. E.g. see NestJS Request Scoped Multitenancy for Multiple Databases. However, one should definitely strive to reuse connections and be aware of connection limits.
Abandoning the idea of working with the Repository API and reverting to using createQueryBuilder (or executing SQL queries via query()).
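If you go with the second option, keep in mind that the tenant's schema name ends up interpolated into the table path as a plain string, and that it ultimately comes from the request. A small guard is worth having. The sketch below is only an illustration: `tenantTable` and the naming rule are my own assumptions (lowercase letters, digits and underscores, at most 63 characters), so adapt the pattern to however your tenant schemas are actually named.

```typescript
// Assumed naming rule for tenant schemas; not something TypeORM prescribes.
const SCHEMA_NAME = /^[a-z_][a-z0-9_]{0,62}$/;

// Build a schema-qualified table path, refusing anything that does not look
// like a plain schema name.
function tenantTable(tenantSchema: string, table: string): string {
  if (!SCHEMA_NAME.test(tenantSchema)) {
    throw new Error(`Refusing unsafe schema name: ${JSON.stringify(tenantSchema)}`);
  }
  return `${tenantSchema}.${table}`;
}

console.log(tenantTable("acme", "orders"));
```

In the query builder call this would replace the template string, i.e. `.from(tenantTable(tenantId, 'orders'), 'order')`.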
For further research, here are some TypeORM GitHub issues that track the idea of changing the schema for existing connections or repositories at runtime (similar to what is requested in the OP):
Multi-tenant architecture using schema. #4786 proposes something like this.photoRepository.useSchema('customer1').find()
Handling of database schemas #3067 proposes something like getConnection().changeDefaultSchema('myschema')
Run-time change of schema #4473
Add an ability to set postgresql schema per call #2439
P.S. If TypeORM decides to support the idea discussed in the OP, I will try to update this answer.
Here is a global overview of the issues with schema-based multitenancy, along with a complete walkthrough and a GitHub repo for it.
Most of the time, you may want to use Postgres Row Security Policy instead. It gives most of the benefits of schema-based multitenancy (especially on developer experience), without the issues related to the multiplication of connections.
Since commenting does not work for me, here is a hint from the documentation of NestJS:
I am not using NestJS, but I am reading the docs at the moment to decide whether it's a fitting framework for us. We have an app where only some modules have multi-tenancy with a schema per tenant, so using TypeOrmModule.forRootAsync(dynamicCreatedDbConfig) might be an option for me too.
This may help you if you have an interceptor or middleware, which prepares the dynamicCreatedDbConfig data before...
I am working on a Datastore data source for Apache Spark, based on the Spark DataSource V2 API. I was able to implement it for a single hard-coded entity but couldn't generalize it. Either I need to infer the entity schema and translate each entity record into a Spark Row, or read the entity record as JSON and let the user translate it into a Scala Product (the Datastore Java client is REST-based, so the payload is pulled as JSON). In the IntelliJ debugger I can see "entity.properties" as JSON key-values, which includes everything I need (column name, value, type, etc.), but I can't use entity.properties due to access restrictions. I'd appreciate any ideas.
Fixed by switching to the low-level API: https://github.com/GoogleCloudPlatform/google-cloud-datastore
Full source for spark-datastore-connector: https://github.com/sgireddy/spark-datastore-connector
We are considering a serialization approach for our Scala-based Akka Persistence app. We consider it likely that our persisted events will "evolve" over time, so we want to support schema evolution, and are considering Avro first.
We'd like to avoid including the full schema with every message. However, for the foreseeable future, this Akka Persistence app is the only app that will be serializing and deserializing these messages, so we don't see a need for a separate schema registry.
Checking the docs for avro and the various scala libs, I see ways to include the schema with messages, and also how to use it "schema-less" by using a schema registry, but what about the in-between case? What's the correct approach for going schema-less, but somehow including an identifier to be able to look up the correct schema (available in the local deployed codebase) for the deserialized object? Would I literally just create a schema that represents my case class, but with an additional "identifier" field for schema version, and then have some sort of in-memory map of identifier->schema at runtime?
Also, is the correct approach to have one serializer/deserializer class for each version of the schema, so it knows how to translate every version to/from the most recent version?
Finally, are there recommendations on how to unit-test schema evolutions? For instance, store a message in akka-persistence, then actually change the definition of the case class, and then kill the actor and make sure it properly evolves. (I don't see how to change the definition of the case class at runtime.)
After spending more time on this, here are the answers I came up with.
Using avro4s, you can use the default data output stream to include the schema with every serialized message. Or, you can use the binary output stream, which simply omits the schema when serializing each message. ('binary' is a bit of a misnomer here since all it does is omit the schema. In either case it is still an Array[Byte].)
Akka itself supplies a Serializer trait or a SerializerWithStringManifest trait, which will automatically include a field for a "schema identifier" in the object of whatever you serialize.
So when you create your custom serializer, you can extend the appropriate trait, define your schema identifier, and use the binary output stream. When those techniques are combined, you'll successfully be using schema-less serialization while including a schema identifier.
One common technique is to "fingerprint" your schema - treat it as a string and then calculate its digest (MD5, SHA-256, whatever). If you construct an in-memory map of fingerprint to schema, that can serve as your application's in-memory schema registry.
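A rough sketch of such a fingerprint-keyed map, written in TypeScript for brevity rather than Scala. The names (`canonicalize`, `fingerprint`, `LocalSchemaRegistry`) are made up, and the sorted-key serialization is only a simplified stand-in for Avro's own Parsing Canonical Form; the point is just that the same schema must always hash to the same identifier regardless of formatting.

```typescript
import { createHash } from "node:crypto";

// Serialize with sorted keys so whitespace or key order does not change the hash.
// (Simplification: Avro defines its own canonical form for this purpose.)
function canonicalize(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(canonicalize).join(",")}]`;
  if (value !== null && typeof value === "object") {
    const obj = value as Record<string, unknown>;
    const body = Object.keys(obj)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${canonicalize(obj[k])}`);
    return `{${body.join(",")}}`;
  }
  return JSON.stringify(value);
}

// SHA-256 digest of the canonical schema text, as a hex string.
function fingerprint(schema: unknown): string {
  return createHash("sha256").update(canonicalize(schema)).digest("hex");
}

// In-memory "schema registry": fingerprint -> schema.
class LocalSchemaRegistry {
  private readonly byFingerprint = new Map<string, unknown>();

  register(schema: unknown): string {
    const id = fingerprint(schema);
    this.byFingerprint.set(id, schema);
    return id;
  }

  lookup(id: string): unknown {
    const schema = this.byFingerprint.get(id);
    if (schema === undefined) throw new Error(`Unknown schema fingerprint: ${id}`);
    return schema;
  }
}

// Hypothetical event schema.
const orderPlacedV1 = {
  type: "record",
  name: "OrderPlaced",
  fields: [{ name: "orderId", type: "string" }],
};

const registry = new LocalSchemaRegistry();
const writerId = registry.register(orderPlacedV1);
console.log(writerId.length, registry.lookup(writerId) === orderPlacedV1);
```

The fingerprint is what you would store alongside each serialized event, in place of the full schema.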
So then when deserializing, your incoming object will have the schema identifier of the schema that was used to serialize it (the "writer"). While deserializing, you should know the identifier of the schema to use to deserialize it (the "reader"). Avro4s supports a way for you to specify both using a builder pattern, so avro can translate the object from the old format to the new. That's how you support "schema evolution". Because of how that works, you don't need a separate serializer for each schema version. Your custom serializer will know how to evolve your objects, because that's the part that Avro gives you for free.
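To make the writer/reader idea concrete, here is a toy model of the record resolution rules, again in TypeScript. This is not avro4s or Avro code: `RecordSchema`, `evolve` and the two `OrderPlaced` versions are invented for illustration, and it only covers the two simplest cases (a field added with a default, and a field removed).

```typescript
interface Field {
  name: string;
  default?: unknown; // present only when the field declares a default
}

interface RecordSchema {
  name: string;
  fields: Field[];
}

// Translate a record written with `writer` into the shape expected by `reader`.
function evolve(
  record: Record<string, unknown>,
  writer: RecordSchema,
  reader: RecordSchema,
): Record<string, unknown> {
  const written = new Set(writer.fields.map((f) => f.name));
  const out: Record<string, unknown> = {};
  for (const field of reader.fields) {
    if (written.has(field.name)) {
      out[field.name] = record[field.name]; // present in both: copy the value
    } else if ("default" in field) {
      out[field.name] = field.default; // new in reader: fall back to its default
    } else {
      throw new Error(`Reader field "${field.name}" has no default`);
    }
  }
  return out; // fields known only to the writer are dropped
}

// Hypothetical schema versions: v2 drops "note" and adds "priority".
const v1: RecordSchema = { name: "OrderPlaced", fields: [{ name: "orderId" }, { name: "note" }] };
const v2: RecordSchema = {
  name: "OrderPlaced",
  fields: [{ name: "orderId" }, { name: "priority", default: 0 }],
};

const oldEvent = { orderId: "o-1", note: "gift wrap" };
const evolved = evolve(oldEvent, v1, v2);
console.log(JSON.stringify(evolved));
```

Because the translation is driven entirely by the two schemas, a single serializer can handle every stored version as long as it can find the writer schema for each message.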
As for unit testing, your best bet is exploratory testing. Actually define multiple versions of a case class in your test, and multiple accompanying versions of its schema, and then explore how Avro works by writing tests that will evolve an object between different versions of that schema.
Unfortunately that won't be directly relevant to the code you are writing, because it's hard to simulate actually changing the code you are testing as you test it.
I developed a prototype that demonstrates several of these answers, and it's available on GitHub. It uses Avro, avro4s, and Akka Persistence. For this one, I demonstrated a changing codebase by actually changing it across commits: you'd check out commit #1, run the code, then move to commit #2, etc. It runs against Cassandra, so it will demonstrate replaying events that need to be evolved using a new schema, all without using an external schema registry.
I have classes for entities like Customer, InternalCustomer, ExternalCustomer (with the appropriate inheritance) generated from an XML schema. I would like to use JPA (suggest a specific implementation in your answer if relevant) to persist objects of these classes, but I can't annotate them since they are generated, and when I change the schema and regenerate, the annotations will be wiped. Can this be done without using annotations or even a persistence.xml file?
Also, is there a tool to which I can provide the classes (or schema) as input and have it give me the SQL statements to create the DB (or even create it for me)? It would seem that, since I have a schema, all the information needed to create the DB should be in there. I am not talking about creating indexes or any tuning of the DB, just creating the right tables etc.
thanks in advance
You can certainly use JDO in such a situation, dynamically generating the classes, the metadata, any byte-code enhancement, and then runtime persistence, making use of the class loader where your classes have been generated in and enhanced. As per
JPA doesn't have such a metadata API unfortunately.
--Andy (DataNucleus)