Are schemas really shared across instances? - OrientDB

In my tests, I'm spawning ~100 unique OrientDB instances, all in different plocal locations. However, I'm getting exceptions when creating the schema in roughly 10% of those databases, due to "class already exists" errors.
Should this work? Is there something extra I need to do to force a distinct schema per unique plocal instance?
Most importantly, is there a workaround? (e.g. a unique JVM per test, try/catch around schema generation, etc.)
I raised a ticket, https://github.com/orientechnologies/orientdb/issues/5490, as I think this is a bug, but I could be wrong.
NOTE: I seem to be able to work around this by ensuring that each integration test runs in a separate JVM. Also, I did a full clean of my entire project, which may have cleared out any OrientDB disk caches that had been created.
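In case it helps anyone, here is a minimal sketch of the try/catch-style workaround mentioned above, using OrientDB's idempotent getOrCreateClass instead of a bare createClass; the plocal path and the class/property names are illustrative only, and the imports assume the OrientDB 2.x document API:

    // Hedged sketch (OrientDB 2.x document API): make schema creation
    // idempotent so a pre-existing class is not treated as an error.
    import com.orientechnologies.orient.core.db.document.ODatabaseDocumentTx;
    import com.orientechnologies.orient.core.metadata.schema.OClass;
    import com.orientechnologies.orient.core.metadata.schema.OType;

    public class SchemaSetup {
        public static void main(String[] args) {
            ODatabaseDocumentTx db = new ODatabaseDocumentTx("plocal:/tmp/testdb-1");
            if (db.exists()) {
                db.open("admin", "admin");
            } else {
                db.create();
            }
            // Returns the existing class instead of throwing
            // "class already exists" when another run created it first.
            OClass person = db.getMetadata().getSchema().getOrCreateClass("Person");
            if (!person.existsProperty("name")) {
                person.createProperty("name", OType.STRING);
            }
            db.close();
        }
    }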

This is a feature, not a bug :-)

Related

How to query Axon aggregates

Is there a way to see the current state of the aggregates stored in Axon?
Our application uses an Oracle-backed Axon event store.
I tried querying the domainevententry and snapshotevententry tables, but they are empty.
Is there a way to see the current state of the aggregates stored in axon?
In short, yes, although it is not recommended, granted that you are planning to employ CQRS. CQRS, or Command-Query Responsibility Segregation, dictates that the Command Model and the Query Model are kept separate.
The aggregate support Axon delivers provides an easy means to construct a Command Model. As the name suggests, it's intended for commands. On the flip side, you have Query Models, which are designed for queries. AxonIQ has this to say on CQRS; maybe that clarifies some things.
I tried querying the domainevententry and snapshotevententry tables, but they are empty.
That's interesting on its own account! When you publish events in Axon, either through the AggregateLifecycle#apply(Object...) or the EventGateway#publish(Object...) method, the published event should end up in your domain_event_entry table. If that's not the case, then either your JPA/JDBC configuration is missing something or some other exception is occurring in your application.
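For illustration, here is a minimal sketch of the AggregateLifecycle#apply(Object...) route (annotation-based style, imports assuming Axon 4; GiftCard, IssueCardCommand, and CardIssuedEvent are made-up names for the example):

    // Hedged sketch: applying an event from inside an aggregate. With a
    // JPA-based event store, the applied event should end up as a row in
    // the domain_event_entry table.
    import org.axonframework.commandhandling.CommandHandler;
    import org.axonframework.eventsourcing.EventSourcingHandler;
    import org.axonframework.modelling.command.AggregateIdentifier;
    import org.axonframework.modelling.command.TargetAggregateIdentifier;
    import static org.axonframework.modelling.command.AggregateLifecycle.apply;

    class IssueCardCommand {
        @TargetAggregateIdentifier
        final String cardId;
        IssueCardCommand(String cardId) { this.cardId = cardId; }
    }

    class CardIssuedEvent {
        final String cardId;
        CardIssuedEvent(String cardId) { this.cardId = cardId; }
    }

    public class GiftCard {
        @AggregateIdentifier
        private String cardId;

        protected GiftCard() { } // no-arg constructor required by Axon

        @CommandHandler
        public GiftCard(IssueCardCommand cmd) {
            apply(new CardIssuedEvent(cmd.cardId)); // publishes the event
        }

        @EventSourcingHandler
        public void on(CardIssuedEvent event) {
            this.cardId = event.cardId;
        }
    }

If a command handled this way produces no row in domain_event_entry, the event store configuration is the first place to look.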
Would you be able to update your issue with samples of your configuration and/or stack traces that you are seeing?
Replaying production issues locally
What I've done in the past to replay behavior occurring in a production environment is load the Aggregate's event stream from that environment into a local dev/test event store. To query it, you only need the aggregate identifier. As the aggregate identifier is indexed, retrieving all events for a specific aggregate (otherwise known as the aggregate stream) is straightforward.
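As a rough sketch of what that retrieval looks like against Axon's EventStore API (Axon 4 assumed; the aggregate identifier "order-42" is purely illustrative):

    // Hedged sketch: read one aggregate's full event stream from the event
    // store, e.g. to replay or inspect it in a local environment.
    import org.axonframework.eventsourcing.eventstore.EventStore;

    public class StreamInspector {
        public static void dump(EventStore eventStore) {
            eventStore.readEvents("order-42").asStream()
                      .forEach(event -> System.out.println(
                          event.getSequenceNumber() + " -> "
                          + event.getPayloadType().getSimpleName()));
        }
    }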
With the stream loaded locally, I could run the application and flow through the aggregate step by step. This gave the benefit of knowing exactly which event caused which state change leading up to the problematic scenario.
However, why your events are not present in your domainevententry table is unclear to me. If you're still facing issues with that, I again recommend that you update the question with more specifics about your project.

Multi-Tenant vs Single-Tenant?

I am about to build a SaaS product using Rails and Postgres. I would like to know whether I should follow schema-level, subdomain-based multi-tenancy, or whether a single-tenant application is a good enough architecture.
My requirements have no data dependencies between clients, hence a schema-based multi-tenant architecture seems right to me. Could anyone explain why it is a good or bad choice, with relevant reasoning?
Here's a post from the creators of the Apartment gem suggesting they would not use the schema-per-tenant approach in future:
The end result of the above-mentioned problems has caused us to mostly abandon our separate-schemas approach to multi-tenancy. For all services we build going forward, we use a more traditional column-scoped approach and have written our own wrappers that effectively mimic the per-request tenanting approach that Apartment gave us.
If you are deploying to Heroku, there is a warning about schema-per-tenant affecting performance of the managed backup tool:
The most common use case for using multiple schemas in a database is building a software-as-a-service application wherein each customer has their own schema. While this technique seems compelling, we strongly recommend against it as it has caused numerous cases of operational problems. For instance, even a moderate number of schemas (> 50) can severely impact the performance of Heroku’s database snapshots tool, PG Backups.
For maximum data segregation, a database-per-tenant approach is appropriate.
For the simplest operations, a tenant_id column on each table can be used to scope your queries, and can be enforced with row-level security policies, as sketched below.
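As a hedged sketch of that last option, here is roughly what enforcing the scoping with PostgreSQL row-level security looks like over JDBC; the table (accounts), column (tenant_id), and custom setting (app.tenant_id) are assumptions for the example:

    // Illustrative sketch: tenant scoping enforced by the database itself.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;

    public class RlsSetup {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/app", "app", "secret");
                 Statement st = conn.createStatement()) {
                st.execute("ALTER TABLE accounts ENABLE ROW LEVEL SECURITY");
                st.execute("CREATE POLICY tenant_isolation ON accounts "
                         + "USING (tenant_id = current_setting('app.tenant_id')::int)");
                // Each session then declares its tenant before querying:
                st.execute("SET app.tenant_id = '42'");
            }
        }
    }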

Is it mandatory to run Database Designer for every schema in HP Vertica?

I have constantly been hitting a resource pool allocation error after creating several tables in a new schema.
After running the Database Designer in Vertica for the particular schema with all its tables, the queries run fine.
Kindly help me understand the concept.
The Database Designer is optional; you don't have to use it at all. Using it helps you optimize your physical layout, and if you're having trouble with resource-pool allocation it sounds like you might benefit from that.
From the documentation:
The HP Vertica Database Designer:
Analyzes your logical schema, sample data, and, optionally, your sample queries.
Creates a physical schema design (a set of projections) that can be deployed automatically or manually.
Can be used by anyone without specialized database knowledge.
Can be run and rerun any time for additional optimization without stopping the database.
Uses strategies to provide optimal query performance and data compression.
You can run DBD for just a particular query (optimizes whatever's needed to support that query) or for your entire database. It uses sample queries that you provide, so if your usage patterns change over time it can help to rerun it.

Postgres Multi-tenant administration/maintenance

We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databases? For example, if I want to add a table or add a column to a table, I have to either write a program that loops through all databases and executes the SQL against each of them, or go through them one by one using pgAdmin.
Is there a smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have the column TENANT, and add this to the primary key. Then insert every record for all tenants into this database and drop the other databases (obviously there is considerably more work to it than this, as your application will also need to be changed).
The differences with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programmatically would be simplest.
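For what it's worth, the programmatic loop can be as small as the following JDBC sketch; the database names and the migration statement are placeholders, and in practice you would read the tenant list from pg_database or a registry table rather than hard-coding it:

    // Illustrative sketch: apply one migration to every tenant database.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.Statement;
    import java.util.List;

    public class PatchAllTenants {
        public static void main(String[] args) throws Exception {
            List<String> tenantDbs = List.of("tenant_a", "tenant_b", "tenant_c");
            for (String db : tenantDbs) {
                try (Connection conn = DriverManager.getConnection(
                         "jdbc:postgresql://localhost/" + db, "admin", "secret");
                     Statement st = conn.createStatement()) {
                    st.execute("ALTER TABLE invoices ADD COLUMN IF NOT EXISTS due_date date");
                }
            }
        }
    }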
At a higher level, all multi-tenant applications follow one of three approaches:
One tenant's data lives in one database,
One tenant's data lives in one schema, or
Add a tenant_id / account_id column to your tables (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database on one hand, or have tenants share the same tables on the other, this is the most apparent dimension. If you provide your users raw SQL access, or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row-level security policies that make this less of a concern for most applications.
Extensibility: If your tenants share the same schema (approach #3) and have fields that vary between them, then you need to think about how to accommodate these fields.
This article on multi-tenant databases has a great summary of the different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in each column based on the tenant_id. PostgreSQL 9.4 also comes with JSONB support, which natively lets you use semi-structured fields to express variations between different tenants' data (see the sketch after this list).
Scaling: Another criterion is how easily your database scales out. If you create a tenant per database or schema (#1 or #2 above), your application can make use of existing Ruby gems or Django packages to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
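To make the JSONB point under Extensibility above concrete, here is a hedged sketch; the tickets table and the field keys are made up for the example:

    // Illustrative sketch: tenant-specific fields kept in a JSONB column.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.ResultSet;
    import java.sql.Statement;

    public class JsonbExtensibility {
        public static void main(String[] args) throws Exception {
            try (Connection conn = DriverManager.getConnection(
                     "jdbc:postgresql://localhost/app", "app", "secret");
                 Statement st = conn.createStatement()) {
                st.execute("CREATE TABLE IF NOT EXISTS tickets ("
                         + "id serial PRIMARY KEY, tenant_id int NOT NULL, "
                         + "custom_fields jsonb NOT NULL DEFAULT '{}')");
                st.execute("INSERT INTO tickets (tenant_id, custom_fields) VALUES "
                         + "(42, '{\"priority\": \"high\", \"sla_hours\": 4}')");
                // ->> extracts a JSON field as text; each tenant can define
                // its own keys without any ALTER TABLE.
                try (ResultSet rs = st.executeQuery(
                         "SELECT custom_fields->>'priority' FROM tickets "
                         + "WHERE tenant_id = 42")) {
                    while (rs.next()) System.out.println(rs.getString(1));
                }
            }
        }
    }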
It's time for me to give back to the community :) After 4 years, our multi-tenant platform is in production, and I would like to share the following observations/experiences with all of you.
We used a database per tenant. This has given us extreme flexibility: the individual database backups are not huge, so we can easily import them into our staging environment to investigate customer issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of a learning curve here and there, but nothing substantial; the 2-5 days it takes can save you significant time.
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. When a user logs in, we look up the related data source and connect their session to the right database. That's also when the Liquibase scripts get applied for updates.
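For readers unfamiliar with the technique, here is a hedged sketch of the routing piece using Spring's AbstractRoutingDataSource; the TenantContext holder is a made-up example, and real code would populate the target map from configuration:

    // Illustrative sketch of dynamic data source routing in Spring.
    import org.springframework.jdbc.datasource.lookup.AbstractRoutingDataSource;

    class TenantContext {
        private static final ThreadLocal<String> CURRENT = new ThreadLocal<>();
        static void set(String tenantId) { CURRENT.set(tenantId); }
        static String get() { return CURRENT.get(); }
    }

    public class TenantRoutingDataSource extends AbstractRoutingDataSource {
        // Spring calls this on every connection checkout; the returned key
        // selects a DataSource from the map passed to setTargetDataSources().
        @Override
        protected Object determineCurrentLookupKey() {
            return TenantContext.get();
        }
    }

At login, the application would call TenantContext.set(tenantId) so that all subsequent JDBC work in that request hits that tenant's own database.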
That's it for now; I will come back with more later on.
Well, there are certainly problems with one database for all tenants in our case.
The backup file gets huge and becomes impractical and hard to manage.
For troubleshooting, when we need to restore a customer's data in our dev environment, we just use that customer's backup file; it is nowhere near as big as it would be if we used one database for all customers.
Again, Liquibase has been key in allowing us to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So: Liquibase, Liquibase, and more Liquibase.
I also suspect that we would need more powerful hardware to manage one huge database with large joins across millions of records, versus much lighter databases with much smaller queries.
In case of problems, the service doesn't go down for everyone; the impact is limited to one or a few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. One time we had a customer that didn't have archiving active, and their database grew to over 3 GB. With offshore teams, slower internet, and storage/bandwidth prices, one can see how things can become complicated very quickly.
Hope this helps someone.
--Rex

Quartz JDBC Job Store - Maintenance/Cleanup

I am currently in the process of setting up Quartz in a load-balanced environment using the JDBC job store, and I am wondering how everyone manages the quartz job store DB.
For me, Quartz (2.2.0) will be deployed as part of a versioned application, with multiple versions potentially existing on the one server at the same time. I am using the naming convention XXScheduler_v1 to ensure multiple schedulers play nicely together. My code is working fine, with the quartz tables being populated with the triggers/jobs/etc. as appropriate.
One thing I have noticed, though, is that there seems to be no database cleanup when the application is undeployed. What I mean is that the Job/Scheduler data stays in the quartz database even though there is no longer a scheduler active.
This is less than ideal, and I can imagine that with my model the database would grow larger than it needs to over time. Am I missing how to hook up some clean-up process? Or does Quartz expect us to do the DB cleanup manually?
Cheers!
I ran into this issue once, and here is what I did to rectify it. This should work, but in case it does not, you will have a backup of the tables, so you don't have anything to lose by trying.
Take an SQL dump of the following tables, using the method described in: Taking backup of single table
a) QRTZ_CRON_TRIGGERS
b) QRTZ_SIMPLE_TRIGGERS
c) QRTZ_TRIGGERS
d) QRTZ_JOB_DETAILS
Delete data from the above tables in this sequence:
delete from QRTZ_CRON_TRIGGERS;
delete from QRTZ_SIMPLE_TRIGGERS;
delete from QRTZ_TRIGGERS;
delete from QRTZ_JOB_DETAILS;
Restart your app, which will then freshly re-insert all the deleted tasks and related entries into the above tables (provided your app has its logic right).
This is more like starting your app with all the tasks being scheduled for the first time. So keep in mind that the tasks will behave as if they were freshly inserted.
NOTE: If this does not work, restore the backup you took of the tables and debug more closely. As of now, I have not seen this method fail.
It's definitely not doing any DB cleanup when undeploying the application or shutting down the scheduler. You would have to add some cleanup code that runs during application shutdown (i.e. build some sort of startup servlet or context listener that does the cleanup on the destroy() lifecycle event).
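A hedged sketch of such a listener (Quartz 2.x; note that Scheduler.clear() is blunt and deletes all jobs and triggers known to that scheduler, so only use it if the scheduling data should not outlive the deployment):

    // Illustrative sketch: wipe this scheduler's job-store data on undeploy.
    import javax.servlet.ServletContextEvent;
    import javax.servlet.ServletContextListener;
    import org.quartz.Scheduler;
    import org.quartz.impl.StdSchedulerFactory;

    public class QuartzCleanupListener implements ServletContextListener {
        @Override
        public void contextInitialized(ServletContextEvent sce) { /* no-op */ }

        @Override
        public void contextDestroyed(ServletContextEvent sce) {
            try {
                Scheduler scheduler = StdSchedulerFactory.getDefaultScheduler();
                scheduler.clear();        // delete all jobs/triggers for this scheduler
                scheduler.shutdown(true); // wait for running jobs to finish
            } catch (Exception e) {
                // Swallow: an undeploy should not fail on cleanup problems.
            }
        }
    }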
You're not missing anything.
However, these quartz tables aren't different from any applicative DB objects you use in your data model. You add an Employees table, and in a later version you don't need it anymore. Who's responsible for deleting the old table? Only you. If you have a DBA, you might roll it onto the DBA ;).
This kind of maintenance would typically be done using an uninstall script / wizard, upgrade script / wizard, or during the first startup of the application in its new version.
On a side note, typically different applications use different databases, or different schemas for the least, thus reducing inter-dependencies.
To fully clean the Quartz Scheduler's internal data, one needs more SQL:
delete from QRTZ_CRON_TRIGGERS;
delete from QRTZ_SIMPLE_TRIGGERS;
delete from QRTZ_TRIGGERS;
delete from QRTZ_JOB_DETAILS;
delete from QRTZ_FIRED_TRIGGERS;
delete from QRTZ_LOCKS;
delete from QRTZ_SCHEDULER_STATE;