MongoDB collection update patterns

I have a collection in the database (e.g. merchants) which gets updated regularly in production.
What we have is a database population script that we run every time.
It removes the configuration collections and inserts them again based on data in the script, e.g. including the update to a merchant config.
db.merchants.remove({});
db.merchants.insert(themerchant);
Is there a better pattern/procedure that people use to do this?
I can't seem to find guidance on how people are doing this in production. Links to methods people use would be great.
For example, in SQL Server we create patches.
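For comparison, here is a minimal sketch of a patch-style, idempotent update (rather than drop-and-reinsert) using pymongo; the database name, merchant documents and field names are hypothetical:

from pymongo import MongoClient, UpdateOne

client = MongoClient("mongodb://localhost:27017")
db = client["mydb"]  # hypothetical database name

# Hypothetical merchant configs, analogous to "themerchant" in the script above.
merchants = [
    {"_id": "acme", "feePercent": 2.5, "currency": "EUR"},
    {"_id": "globex", "feePercent": 1.9, "currency": "USD"},
]

# Upsert each merchant instead of removing the whole collection:
# existing documents are updated in place, missing ones are inserted.
ops = [UpdateOne({"_id": m["_id"]}, {"$set": m}, upsert=True) for m in merchants]
result = db.merchants.bulk_write(ops)
print(result.upserted_count, "inserted,", result.modified_count, "updated")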

Related

SchemaSpy without any direct database connection

I want to use SchemaSpy, but my database is used heavily 24/7 and the DBA won't give me access, even read-only. However, I can give the DBA some commands, and he can run them and give me the results.
Is it possible for SchemaSpy to run in offline mode? In other words, can I give it a dump of all the CREATE TABLE and CREATE INDEX statements and a list of the table sizes, and have it generate the report from those?
OK, the best thing about SchemaSpy is that it runs automatically, collects all the objects and, in the case of the tables, performs a row count.
In your specific case you can use a workaround as follows.
Ask your DBA for a dump, or even just the empty database creation script with only the structures, and point SchemaSpy at that database, which simulates your production one.
By the way, I have created a Docker image that uses SchemaSpy to document all the databases on a server.
https://github.com/krismorte/database-diagrams
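As an illustration, a minimal sketch of running SchemaSpy against a locally restored, structure-only copy of the database; the connection details and database name (mydb_copy) are hypothetical, and the switches are SchemaSpy's usual command-line options for a PostgreSQL target:

import subprocess

# Run SchemaSpy against the restored, structure-only copy of the production database.
subprocess.run([
    "java", "-jar", "schemaspy.jar",
    "-t", "pgsql",              # database type (PostgreSQL as an example)
    "-dp", "postgresql.jar",    # JDBC driver for that database type
    "-db", "mydb_copy",         # the locally restored copy
    "-host", "localhost",
    "-u", "schemaspy",
    "-p", "secret",
    "-o", "schema-report",      # output directory for the generated HTML report
], check=True)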

When testing POST (creating Mongo entries), how do you delete entries in the DB with JMeter after testing, if you don't have DELETE endpoints?

I'm sure I can write an easy script that simply drops the entire collection from the database, but that seems very clumsy as a long-term solution.
Currently, we don't have delete endpoints that actually DELETE; we have PUT endpoints that mark an entry as "DONT SHOW/REMOVED" and another "undelete" endpoint that restores it, since we technically don't want to delete any data in our implementation of this medical database, for liability purposes.
Does JMeter have a way to talk to Mongo and delete? I know there is a deprecated way to talk to Mongo via JMeter, but I'm not sure about any modern solutions.
Since I can't add unused code to the repo, does this mean the only solution is for me to make an "extra endpoint" outside of the repo that JMeter can access to delete each entry?
That seems like a viable solution; I'm just not sure if it's the only way to go about it or whether I'm missing something.
The MongoDB Test Elements were deprecated due to low interest: keeping the MongoDB driver shipped with JMeter up to date would require extra effort, and the number of users of the MongoDB Test Elements was not that high.
Mailing List Message
Associated JMeter issue
However, given that you don't test MongoDB per se and plan to use the JMeter MongoDB elements only for setup/teardown actions, I believe you can go ahead.
You can get the MongoDB test elements back by adding the following line to the user.properties file:
not_in_menu
This will "unhide" MongoDB Source Config and MongoDB Script elements which you will be able to use for cleaning up the DB. See How to Load Test MongoDB with JMeter for more information, sample queries, tips and tricks.
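Alternatively, the "easy script" mentioned in the question could be kept outside the repo and run after the test (for instance from JMeter's OS Process Sampler in a tearDown Thread Group). A minimal sketch with pymongo, where the database, collection and marker field are hypothetical:

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")
db = client["medical_test"]   # hypothetical test database
entries = db["entries"]       # hypothetical collection populated by the POST tests

# Delete only the documents created during the JMeter run, identified here by a
# hypothetical marker field set in the POST payload, instead of dropping everything.
result = entries.delete_many({"createdBy": "jmeter"})
print("Removed", result.deleted_count, "test entries")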

Postgres Multi-tenant administration/maintenance

We have a SaaS application where each tenant has its own database in Postgres. How would I apply a patch to all the databases? For example, if I want to add a table or add a column to a table, I have to either write a program that loops through all the databases and executes SQL against each of them, or go through them one by one using pgAdmin.
Is there a smarter and/or faster way?
Any help is greatly appreciated.
Yes, there's a smarter way.
Don't create a new database for each tenant. If everything is in one database then you only need to alter one database.
Pick one database, alter each table to have the column TENANT and add this to the primary key. Then insert into this database every record for all tenants and drop the other databases (obviously this is considerably more work than it sounds, as your application will need to be changed).
The trade-offs compared with your approach are extensively discussed elsewhere:
What problems will I get creating a database per customer?
What are the advantages of using a single database for EACH client?
Multiple schemas versus enormous tables
Practicality of multiple databases per client vs one database
Multi-tenancy - single database vs multiple database
If you don't put everything in one database then I'm afraid you have to alter them all individually, and doing it programmatically would be simplest.
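For example, a minimal sketch of such a loop with psycopg2, assuming a hypothetical tenant_* naming convention for the databases and an example ALTER TABLE patch:

import psycopg2

PATCH_SQL = "ALTER TABLE orders ADD COLUMN notes text"  # example patch to apply everywhere

# Discover the tenant databases from the server catalog.
with psycopg2.connect(dbname="postgres", user="admin", host="localhost") as conn:
    with conn.cursor() as cur:
        cur.execute(
            "SELECT datname FROM pg_database "
            "WHERE datistemplate = false AND datname LIKE 'tenant_%'"
        )
        tenant_dbs = [row[0] for row in cur.fetchall()]

# Apply the same patch to each tenant database in turn.
for dbname in tenant_dbs:
    with psycopg2.connect(dbname=dbname, user="admin", host="localhost") as tenant_conn:
        with tenant_conn.cursor() as cur:
            cur.execute(PATCH_SQL)
    print("Patched", dbname)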
At a higher level, all multi-tenant applications follow one of three approaches:
One tenant's data lives in one database,
One tenant's data lives in one schema, or
Add a tenant_id / account_id column to your tables (shared schema).
I usually find that developers use the following criteria when they evaluate these different approaches.
Isolation: Since you can put each tenant into its own database on one hand, or have all tenants share the same table on the other, this is the most apparent dimension. If you provide your users raw SQL access or you're in a regulated industry such as healthcare, you may need strict guarantees from your database. That said, PostgreSQL 9.5 comes with row-level security policies that make this less of a concern for most applications.
Extensibility: If your tenants are sharing the same schema (approach #3), and your tenants have fields that vary between them, then you need to think about how to merge these fields.
This article on multi-tenant databases has a great summary of different approaches. For example, you can add a dozen columns, call them C1, C2, and so forth, and have your application infer the actual data in these columns based on the tenant_id. PostgreSQL 9.4 comes with JSONB support and natively allows you to use semi-structured fields to express variations between different tenants' data.
Scaling: Another criterion is how easily your database scales out. If you create a tenant per database or schema (#1 or #2 above), your application can make use of existing Ruby gems or Django packages to simplify app integration. That said, you'll need to manually manage your tenants' data and the machines they live on. Similarly, you'll need to build your own sharding logic to propagate foreign key constraints and ALTER TABLE commands.
With approach #3, you can use existing open source scaling solutions, such as Citus. For example, this blog post describes how to easily shard a multi-tenant app with Postgres.
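To make approach #3 and the row-level security mentioned above concrete, here is a minimal sketch, assuming PostgreSQL 9.5+, psycopg2, a hypothetical shared "orders" table with a tenant_id column, and an application role that is not a superuser:

import psycopg2

conn = psycopg2.connect(dbname="saas", user="app", host="localhost")  # hypothetical connection
cur = conn.cursor()

# One-time setup: restrict queries on the shared table to the tenant named in a session setting.
cur.execute("ALTER TABLE orders ENABLE ROW LEVEL SECURITY")
cur.execute("ALTER TABLE orders FORCE ROW LEVEL SECURITY")  # apply the policy to the table owner too
cur.execute("""
    CREATE POLICY tenant_isolation ON orders
    USING (tenant_id = current_setting('app.current_tenant')::int)
""")
conn.commit()

# Per request: record which tenant is active, then query the shared table normally.
cur.execute("SELECT set_config('app.current_tenant', %s, false)", ("42",))
cur.execute("SELECT count(*) FROM orders")  # only tenant 42's rows are visible
print(cur.fetchone()[0])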
It's time for me to give back to the community :) After 4 years, our multi-tenant platform is in production, and I would like to share the following observations/experiences with all of you.
We used a database per tenant. This has given us extreme flexibility, as the backups of the individual databases are not huge, and hence we can easily import them into our staging environment for customer issues.
We use Liquibase for database development and upgrades. This has been a tremendous help to us, allowing us to package the entire build into a simple war file. All changes are easily versioned and managed very efficiently. There is a bit of a learning curve here and there, but nothing substantial; spending 2-5 days on it can save you significant time.
Given that we use Spring/JPA/Hibernate, we use a technique called Dynamic Data Source Routing. So when a user logs in, we find the related data source with a lookup and connect their session to the right database. That's also when the Liquibase scripts get applied for updates.
That's it for now; I will come back with more later on.
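The routing idea described above, reduced to its essence, could look roughly like this; the lookup table, role and database names are hypothetical, and the answer's actual implementation uses Spring's data-source routing rather than plain psycopg2:

import psycopg2

# Hypothetical registry mapping each tenant to its own database;
# in the setup described above this lookup lives in a shared store.
TENANT_DATABASES = {
    "acme": {"dbname": "tenant_acme", "host": "db1.internal"},
    "globex": {"dbname": "tenant_globex", "host": "db2.internal"},
}

def connection_for(tenant):
    """Return a connection to the given tenant's dedicated database."""
    cfg = TENANT_DATABASES[tenant]
    return psycopg2.connect(user="app", **cfg)

# At login time, resolve the tenant and route the session to its database.
with connection_for("acme") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT current_database()")
        print(cur.fetchone()[0])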
Well, there are problems with one database for all tenants in our case for sure.
The backup file gets huge and becomes impractical and hard to manage.
For troubleshooting, when we need to restore a customer's data in our dev environment, we just use that customer's backup file, and the file is usually far smaller than it would be if we used one database for all customers.
Again, Liquibase has been key in allowing us to manage updates across all the tenants seamlessly and without any issues. Without Liquibase, I can see lots of complications with this approach. So Liquibase, Liquibase and more Liquibase.
I also suspect that we would need more powerful hardware to manage one huge database with large joins across millions of records, versus much lighter databases with much smaller queries.
In case of problems, the service doesn't go down for everyone; the impact is limited to one or a few tenants.
In general, for our purposes, this has been a great architectural decision and we are benefiting from it every day. One time we had one customer that didn't have their archiving active and their database size grew to over 3 GB. With offshore teams and slower internet as well as storage/bandwidth prices, one can see how things may become complicated very quickly.
Hope this helps someone.
--Rex

Existing Postgres Database vs Solr

We have an app that uses a Postgres database with about 50 tables. Each table contains about 3 million records (on average). The tables get updated with new data every now and then. Now, we want to implement a search feature in our app. The search needs to be performed on one table at a time (no joins needed).
I've read about Postgres full-text search support and that looks promising. But it seems that Solr is super fast in comparison. Can I use my existing Postgres database with Solr? If tables get updated, would I need to re-index everything again?
It is definitely worth giving Solr a try. We moved many MySQL queries involving JOINs on multiple tables with sorting on different fields to Solr. We are very happy with Solr's search speed, sort speed, faceting capabilities and highly configurable text analysis/tokenization options.
If tables get updated would I need to re-index everything again?
No, you can run delta imports to only re-index your new and updated documents. See https://wiki.apache.org/solr/DataImportHandler.
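For instance, assuming the DataImportHandler from the wiki page above is configured for a hypothetical "products" core at the conventional /dataimport path, a delta import can be triggered over HTTP:

import requests

SOLR_DIH = "http://localhost:8983/solr/products/dataimport"  # hypothetical core

# Re-index only new and updated rows; the deltaQuery in the DIH config decides which.
resp = requests.get(SOLR_DIH, params={"command": "delta-import", "clean": "false"})
resp.raise_for_status()
print(resp.text)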
Get started with https://lucene.apache.org/solr/4_1_0/tutorial.html and all the links in there.
Since nobody has leapt in, I'll answer.
I'm afraid it all depends. It depends on (at least)
how big the text is in each "document"
how flexible you want your searching to be
how much integration you need between database and text-search
how fast is fast enough
how much experience you have with both
When I've had a database that needs some text searching, I've just used PG's built-in options. If I hadn't had superuser access to the db, or had already been running a big Java setup, then Solr might well have appealed.

How do I merge structure changes from one Firebird database to another? (not the data)

The problem is like this: my company has a service that cannot stop running for long periods of time, and I was working on some modifications to the database structure used by this service.
Now that all my modifications are ready and well tested in a test bench environment, I want to export them to the running system. I could do this manually with IBExpert or FlameRobin, but I wanted to know if there is a more automated method for doing this (I feel dumb spending a whole day creating tables, attributes, and so on one by one).
Is there?
You mention IBExpert: it has the Database Comparer tool, which generates the DDL needed to merge the databases' structures.
And, as you know, you can use IBEBlock to fully automate that process.
PS. Or deploy your own app using IBEScript.dll, which lets you use all the functionality of the IBEBlock scripting language.
Please read: http://ibexpert.net/ibe/index.php?n=Main.IBEScriptDll
Check out the database compare feature of Database Workbench (Windows client). It can compare whatever database objects you select and generate DDL to modify your destination database. Unfortunately, you will need the Pro edition, but there is a 30-day trial.