Deploying Sphinx in Production - sphinx

Once I have rotated my index, do I need to keep my original sphinx.cfg file on my production server? In other words, is it used at all during SphinxQL queries? Can I, for security purposes (proprietary data), remove that file until I am ready to rotate again?

Related

MongoDB merging db

Is there a way to merge two MongoDB databases?
All records and files from DB2 should be merged into DB1.
I have a Java-based web application with several APIs to download file content from MongoDB. So I'm thinking of using curl from bash to download each file, read the record's properties, then re-upload (merge) it to the destination DB1.
This has an issue, however: as far as I understand, the same Mongo _id ObjectID("xxxx") from DB2 cannot be transferred to DB1, because MongoDB will automatically generate and assign a new ObjectID("xxxx") value.
Yes, use mongodump and mongorestore.
The chance of a duplicate document _id (assuming it's not the same document) is extremely low.
In that case Mongo will let you know that the insertion has failed, and you can choose to deal with it however you see fit.
You could also use the write concern flag with the restore to decide how to deal with it while uploading.
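A minimal sketch of the dump-and-restore route, written in Python only to assemble the two command lines. The database names DB2 and DB1 come from the question; the scratch directory `dump` and the helper name `merge_commands` are assumptions made for illustration. It relies on mongodump's `--db` and `--out` options and on mongorestore's `--nsInclude`, `--nsFrom` and `--nsTo` options to rename the namespace during the restore, so check those against the version of the tools you have installed.

```python
import shlex


def merge_commands(source_db, target_db, dump_dir="dump"):
    """Build (but do not run) the commands that copy source_db into target_db.

    dump_dir is a hypothetical scratch directory. With no host options given,
    both tools connect to a mongod on localhost.
    """
    dump = ["mongodump", f"--db={source_db}", f"--out={dump_dir}"]
    restore = [
        "mongorestore",
        f"--nsInclude={source_db}.*",  # restore only the dumped database
        f"--nsFrom={source_db}.*",     # rename every collection namespace...
        f"--nsTo={target_db}.*",       # ...into the target database
        dump_dir,
    ]
    return dump, restore


if __name__ == "__main__":
    for command in merge_commands("DB2", "DB1"):
        print(shlex.join(command))
```

Each list can be passed to `subprocess.run` to execute it. The restore keeps the original _id values; mongorestore only inserts, so a document whose _id already exists in DB1 is reported as a duplicate rather than overwritten.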

When testing POST (create mongo entries), how to delete entries in DB w/ Jmeter after testing, if you don't have DELETE endpoints?

I'm sure I can write an easy script that simply drops the entire collection from the database but that seems very clumsy as a long term solution.
Currently, we don't have delete endpoints that actually DELETE, we have PUT endpoints that mark the entry as "DONT SHOW/REMOVED" and another "undelete endpoint" that restores the viewing since we technically don't want to delete any data in our implementation of this medical database, for liability purposes.
Does JMeter have a way to talk to Mongo and delete entries? I know there is a deprecated way to talk to Mongo via JMeter, but I'm not sure about any modern solutions.
Since I can't add unused code to the repo, does this mean the only solution is for me to make an "extra endpoint" outside of the repo that JMeter can access to delete each entry?
Seems like a viable solution just not sure if that's the only way to go about it and if I'm missing something.
MongoDB Test Elements were deprecated due to low interest: keeping the MongoDB driver that ships with JMeter up to date would require extra effort, and the number of users of the MongoDB Test Elements was not that high.
Mailing List Message
Associated JMeter issue
However, given that you don't test MongoDB per se and plan to use the JMeter MongoDB elements only for setup/teardown actions, I believe you can go ahead.
You can get MongoDB test elements back by adding the next line to user.properties file:
not_in_menu
This will "unhide" MongoDB Source Config and MongoDB Script elements which you will be able to use for cleaning up the DB. See How to Load Test MongoDB with JMeter for more information, sample queries, tips and tricks.

Best way to backup and restore data in PostgreSQL for testing

I'm trying to migrate our database engine from MsSql to PostgreSQL. In our automated tests, we restore the database to a "clean" state at the start of every test. We do this by computing the diff between the working copy of the database and the clean copy (table by table), then copying over any records that have changed and deleting any records that have been added. So far this strategy seems to be the best way to go for us because, per test, not a lot of data is changed, and the database is not very big.
Now I'm looking for a way to do essentially the same thing with PostgreSQL. Before doing so, I was wondering if anyone else has done something similar and what method you used to restore data in your automated tests.
On a side note - I considered using MsSql's snapshot or backup/restore strategy. The main problem with these methods is that I have to re-establish the db connection from the app after every test, which is not possible at the moment.
If you're okay with some extra storage, and if you (like me) are not particularly interested in reinventing the wheel by checking for diffs in your own code, you should try creating a new DB (per run) via the template feature of the createdb command (or the CREATE DATABASE statement) in PostgreSQL.
For example:
(from bash) createdb todayDB -T snapshotDB
or
(from psql) CREATE DATABASE todayDB TEMPLATE snapshotDB;
Pros:
In theory, always exact same DB by design (no custom logic)
Replication is a file-transfer (not DB restore). So far less time taken (i.e. doesn't run SQL again, doesn't recreate indexes / restore tables etc.)
Cons:
Takes 2x the disk space (although template could be on a low performance NFS etc)
For my specific situation, I decided to go back to the original solution, which is to compare the "working" copy of the database with the "clean" copy of the database.
There are 3 types of changes.
For INSERT records - find max(id) in the clean table and delete any record in the working table that has a higher ID.
For UPDATE or DELETE records - find all records in the clean table EXCEPT records found in the working table, then UPSERT those records into the working table.
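The two rules above can be sketched as follows. This is an illustration under assumptions, not the poster's actual code: SQLite (via Python's built-in sqlite3 module) stands in for PostgreSQL so the example runs on its own, the schema names `clean` and `working` and the table `item` are hypothetical, and `id` is assumed to be an ever-increasing primary key. In PostgreSQL the second statement would be written as an INSERT ... ON CONFLICT (id) DO UPDATE instead of SQLite's INSERT OR REPLACE.

```python
import sqlite3


def restore_table(conn, table):
    """Reset working.<table> to the contents of clean.<table>."""
    # Rule 1: rows inserted during the test have ids above the clean maximum.
    conn.execute(
        f"DELETE FROM working.{table} "
        f"WHERE id > (SELECT COALESCE(MAX(id), 0) FROM clean.{table})"
    )
    # Rule 2: rows updated or deleted during the test are exactly the clean
    # rows with no identical row in working; write the clean version back.
    conn.execute(
        f"INSERT OR REPLACE INTO working.{table} "
        f"SELECT * FROM clean.{table} EXCEPT SELECT * FROM working.{table}"
    )
    conn.commit()


def demo():
    """Dirty a working copy with an update, a delete and an insert, then restore it."""
    conn = sqlite3.connect(":memory:")
    rows = [(1, "Ann"), (2, "Bob"), (3, "Cy")]
    for schema in ("clean", "working"):
        conn.execute(f"ATTACH DATABASE ':memory:' AS {schema}")
        conn.execute(f"CREATE TABLE {schema}.item (id INTEGER PRIMARY KEY, name TEXT)")
        conn.executemany(f"INSERT INTO {schema}.item VALUES (?, ?)", rows)
    conn.execute("UPDATE working.item SET name = 'Changed' WHERE id = 1")
    conn.execute("DELETE FROM working.item WHERE id = 2")
    conn.execute("INSERT INTO working.item VALUES (4, 'New')")
    restore_table(conn, "item")
    return conn.execute("SELECT * FROM working.item ORDER BY id").fetchall()


if __name__ == "__main__":
    print(demo())
```

Because the working database is never dropped or recreated, the application can keep its connection open between tests, which was the constraint mentioned in the question.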

Quickest/best way to copy a portion of a large mongo database to another server?

I have a dataset of 100m tweets stored in Mongo, unoptimized and unindexed.
I need to copy all tweets from the last month onto another server, what is the best way to do this?
My idea was to use a Ruby script to extract and copy the relevant tweets to a new database on the server, then run the mongo copyDatabase command to copy it over. It's taking horrendously long though. Is there any other way to do it?
require 'mongo_mapper'
MongoMapper.database = 'twitter'
require './models'
tweets = TwitterTweet.where(:created_at => {"$gt" => 1.month.ago}).all; # about 15 million
MongoMapper.database = 'monthly'
# copy the tweets over to the new db
tweets.each do |tweet|
  tweet.save!
end
If you need your data on several servers, you should just use MongoDB's replication features.
If you just want to back up your data, then the quickest way will be to just copy the db files.
A few ideas:
Adding multiple clients/threads to do the processing/saving (maybe each one work on a single day's tweets for example just to make it simple). Continue to add clients until the server is at capacity.
Copy the entire database to the new server and remove the old data, then compact it (and index it).
Consider disabling journaling (if that's safe for your needs), or tuning it to write less frequently (less durability)
Ensure any logging, tracing, etc. is disabled
Make sure the server is adequately sized to handle the load
Or, just take a long holiday while it completes. :)
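The first idea in the list, one client per day of tweets, could be sketched like this. The helper names `daily_ranges` and `window_filter` are hypothetical; only the `created_at` field comes from the question. The sketch just computes the per-day windows and the query document each worker would use, and leaves the actual driver calls and worker pool out.

```python
from datetime import datetime, timedelta


def daily_ranges(start, end):
    """Split [start, end) into consecutive windows of at most one day."""
    ranges = []
    lo = start
    while lo < end:
        hi = min(lo + timedelta(days=1), end)
        ranges.append((lo, hi))
        lo = hi
    return ranges


def window_filter(lo, hi):
    """MongoDB-style query document selecting tweets created in [lo, hi)."""
    return {"created_at": {"$gte": lo, "$lt": hi}}


if __name__ == "__main__":
    # Example dates only; in practice start would be one month before now.
    for lo, hi in daily_ranges(datetime(2024, 1, 1), datetime(2024, 1, 4)):
        print(window_filter(lo, hi))
```

Each worker takes one window, so workers never copy the same tweet twice, and more workers can be added until the server is at capacity. Since the collection is unindexed, every window still scans the whole collection unless an index on `created_at` is built first.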
Just copy the database files to a new host, start mongod and delete the documents / drop databases / collections you don't need. That's the quickest way.

How to handle syncing a user's db with a master db on a server?

So I'm planning an app that will involve having a master db on a server, let's say 3,000 CDs, with the columns Title, Artist, and Release Date.
1) When a user adds a CD to their collection, it will be added to the app's local SQLite DB. But let's say I spelled a CD title wrong, so I make an update to it. When the user goes to sync, how should I go about handling an updated row? Should I have a column 'IsUpdated' that is just a numeric value that increases by one every time I update that row? That way, when the app sees that IsUpdated on the server is larger than the local IsUpdated for that particular item, it will know to replace the contents. Does that make sense? Is it even practical? What other options would there be?
2) How would I go about handling the addition of brand new columns, like adding a Barcode or Price? Do I just push an update for the app that adds the new columns locally, then do the same on the server, and let the rest take its course? That would also tie into number 1 and the syncing issue.
First you have to give more detail than that. Is the entire 3000 master list also replicated down to the remote db?
Sounds like it.
OK, so if that's the case, this isn't a DB design issue so much as it is a replication issue.
It's a bad idea to update every row in a table, especially with a change that makes the row longer. You'll be better off just dropping the table and recreating it. <--- that's how it works in RDBMS on servers, no idea if that concept changes on a client db. And now we get into more iPhone questions of replication than simple db replication. Would it be better to just republish the app? Is the user data segregated from the server data? Can DDL be done on the local/remote tables after publishing?
Instead of searching the entire list for changes as you outline in #1, I would keep a dated delta table. The local app would store a last_updated_Datetime, and any records in the delta table after that datetime would need to be brought down. Once they are downloaded, the local system can determine how to apply them. Again, this is inappropriate for mass changes.
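A minimal sketch of the dated delta table idea, using plain Python data structures in place of the server and the local SQLite DB. The row layout (`cd_id`, `changed_at`, `fields`) and the function names are assumptions made for illustration; only the Title and Artist columns and the stored last-updated datetime come from the discussion above.

```python
from datetime import datetime


def pending_changes(delta_rows, last_synced):
    """Return the delta rows recorded after last_synced, oldest first."""
    newer = [row for row in delta_rows if row["changed_at"] > last_synced]
    return sorted(newer, key=lambda row: row["changed_at"])


def apply_changes(local_cds, changes, last_synced):
    """Apply changes to the local {cd_id: fields} dict; return the new sync time."""
    for row in changes:
        local_cds.setdefault(row["cd_id"], {}).update(row["fields"])
        last_synced = row["changed_at"]
    return last_synced


def demo():
    """Sync one corrected title while skipping a change that was already seen."""
    server_delta = [
        {"cd_id": 9, "changed_at": datetime(2023, 12, 30), "fields": {"Title": "Older fix"}},
        {"cd_id": 7, "changed_at": datetime(2024, 1, 2), "fields": {"Title": "Abbey Road"}},
    ]
    local_cds = {7: {"Title": "Abey Road", "Artist": "The Beatles"}}
    last_synced = datetime(2024, 1, 1)
    changes = pending_changes(server_delta, last_synced)
    last_synced = apply_changes(local_cds, changes, last_synced)
    return local_cds, last_synced


if __name__ == "__main__":
    print(demo())
```

Compared with a per-row IsUpdated counter, the client sends a single timestamp and the server returns only the rows that changed since then, so no row-by-row comparison of the full list is needed.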