Why is my MongoDB collection being wiped on an Azure Ubuntu instance?

I'm using an Azure Ubuntu instance to store some data every minute in a MongoDB database. I noticed that the data is being wiped approximately once a day. I'm wondering why my data is being wiped?
I log the document count every minute. Here are two consecutive minutes showing that all records were deleted:
**************************************
update at utc: 2022-08-06 10:19:02.393351 local: 2022-08-06 20:19:02.393366
count after insert = 1745
**************************************
update at utc: 2022-08-06 10:20:01.643487 local: 2022-08-06 20:20:01.643544
count after insert = 1
**************************************
You can see the data is wiped, as the count after insert goes from 1745 to 1. Why is my data being wiped?

Short Answer
The data was being deleted in a ransom attack. I wasn't using a MongoDB password because originally I was only testing MongoDB locally. When I later set bindIp to 0.0.0.0 for remote access, anyone who guessed the host could connect (this was pretty careless of me).
Always secure the server with a password, especially if your bindIp is 0.0.0.0. For instructions, see https://www.mongodb.com/features/mongodb-authentication
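As a rough illustration, here is a minimal sketch of setting up authentication with pymongo; the host name and credentials are placeholders, and security.authorization must also be set to enabled in mongod.conf before restarting mongod:

from pymongo import MongoClient

# 1) While still connected locally without auth, create an admin user.
#    "admin" / "change-me" are placeholder credentials.
client = MongoClient("mongodb://localhost:27017/")
client.admin.command(
    "createUser", "admin",
    pwd="change-me",
    roles=[{"role": "root", "db": "admin"}],
)

# 2) After setting security.authorization: enabled in /etc/mongod.conf
#    and restarting mongod, connect with credentials only.
secured = MongoClient(
    "mongodb://admin:change-me@your-azure-host:27017/?authSource=admin"
)
print(secured.admin.command("ping"))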
More Detail
To check whether you have been hit by a ransom attack, look for a ransom note. An extra database may appear (check with show dbs); in my case the new database containing the ransom note was called "READ__ME_TO_RECOVER_YOUR_DATA":
All your data is a backed up. You must pay 0.05 BTC to 1Kz6v4B5CawcnL8jrUvHsvzQv5Yq4fbsSv 48 hours for recover it. After 48 hours expiration we will leaked and exposed all your data. In case of refusal to pay, we will contact the General Data Protection Regulation, GDPR and notify them that you store user data in an open form and is not safe. Under the rules of the law, you face a heavy fine or arrest and your base dump will be dropped from our server! You can buy bitcoin here, does not take much time to buy https://localbitcoins.com or https://buy.moonpay.io/ After paying write to me in the mail with your DB IP: rambler+1c6l#onionmail.org and/or mariadb#mailnesia.com and you will receive a link to download your database dump.
Another way to check for suspicious activity is the MongoDB service log at /var/log/mongodb/mongod.log. On other systems the filename might be mongodb.log. In my case there was a series of commands around the attack time in the log, the first of which reads:
{"t":{"$date":"2022-08-07T09:54:37.779+00:00"},"s":"I", "c":"COMMAND", "id":20337, "ctx":"conn30393","msg":"dropDatabase - starting","attr":
{"db":"READ__ME_TO_RECOVER_YOUR_DATA"}}
This command starts dropping the database. As suspected, there are no commands that read any data, which means the attacker isn't backing it up as they claim. Unfortunately, someone actually paid this scammer earlier this month: https://www.blockchain.com/btc/tx/65d035ca4db759a73bd9cb68610e04742ffe0e0b71ecdf88f54c7e464ee80a51
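For reference, here is a small sketch of scanning the JSON-format mongod log (MongoDB 4.4+) for dropDatabase entries; the log path matches the one above, so adjust it for your system:

import json

with open("/var/log/mongodb/mongod.log") as log:
    for line in log:
        try:
            entry = json.loads(line)  # each log line is a JSON document
        except json.JSONDecodeError:
            continue
        # Flag any dropDatabase activity with its timestamp and attributes.
        if "dropDatabase" in entry.get("msg", ""):
            print(entry["t"]["$date"], entry.get("attr"))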

Related

MongoDB Backups: Expire Data from Collections by Setting TTL

I have read MongoDB's official guide on Expire Data from Collections by Setting TTL. I have set everything up, and everything is running like clockwork.
One of the reasons I enabled TTL is that one of the product's requirements is to auto-delete a specific collection. The TTL handles that quite well. However, I have no idea whether the data expiration will also carry over to the MongoDB backups. The data is also supposed to be automatically deleted from the backups; in case the backups get leaked or restored, the expired data shouldn't be there.
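For context, a TTL setup of the kind described usually amounts to a single index; a minimal pymongo sketch (the collection and field names here are made up):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017/")
coll = client.mydb.sessions  # hypothetical collection

# Documents are removed roughly 3600 seconds after their "createdAt" value.
coll.create_index("createdAt", expireAfterSeconds=3600)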
A backup contains the data that was present in the database at the time the backup was taken.
Once a backup is made, it's just a bunch of data that sits somewhere without being touched. The documents that have been deleted since the backup was taken are still in the backup (arguably this is the point of the backup to begin with).
If you want to expire data from backups, the normal solution is to delete backups older than a certain age.
As mentioned by @D.SM, data is not deleted from backups. One solution could be to encrypt your data, e.g. with Client-Side Field Level Encryption.
Use a new encryption key for each day's data. When that data should expire, drop the corresponding encryption key from your key storage. With that, the data becomes unusable even if somebody restores it from an old backup.
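To illustrate the idea only (this is not MongoDB's Client-Side Field Level Encryption, just the crypto-shredding concept with a per-day key, sketched with the Python cryptography package; names and storage are placeholders):

import datetime
from cryptography.fernet import Fernet
from pymongo import MongoClient

keys = {}  # stand-in for your real key storage

def key_for_today():
    day = datetime.date.today().isoformat()
    if day not in keys:
        keys[day] = Fernet.generate_key()  # one key per day
    return day, Fernet(keys[day])

coll = MongoClient()["mydb"]["events"]  # hypothetical collection
day, f = key_for_today()
coll.insert_one({"day": day, "payload": f.encrypt(b"sensitive data")})

# To "expire" a day's data everywhere (including old backups),
# delete its key: the remaining ciphertext can no longer be decrypted.
keys.pop("2022-08-06", None)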

PCI Compliance with Native PostgreSQL

We have a PostgreSQL database, no 3rd-party software, a Linux admin, and a SQL DBA with little PostgreSQL experience.
We need to set up audit/access logging of all transactions on the credit card (CC) tables. We enabled logging, but we are concerned about logging everything; we want to restrict it to specified tables. I am not finding a resource that I understand well enough to accomplish this.
A few blogs have mentioned table triggers and logfiles, and I found another that discusses functions. I am just not sure how to proceed. The following is the PCI information I am working from:
1) (Done) Install the pg_stat_statements extension to monitor all queries (SELECT, INSERT, UPDATE, DELETE)
2) Set up monitoring to find suspicious access on the PAN-holding table
3) Enable connection/disconnection logging
4) Enable web server access logs
5) Monitor Postgres logs for unsuccessful login attempts
6) Automated log analysis & access monitoring using alerts
7) Keep archived audit and log history for at least one year, with the last 3 months readily available for analysis
Update
We also need to apply a password policy to PostgreSQL database users:
90-day expiration (there is a place to set a date but not an interval)
Lock out a user after 6 failed attempts; keep them locked out for 30 minutes or until an administrator re-enables the user ID
Force re-authentication when idle for more than 15 minutes
Passwords/phrases must meet the following: require a minimum length of at least seven characters; contain both numeric and alphabetic characters; cannot be the same as the last 4 passwords/passphrases used
2) There is no direct way to log access to specific tables. The pgAudit extension claims it can do that; I have never used it, though.
3) Can easily be done using log_connections and log_disconnections.
4) Has nothing to do with Postgres.
5) Can be done by monitoring the logfile once connection logging is enabled (see the sketch after this list).
6) No idea what that is supposed to mean.
7) That is independent of the Postgres setup; you just need to make sure the Postgres logfiles are archived properly.
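As a rough sketch of points 3) and 5): once log_connections and log_disconnections are on, a small script can scan the logfile for failed logins. The log path and version below are assumptions for a typical Ubuntu install, and the match relies on the standard "password authentication failed" message:

import collections
import re

LOG = "/var/log/postgresql/postgresql-14-main.log"  # adjust for your install
FAILED = re.compile(r'password authentication failed for user "([^"]+)"')

failures = collections.Counter()
with open(LOG) as log:
    for line in log:
        m = FAILED.search(line)
        if m:
            failures[m.group(1)] += 1

for user, count in failures.most_common():
    print(f"{user}: {count} failed login attempt(s)")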

Am I being overcharged by Azure Cosmos DB for a 45 MB database?

We use Cosmos DB with the "MongoDB" API. We have a database that is only 45 MB in size, with fewer than 10,000 documents across all collections.
We run light queries and writes each day, fewer than 3,000 requests/day. We also run a "MongoDB dump" each night to dump the entire database to a local server for backup; as said, the downloaded file is only around 45 MB, so I presume it is not too big.
In Feb 2018, we received a bill of around £3,500, which is surprisingly ridiculous. It looks like we were being charged by the number of requests, which we knew, but for whatever reason, for a 45 MB database, we would not expect to use that much!
I've also included 2 images that show the usage over the last 7 days. The metrics show lots of requests made by "Others", which is still unknown; reads/writes look very light.
Am I being overcharged by Azure?
The pricing of Azure Cosmos DB is based on the provisioned RUs in your collections.
For Mongo accounts, the "Other" operations are any operation different from Insert/Update/Delete/Query/Count.
To see the details, please go to the Monitor service and select the Metrics (preview).
Then you will need to select your database account, then "Mongo Requests" as the metric, and then finally add a group by "CommandName":
You should be able to see the individual commands there.
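If it helps, you can also ask the Mongo API itself how many RUs an operation cost. A small pymongo sketch follows; the connection string and names are placeholders, and getLastRequestStatistics is a Cosmos DB-specific command, so this does not work against plain MongoDB:

from pymongo import MongoClient

# Use your Cosmos DB connection string from the portal.
client = MongoClient("your-cosmos-connection-string")
db = client["mydb"]  # hypothetical database name

db.mycollection.find_one({})                      # run any operation
stats = db.command("getLastRequestStatistics")    # Cosmos-specific command
print(stats["CommandName"], stats["RequestCharge"])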

Application event logging for statistics

I have an app in production and working. It is hosted on Heroku and uses Postgres 9.3 in the cloud. There are 2 databases: a master and a (read-only follower) slave. There are tables like Users, Likes, Followings, Subscriptions and so on. We need to store a complete log of events like userCreated, userDeleted, userLikedSomething, userUnlikedSomething, userFollowedSomeone, userUnfollowedSomeone and so on. Later on we have to prepare statistical reports/charts about current and historical data. The main problem is that when a user is deleted, it is simply removed from the db, so we can't retrieve deleted users because they are no longer stored anywhere. The same applies to likes/unlikes, follows/unfollows and so on. There are a few things I don't know how to handle properly:
If we store events in the same database with foreign keys to user profiles, then historical data will change, because each event will be "linked" to the current user profile, which changes over time.
If we store events in a separate Postgres database (a db just for logs, to offload the main database), then to join the events with actual user profiles we would have to use cross-db joins (dblink), which I guess might be slow (I have never used this feature before). Anyway, this won't solve the problem from point 1.
I thought about using a different type of database for storing logs, maybe MongoDB, as I recall MongoDB is more "write-heavy" than Postgres (which is more "read-heavy"?), so it might be more suitable for storing logs/events. However, then I would have to store user profiles in two databases (and even a user profile snapshot per event, to solve point 1).
I know this is a very general question, but maybe there is some kind of standard approach or special database type for storing such data?
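Not an authoritative answer, but for concreteness, one common pattern is an append-only events table that embeds a snapshot of the relevant user data at the time of the event, so later profile changes or deletions don't alter history. A minimal psycopg2 sketch, with invented table and column names:

import psycopg2
from psycopg2.extras import Json

conn = psycopg2.connect("dbname=stats")  # hypothetical DSN
cur = conn.cursor()

cur.execute("""
    CREATE TABLE IF NOT EXISTS events (
        id          bigserial PRIMARY KEY,
        event_type  text        NOT NULL,              -- e.g. 'userDeleted'
        occurred_at timestamptz NOT NULL DEFAULT now(),
        user_id     bigint      NOT NULL,              -- no FK: the row outlives the user
        snapshot    jsonb       NOT NULL               -- user data as it was at event time
    )
""")

cur.execute(
    "INSERT INTO events (event_type, user_id, snapshot) VALUES (%s, %s, %s)",
    ("userDeleted", 42, Json({"name": "Alice", "followers": 17})),
)
conn.commit()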

Is it possible to fork a mysqldump of data?

I am restoring a MySQL database with about 30 million records to a remote server using Perl. It's taking more than 2 days, and looking at my network connections I am not fully utilizing my uplink bandwidth. I will need to do this at least once per week. Is there a way to fork a mysqldump (I'm using Perl) so that I can take full advantage of my bandwidth? (I don't mind if I'm choked off for a bit... I just need to get this done faster.)
Can't you upload the whole dump to the remote server and start the restore there?
A restore of a mysqldump is just the execution of a long series of commands that would restore your database from scratch. If the execution path for that is: 1) send command, 2) remote system executes command, 3) remote system replies that the command is complete, 4) send next command, then you are spending most of your time waiting on network latency.
I do know that most SQL hosts will allow you to upload a dump file specifically to avoid the kinds of restore time that you're talking about. The company that takes my money each month even has a web-based form that you can use to restore a database from a file that has been uploaded via sftp. Poke around your hosting service's documentation. They should have something similar. If nothing else (and you're comfortable on the command line) you can upload it directly to your account and do it from a shell there.
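A rough sketch of that approach, assuming you have SSH access to the remote host; the host name, paths, and credentials are placeholders:

import subprocess

DUMP = "backup.sql.gz"
REMOTE = "user@db.example.com"  # placeholder host

# Copy the compressed dump once, then run the restore on the remote machine
# so no per-statement round trips cross the network.
subprocess.run(["scp", DUMP, f"{REMOTE}:/tmp/{DUMP}"], check=True)
subprocess.run(
    ["ssh", REMOTE, f"gunzip -c /tmp/{DUMP} | mysql -u dbuser -p mydb"],
    check=True,
)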
mk-parallel-dump and mk-parallel-restore are designed to do what you want, but in my testing mk-parallel-dump was actually slower than plain old mysqldump. Your mileage may vary.
(I would guess the biggest factor would be the number of spindles your data files reside on, which in my case, 1, was not especially conducive to parallelization.)
First caveat: mk-parallel-* writes a bunch of files, and figuring out when it's safe to start sending them (and when you're done receiving them) may be a little tricky. I believe that's left as an exercise for the reader, sorry.
Second caveat: mk-parallel-dump is specifically advertised as not being for backups. Because "At the time of this release there is a bug that prevents --lock-tables from working correctly," it's really only useful for databases that you know will not change, e.g., a slave that you can STOP SLAVE on with no repercussions, and then START SLAVE once mk-parallel-dump is done.
I think a better solution than parallelizing a dump may be this:
If you're doing your mysqldump on a weekly basis, you can just do it once (dumping with --single-transaction (which you should be doing anyway) and --master-data=n) and then start a slave that connects over an ssh tunnel to the remote master, so the slave is continually updated. The disadvantage is that if you want to clone a local copy (perhaps to make a backup) you will need enough disk to keep an extra copy around. The advantage is that a week's worth of (query-based) replication log is probably quite a bit smaller than resending the data, and also it arrives gradually so you don't clog your pipe.
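For what it's worth, the dump side of that might look roughly like the sketch below; the database name and user are placeholders, and --master-data=2 records the binlog coordinates as a comment so the slave can later be pointed at the right position:

import subprocess

# Consistent InnoDB snapshot without long locks, with binlog coordinates
# written into the dump for setting up replication afterwards.
with open("weekly_dump.sql", "w") as out:
    subprocess.run(
        [
            "mysqldump",
            "--single-transaction",
            "--master-data=2",
            "-u", "dbuser", "-p",   # prompts for the password
            "mydb",
        ],
        stdout=out,
        check=True,
    )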
How big is your database in total? What kind of tables are you using?
A big risk with backups using mysqldump has to do with table locking, and updates to tables during the backup process.
The mysqldump backup process basically works as follows:
For each table {
    Lock table as read-only
    Dump table to disk
    Unlock table
}
The danger is that if you run an INSERT/UPDATE/DELETE query that affects multiple tables while your backup is running, your backup may not capture the results of your query properly. This is a very real risk when your backup takes hours to complete and you're dealing with an active database. Imagine: your code runs a series of queries that update tables A, B, and C. The backup process currently has table B locked.
The update to A will not be captured, as this table was already backed up.
The update to B will not be captured, as the table is currently locked for writing.
The update to C will be captured, because the backup has not reached C yet.
This is an easy way to destroy referential integrity in your database.
Your backup process needs to be atomic, and transactional. If you can't shut down the entire database to writes during the backup process, you're risking disaster.
Also, there must be something wrong here. At a previous company, we were running nightly backups of a 450 GB MySQL DB (the largest table had 150M rows), and it took less than 6 hours for the backup to complete.
Two thoughts:
Do you have a slave database? Run the backup from there: stop replication (preventing the read/write risk), run the backup, then restart replication (a sketch follows after these thoughts).
Are your tables using InnoDB? Consider investing in InnoDB Hot Backup, which solves this problem, as the backup process leverages the journaling that is part of the InnoDB storage engine.
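A rough sketch of the slave-based approach from the first thought; the server name and credentials are placeholders:

import subprocess

SLAVE = ["mysql", "-h", "slave.example.com", "-u", "backup", "-psecret"]

def sql(statement):
    subprocess.run(SLAVE + ["-e", statement], check=True)

sql("STOP SLAVE;")            # freeze the slave so the data set is stable
with open("nightly_backup.sql", "w") as out:
    subprocess.run(
        ["mysqldump", "-h", "slave.example.com", "-u", "backup", "-psecret",
         "--all-databases", "--single-transaction"],
        stdout=out,
        check=True,
    )
sql("START SLAVE;")           # resume replication once the dump is done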