Am I being overcharged by Azure Cosmos DB for a 45MB database? - mongodb

We use Cosmos DB with the MongoDB API. We have a database that is only 45MB in size, with fewer than 10,000 documents across all collections.
We run light queries and writes each day, fewer than 3,000 requests/day. We also run mongodump each night to dump the entire database to a local server for backup; as mentioned, the downloaded file is only around 45 MB, so I presume it is not too big.
In February 2018, we received a bill of around £3,500, which seems ridiculously high. It looks like we were being charged by the number of requests, which we knew about, but for a 45MB database we would not expect anywhere near that much usage!
I've also included two images that show the usage over the last 7 days. The metrics show lots of requests made by "Others", which is still unknown to us, while reads and writes look very light.
Am I being overcharged by Azure?

The pricing of Azure Cosmos DB is based on the Request Units (RUs) provisioned on your collections; you are billed for the provisioned throughput every hour, whether or not you actually consume it.
For Mongo accounts, the "Other" operations are any operations other than Insert/Update/Delete/Query/Count.
To see the details, go to the Monitor service and select Metrics (preview).
Then select your database account, choose "Mongo Requests" as the metric, and finally add a group-by on "CommandName".
You should be able to see the individual commands there.
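If you want to see how many RUs an individual operation consumes, the Cosmos DB Mongo API also exposes a custom command for that. A minimal sketch from the mongo shell, assuming hypothetical database and collection names and your account's connection string:
$ mongo "<your Cosmos DB Mongo API connection string>"
> use mydb
> db.questions.findOne()
> db.runCommand({ getLastRequestStatistics: 1 })
The response should include a RequestCharge field showing the RU cost of the last operation, which makes it easier to spot unexpectedly expensive commands.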

Related

Why is my mongo collection being wiped on azure ubuntu instance?

I'm using an Azure Ubuntu instance to store some data every minute in a MongoDB database. I noticed that the data is being wiped approximately once a day, and I'm wondering why.
I log a count of the database every minute. Here are two consecutive minutes that show all records being deleted:
**************************************
update at utc: 2022-08-06 10:19:02.393351 local: 2022-08-06 20:19:02.393366
count after insert = 1745
**************************************
update at utc: 2022-08-06 10:20:01.643487 local: 2022-08-06 20:20:01.643544
count after insert = 1
**************************************
You can see the data is wiped, as the count after insert goes from 1745 to 1. My question is: why is my data being wiped?
Short Answer
The data was being deleted in a ransom attack. I wasn't using a MongoDB password, as originally I was only testing MongoDB locally. Then, when I set bindIp to 0.0.0.0 for remote access, it meant anyone could access the database if they guessed the host (this was pretty careless of me).
Always secure the server with authentication, especially if your bindIp is 0.0.0.0. For instructions see https://www.mongodb.com/features/mongodb-authentication
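As a rough sketch of what that looks like on an Ubuntu install (user name, password, and file paths are placeholders; use mongo instead of mongosh on older versions):
$ mongosh
> use admin
> db.createUser({ user: "admin", pwd: "<strong password>", roles: [ { role: "root", db: "admin" } ] })
> exit
Then enable authentication in /etc/mongod.conf:
security:
  authorization: enabled
and restart the service:
$ sudo systemctl restart mongod
From then on, clients have to authenticate, e.g. mongosh -u admin -p --authenticationDatabase admin.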
More Detail
To check whether you have been hit by a ransom attack, look for a ransom note. An extra database may appear (see show dbs); in my case the new database containing the ransom note was called "READ__ME_TO_RECOVER_YOUR_DATA":
All your data is a backed up. You must pay 0.05 BTC to 1Kz6v4B5CawcnL8jrUvHsvzQv5Yq4fbsSv 48 hours for recover it. After 48 hours expiration we will leaked and exposed all your data. In case of refusal to pay, we will contact the General Data Protection Regulation, GDPR and notify them that you store user data in an open form and is not safe. Under the rules of the law, you face a heavy fine or arrest and your base dump will be dropped from our server! You can buy bitcoin here, does not take much time to buy https://localbitcoins.com or https://buy.moonpay.io/ After paying write to me in the mail with your DB IP: rambler+1c6l#onionmail.org and/or mariadb#mailnesia.com and you will receive a link to download your database dump.
Another way to check for suspicious activity is the MongoDB service log at /var/log/mongodb/mongod.log (on other systems the filename might be mongodb.log). For me there was a series of commands around the attack time in the log, the first of which reads:
{"t":{"$date":"2022-08-07T09:54:37.779+00:00"},"s":"I", "c":"COMMAND", "id":20337, "ctx":"conn30393","msg":"dropDatabase - starting","attr":
{"db":"READ__ME_TO_RECOVER_YOUR_DATA"}}
This command starts dropping the database. As suspected, there are no commands that read any data, which means the attacker isn't backing it up as they claim. Unfortunately, someone actually paid this scammer earlier this month: https://www.blockchain.com/btc/tx/65d035ca4db759a73bd9cb68610e04742ffe0e0b71ecdf88f54c7e464ee80a51
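To scan the log for this kind of activity, a simple grep is usually enough (path as above; adjust if your distribution logs elsewhere):
$ grep -E -i "dropDatabase|createUser|authenticat" /var/log/mongodb/mongod.log
Matching entries around the time of the data loss will show which connection issued the drop and whether any authentication was attempted.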

streaming PostgreSQL tables into Google BigQuery

I would like to automatically stream data from an external PostgreSQL database into a Google Cloud Platform BigQuery database in my GCP account. So far, I have seen that one can query external databases (MySQL or PostgreSQL) with the EXTERNAL_QUERY() function, e.g.:
https://cloud.google.com/bigquery/docs/cloud-sql-federated-queries
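For reference, such a federated query looks roughly like this (the connection ID and table names here are made up), for example via the bq CLI:
$ bq query --use_legacy_sql=false 'SELECT * FROM EXTERNAL_QUERY("my-project.europe-west1.my-cloudsql-connection", "SELECT id, created_at FROM orders;")'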
But for that to work, the database has to be in GCP Cloud SQL. I tried to see what options there are for streaming from the external PostgreSQL into a Cloud SQL PostgreSQL database, but I could only find information about replicating it as a one-time copy, not streaming:
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external
The reason I want this streaming into BigQuery is that I am using Google Data Studio to create reports from the external PostgreSQL, which works great, but GDS can only accept SQL query parameters if the data comes from a Google BigQuery database. For example, if we have a table with 1M entries and we want a user-supplied Google Data Studio parameter, this turns into:
SELECT * from table WHERE id=#parameter;
which means that the query will be faster and won't hit the 100K-record limit in Google Data Studio.
What's the best way of creating a connection between an external PostgreSQL (read-only access) and Google BigQuery so that when querying via BigQuery, one gets the same live results as querying the external PostgreSQL?
Perhaps you missed the options stated in the Google Cloud user guide?
https://cloud.google.com/sql/docs/mysql/replication/replication-from-external#setup-replication
Notice in this section, it says:
"When you set up your replication settings, you can also decide whether the Cloud SQL replica should stay in-sync with the source database server after the initial import is complete. A replica that should stay in-sync is online. A replica that is only updated once, is offline."
I suspect online mode is what you are looking for.
What you are looking for will require some architecture design based on your needs, and some coding. There isn't a feature to automatically sync your PostgreSQL database with BigQuery (apart from the EXTERNAL_QUERY() functionality, which has some limitations: one connection per database, performance, the total number of connections, etc.).
If you are not looking for the data in real time, you could, for instance, have an Airflow DAG that connects to all your databases once per day (using KubernetesPodOperator, for example), extracts the previous day's data, and loads it into BQ. A typical ETL process, although in this case more EL(T). You can run this process more often if you cannot wait a full day for the previous day's data.
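A minimal sketch of the daily extract-and-load step such a DAG task could wrap (the connection string, dataset, table, and column names are hypothetical; a real task would parameterise the date range):
$ psql "$POSTGRES_URI" -c "\copy (SELECT * FROM events WHERE created_at >= current_date - 1 AND created_at < current_date) TO '/tmp/events.csv' WITH CSV HEADER"
$ bq load --source_format=CSV --skip_leading_rows=1 --autodetect my_dataset.events /tmp/events.csv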
On the other hand, if streaming is what you are looking for, then I can think of a Dataflow job; I guess you can connect using a JDBC connector.
In addition, depending on how your pipeline is structured, it might be easier to implement (but harder to maintain) if, at the same moment you write to your PostgreSQL DB, you also stream your data into BigQuery.
Not sure if you have tried this already, but instead of adding a parameter, if you add a dropdown filter based on a dimension, Data Studio will push that down to the underlying Postgres db in this form:
SELECT * from table WHERE id=$filter_value;
This should achieve the same results you want without going through BigQuery.

CouchDB PouchDB Database Design

I come from an RDBMS and MongoDB background and can't get my head around the flat Couchbase database model.
I am working on an educational app where students (around 10,000) will access "their individual study material" and solve it. Since the application also works offline, I need to keep Couchbase as the remote database and PouchDB as the local database on mobile, and keep them in sync.
Use Cases:
If a question is added on the remote server, it should get synced locally to PouchDB.
If a student marks any question "important" or "doubt", it should get synced to the remote Couchbase database in the student's private space.
The schema I can think of, after researching on SO, is to maintain an individual database for each student (one bucket per student) so that syncing can be done very quickly.
The other model would be one bucket containing all the students as separate documents. In this case, every question would be replicated as a sub-document within every student's document.
What would be the optimal database design on the Couchbase Server side? Should I go with the first or the second approach, or some other approach you would suggest?

MongoDB mongostat monitoring for a particular database

My MongoDB server has several databases.
I need to get the stats for a particular MongoDB database.
For instance, I need something like this:
mongostat -h <ip address> <database name>
Can anyone let me know if this is possible?
Thanks
The mongostat utility reports statistics for the whole instance (all databases) and does not get any more specific than that (besides reporting the database with the highest lock percentage since 2.2).
The mongotop utility will give you per-database (and per-collection) reporting, but only for a limited amount of information (time spent, locks).
Combining the two will give you a decent idea of what your busiest collections/databases may be.
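For example (the host is a placeholder and the trailing number is the sampling interval in seconds):
mongostat --host <ip address> 5
mongotop --host <ip address> 5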
Beyond that, MMS will give you reporting on a per-database basis, as long as the database-specific stats are enabled.

Heroku PG: Recover Write access revoked

I have had write access revoked on my Heroku dev plan because I had too many rows in my database.
Here is the result I have:
$ heroku pg:info
=== HEROKU_POSTGRESQL_WHITE_URL (DATABASE_URL)
Plan: Dev
Status: available
Connections: 1
PG Version: 9.2.7
Created: 2013-06-21 13:24 UTC
Data Size: 12.0 MB
Tables: 48
Rows: 10564/10000 (Write access revoked) - refreshing
Fork/Follow: Unsupported
Rollback: Unsupported
Region: Europe
Since then, I have deleted half of the rows, as they were created by a script that ran out of control, and this is my development environment. I checked in pgAdmin and in the Rails console, and the rows do appear to be deleted.
How can I recover write access to my database? I don't want to upgrade, as I don't normally need it. I've been waiting for 2 hours already and nothing has changed.
I read that heroku pg:info doesn't update very quickly, but what can I do in the meantime?
Thanks for your support.
For the Starter tier (now called Hobby tier), the statistics on row count, table count, and database size are calculated and updated periodically by background workers. They can lag by a few minutes, but typically not more than 5 or 10.
After you've cleared out data to get under your row limit, it may take 5-10 minutes for write access to be restored naturally. You can usually speed up this process by running heroku pg:info, which should trigger a refresh (hence the "refreshing" you see above).
2 hours is not expected. If you're still locked out and you're sure you're under the row limit, please open a ticket at help.heroku.com.
According to the Heroku documentation (https://devcenter.heroku.com/articles/heroku-postgres-plans#row-limit-enforcement):
When you are over the hobby tier row limits and try to insert, you will see a Postgres error:
permission denied for relation <table name>
The row limits of the hobby tier database plans are enforced with the following mechanism:
When a hobby-dev database hits 7,000 rows, or a hobby-basic database hits 7 million rows, the owner receives a warning email stating they are nearing their row limits.
When the database exceeds its row capacity, the owner will receive an additional notification. At this point, the database receives a 7-day grace period to either reduce the number of records or migrate to another plan.
If the number of rows still exceeds the plan capacity after 7 days, INSERT privileges will be revoked on the database. Data can still be read, updated, or deleted from the database. This ensures that users still have the ability to bring their database into compliance and retain access to their data.
Once the number of rows is again in compliance with the plan limit, INSERT privileges are automatically restored to the database. Note that the database sizes are checked asynchronously, so it may take a few minutes for the privileges to be restored.
So you most likely have to upgrade your database plan.
I faced the same restriction, but because I was in excess of Heroku's GB data size limit (rather than the row limit); the outcome was the same ("Write access revoked; Database deletion imminent").
Deleting rows alone was not enough to free up disk space. After deleting the rows, I also had to run the Postgres VACUUM FULL command in order to see a reduction in the disk space figure returned by heroku pg:info.
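In shell terms, the recovery looked roughly like this (the app name is a placeholder; note that VACUUM FULL takes an exclusive lock on each table while it runs):
$ heroku pg:psql --app my-app
=> VACUUM FULL;
=> \q
$ heroku pg:info --app my-app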