Does Google Cloud SQL have an RPO, RTO and ERT guide online? - google-cloud-sql

Azure SQL documents its DR capabilities in terms of RPO and RTO benchmarks here: https://learn.microsoft.com/en-us/azure/sql-database/sql-database-business-continuity
I am looking for the same for Google Cloud SQL but can't find any information. If I enable HA on Cloud SQL and I need RTO = 1 hour and RPO < 1 hour, is this achievable?

Cloud SQL does offer High Availability, which you can check at this link [1], and there is also a document on Disaster Recovery Scenarios for Data [2], but I can't find specific numbers; the only concrete commitment is the SLA.
[1] https://cloud.google.com/sql/docs/mysql/high-availability#HA-configuration
[2] https://cloud.google.com/solutions/dr-scenarios-for-data#managed-database-services-on-gcp

This was published in May 2021:
https://cloud.google.com/architecture/disaster-recovery/product-commitments/may25-2021
In the event of a zone failure, RTO = 15 min and RPO = 15 min. In the event of a region failure, RTO = 1 h and RPO = 1 h.

Related

Snowflake query slowness

Has anyone observed slowness when querying data from Snowflake (SELECT statements) in a Windows Server 2016 + PyCharm environment?
I get results quickly on Windows 10 with the same environment.
When I checked network performance using Wireshark, the round-trip time is higher with Windows 10 than with Windows 2016.
Another difference is that the Windows 10 machine is a physical on-prem system while the Windows 2016 machine is a workspace in AWS.
Has anyone experienced slowness with a similar setup?
Any suggestions on how to troubleshoot further?
Do you think the query execution at Snowflake is slow?
Have you compared the query execution time by keeping other conditions (like warehouse size, load on the warehouse) the same?
If the slowness is in data transfer over the network, it could be due to the AWS region, among several other factors that can affect network transfer speeds.
Snowflake's query execution performance should not be affected by your location or OS.
Snowflake runs as a SaaS on the cloud; in Ragesh's case I think it is running on AWS. So, from a query performance perspective on Snowflake, it ideally should not matter whether you are initiating the connection from Windows 10 or Windows 2016. But the network bandwidth may be a factor to consider: your bandwidth on premise may not be as good as when you are on AWS. To be sure about that, can you please do this test:
Run your query on Windows 10 - before running the query, please set a query tag like
ALTER SESSION SET QUERY_TAG ='ONPREMISE';
Run your query on Windows 2016 - before running the query, please set a query tag like
ALTER SESSION SET QUERY_TAG ='ONAWS';
After this, go to the Snowflake console's History view, filter on QUERY_TAG, and share the query profile details for both scenarios.
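If you are driving both runs from PyCharm, a minimal Python sketch of this test could look like the following, assuming the snowflake-connector-python package; the account, credentials, warehouse and query below are placeholders, not values from the question.

import snowflake.connector  # pip install snowflake-connector-python

# All connection details below are placeholders for illustration only.
conn = snowflake.connector.connect(
    account="my_account",
    user="my_user",
    password="my_password",
    warehouse="MY_WH",
    database="MY_DB",
    schema="PUBLIC",
)
cur = conn.cursor()
# Tag the session so this run can be filtered in the History view later.
cur.execute("ALTER SESSION SET QUERY_TAG = 'ONPREMISE'")  # use 'ONAWS' on the AWS workspace
# Run the query you want to compare across the two environments.
cur.execute("SELECT COUNT(*) FROM my_table")  # replace with your actual query
print(cur.fetchone())
cur.close()
conn.close()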

AWS database solution for storing non-relational data

What's the best AWS database for the below requirements?
I need to store around 50,000 - 100,000 entries in the database.
Each entry would have a string as the key and a JSON array as the value.
I should be able to retrieve the JSON array using the key.
The size of the JSON data is around 20-30 KB.
I expect around 10,000 - 40,000 reads per hour.
Around 50,000 - 100,000 writes per week.
I have to consider the cost as well.
Ease of integration/development
I am a bit confused between MongoDB, DynamoDB and PostgreSQL. Please share your thoughts on this.
DynamoDB:-
DynamoDB is a fully managed proprietary NoSQL database service that supports key-value and document data structures. For the typical use case described in the OP, it would serve the purpose.
DynamoDB can handle more than 10 trillion requests per day and support peaks of more than 20 million requests per second.
DynamoDB has a good AWS SDK for all operations. The read and write capacity units can be configured for the table.
DynamoDB tables using on-demand capacity mode automatically adapt to your application's traffic volume. On-demand capacity mode instantly accommodates up to double the previous peak traffic on a table. For example, if your application's traffic pattern varies between 25,000 and 50,000 strongly consistent reads per second where 50,000 reads per second is the previous traffic peak, on-demand capacity mode instantly accommodates sustained traffic of up to 100,000 reads per second. If your application sustains traffic of 100,000 reads per second, that peak becomes your new previous peak, enabling subsequent traffic to reach up to 200,000 reads per second.
One point to note is that it doesn't allow you to query the table on non-key attributes. This means that if you don't know the hash key of an item, you may need to do a full table scan to get the data. However, there is a secondary index option which you can explore to get around the problem. You should know all the query access patterns of your use case before you design the table, so you can make an informed decision.
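For the key -> JSON array pattern described in the question (20-30 KB values fit comfortably under DynamoDB's 400 KB item size limit), a minimal sketch using boto3 might look like the following; the table and attribute names are illustrative assumptions, not anything prescribed above.

import json
import boto3  # pip install boto3

# Table and attribute names are illustrative placeholders.
dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("entries")  # table with partition key "entry_key" (String)

# Write: store the JSON array, e.g. serialized as a string attribute.
table.put_item(Item={
    "entry_key": "user-123",
    "payload": json.dumps([{"event": "login"}, {"event": "purchase"}]),
})

# Read: a single GetItem by key -- no scan needed when the key is known.
resp = table.get_item(Key={"entry_key": "user-123"})
payload = json.loads(resp["Item"]["payload"])
print(payload)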
MongoDB:-
MongoDB is not a fully managed service on AWS. However, you can set up the database yourself using AWS services such as EC2, VPC, IAM, EBS, etc. This requires some AWS cloud experience. The other option is to use the MongoDB Atlas service.
MongoDB is more flexible in terms of querying, and it has a powerful aggregation framework. There are lots of tools available to query the database directly and explore the data, much like SQL.
In terms of a Java API, Spring Data MongoDB can be used to perform typical database operations. There are lots of open-source frameworks for MongoDB in other languages as well (for example Mongoose for Node.js).
MongoDB has support for many programming languages, and the APIs are mature as well.
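As a rough illustration (not tied to any particular framework mentioned above), the same key -> JSON array workload could be sketched in Python with pymongo as follows; the connection string, database and collection names are placeholders.

from pymongo import MongoClient  # pip install pymongo

# Connection string, database and collection names are placeholders.
client = MongoClient("mongodb://localhost:27017")
col = client["mydb"]["entries"]

# Write: the JSON array is stored natively as a field of the document.
col.update_one(
    {"_id": "user-123"},
    {"$set": {"payload": [{"event": "login"}, {"event": "purchase"}]}},
    upsert=True,
)

# Read by key; fields inside the array can also be queried or aggregated later.
doc = col.find_one({"_id": "user-123"})
print(doc["payload"])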
PostgreSQL:-
PostgreSQL is available as a fully managed database on AWS through Amazon RDS.
PostgreSQL has become the preferred open source relational database for many enterprise developers and start-ups, powering leading geospatial and mobile applications. Amazon RDS makes it easy to set up, operate, and scale PostgreSQL deployments in the cloud.
I don't think I need to write much about this database and its API. It is a very mature database with good APIs.
Points to consider:-
Query Access Pattern
Easy setup
Database maintenance
API and frameworks
Community support

How to compare the performance of MongoDB with Cosmos DB on Azure cloud?

I have a MongoDB instance provisioned on Azure, used as IaaS. There is a load balancer behind which there is a shard cluster, and each shard has 2 replicas. Each replica is a VM, so I can go inside that VM and check the storage space, RAM and other hardware details for it.
Now, I have Cosmos DB provisioned as well, which is a managed service, so I have no visibility into what it uses under the hood. For example, I would not know how much RAM or what storage it uses.
So if I have to compare the performance of MongoDB and Cosmos DB on Azure, I am not sure how to compare apples to apples without exact information about the underlying hardware.
Can someone suggest a way I can compare the performance of the two?
Why not compare on price?
Take the direct Azure charges for your IaaS MongoDB and allocate the same budget to purchase an allowance of Cosmos DB request units (RUs). This would represent a very basic comparison.
Next, fine-tune your comparison to genuinely reflect some advantages of the PaaS Cosmos DB:
Assume you could dial down allocated RUs by 30% for 10 hours per day.
Enable the new add-on provisioning for request units per minute; Microsoft has cited 20% cost savings when this feature is enabled.
Finally, add 10% of the salary of a Database Administrator to your total IaaS cost.
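Purely to illustrate how those adjustments combine, here is a back-of-the-envelope sketch in Python; every figure in it is a made-up placeholder, not a real quote from either service.

# Back-of-the-envelope only: every number below is a made-up placeholder.
iaas_vm_cost = 1000.0            # monthly Azure charge for the IaaS MongoDB VMs
dba_salary_share = 0.10 * 8000   # 10% of a hypothetical monthly DBA salary
iaas_total = iaas_vm_cost + dba_salary_share

cosmos_base = 1000.0                               # same budget spent on Cosmos DB RUs
dial_down_saving = cosmos_base * (10 / 24) * 0.30  # RUs dialled down 30% for 10 h/day
rupm_saving = (cosmos_base - dial_down_saving) * 0.20  # ~20% from RU-per-minute add-on
cosmos_total = cosmos_base - dial_down_saving - rupm_saving

print(f"IaaS MongoDB: ~${iaas_total:.0f}/month, Cosmos DB: ~${cosmos_total:.0f}/month")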

Google Cloud SQL Pricing

I am an avid user of Amazon AWS, but I am not sure how RDS compares to Google's Cloud SQL. On this site it is mentioned that a Per Use billing plan exists.
How is that calculated? It is mentioned 'charged for periods of continuous use, rounded up to the nearest hour'.
How does that work? If there are no visitors to my site, there are no charges, right? What if I have 100 continuous users for 30 days? Will I still be billed $0.025 per hour (excluding the network usage charges)?
How do I upload my present SQL database to Google's cloud service? Is it the same way as with Amazon, using Oracle Workbench?
Thank you
With per-use billing, if your database isn't accessed for 15 minutes it is taken offline and you are only charged for data storage ($0.24 per GB per month). It's brought back online the next time it's accessed, which typically takes around a second for a D1 instance. The number of users doesn't affect the charge: you are charged for the database instance, not per user.
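To put rough numbers on the 100-continuous-users scenario from the question, here is a small worked example using the $0.025/hour D1 rate quoted there and the storage rate above; the database size is a hypothetical.

# Rough worked example using the rates quoted above (per-use plan, D1 instance).
hourly_rate = 0.025   # $ per hour while the instance is active
storage_rate = 0.24   # $ per GB per month
db_size_gb = 5        # hypothetical database size

# Continuous traffic means the instance never idles out, so every hour is billed.
active_hours = 24 * 30
instance_cost = active_hours * hourly_rate   # 720 h * $0.025 = $18.00
storage_cost = db_size_gb * storage_rate     # 5 GB * $0.24   = $1.20
print(f"~${instance_cost + storage_cost:.2f} per month, excluding network charges")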
More details here
https://developers.google.com/cloud-sql/faq#how_usage_calculated
More information on importing data here:
https://developers.google.com/cloud-sql/docs/import-export
For Google Cloud SQL, we need to differentiate between the 1st generation and the 2nd generation of MySQL instances. The FAQ link in Joe Faith's answer, https://developers.google.com/cloud-sql/faq#how_usage_calculated, is about the 1st generation with an activation policy of ON_DEMAND, meaning that you are charged per minute of usage.
With MySQL 2nd generation (as answered by Se Song), however, you are charged for every minute (24 hours per day) regardless of whether you have active connections or not, because the instance uses the activation policy ALWAYS. You can read more pricing details here: https://cloud.google.com/sql/pricing/#2nd-gen-pricing
You can manually stop and restart your database instance, and hence it is possible to write a script that activates it under particular circumstances, but this scheduling is not provided as a built-in GCP feature.
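For example, a minimal sketch of such a script using the gcloud CLI's --activation-policy flag; the instance name is a placeholder, and the scheduling itself would have to come from cron, Cloud Scheduler, or similar.

import subprocess

def set_activation(instance: str, policy: str) -> None:
    # policy is typically "ALWAYS" to bring the instance up or "NEVER" to stop it.
    subprocess.run(
        ["gcloud", "sql", "instances", "patch", instance,
         f"--activation-policy={policy}"],
        check=True,
    )

# e.g. called from cron: stop the instance overnight, start it in the morning.
set_activation("my-instance", "NEVER")   # instance name is a placeholder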
Watch the default settings carefully or you risk $350/month fees. Here's my experience: https://medium.com/@the-bumbling-developer/can-you-use-google-cloud-platform-gcp-cheaply-and-safely-86284e04b332

iPhone Development using Data from Wiki using SPARQL

I am looking into developing an iPhone app that will use data from Wikipedia. I learned that you can query the wiki using a SPARQL endpoint. Does anyone know any websites that can be used to query such data? I am trying to use DBpedia, but sometimes I get timeout errors, so I am looking for something more stable. Do you think it would be very slow if I am getting a large result set?
Thank you for all the responses.
Another SPARQL endpoint that can be used to query the DBpedia dataset is lod.openlinksw.com. It is backed by more servers, but lags behind in dataset updates. In any case, you need to construct queries that retrieve small result sets to achieve better response times.
I presume you are querying against the official DBpedia SPARQL endpoint hosted in Virtuoso? If so, being a community-driven and community-hosted endpoint, there are restrictions on use, particularly on the size of the result sets that can be returned, which is capped at 2000 rows to protect the endpoint from abuse. Should you want to obtain larger result sets, it is recommended to use the SPARQL LIMIT and OFFSET clauses to retrieve results in chunks of 2000, as in the sketch below.
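A minimal Python sketch of that pagination pattern against the public endpoint; the query itself is just an illustrative example, and the JSON results format parameter is an assumption about the Virtuoso endpoint configuration.

import requests  # pip install requests

ENDPOINT = "https://dbpedia.org/sparql"
QUERY = (
    "PREFIX dbo: <http://dbpedia.org/ontology/> "
    "SELECT ?city WHERE { ?city a dbo:City } "
    "LIMIT 2000 OFFSET %d"
)

offset, rows = 0, []
while True:
    resp = requests.get(
        ENDPOINT,
        params={"query": QUERY % offset,
                "format": "application/sparql-results+json"},
        timeout=60,
    )
    batch = resp.json()["results"]["bindings"]
    rows.extend(batch)
    if len(batch) < 2000:  # last (partial) page reached
        break
    offset += 2000
print(len(rows), "results fetched in pages of 2000")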
If you really want a dedicated DBpedia SPARQL endpoint with no restrictions and under your control, OpenLink Software provides an option to instantiate and host your own DBpedia EC2 AMI in the cloud, which is an exact replica of the official DBpedia SPARQL endpoint, both hosted in Virtuoso.
Note that the LOD Cloud Cache hosted in Virtuoso also includes the DBpedia datasets from the official DBpedia SPARQL endpoint, along with most of the major datasets in the LOD Cloud, and it is hosted on a larger Virtuoso clustered server allowing a maximum of 100000 rows per query.