Cold storage in CloudSearch vs. OpenSearch

I see that cold storage is available in OpenSearch (https://docs.aws.amazon.com/opensearch-service/latest/developerguide/cold-storage.html), which costs less and does not impact performance much.
What about CloudSearch? I don't see any cold storage option for it. My question is: is this a missing feature in CloudSearch, or is it simply not required there? Are there any alternatives for it other than switching to OpenSearch?

Related

Horizontally Scaling Database Guide

We want to horizontally scale our existing MongoDB database, which is currently running on one server. Due to an increased user base, we can't scale it vertically anymore; we need to scale it horizontally through sharding.
MongoDB provides a good tutorial on sharding, but we need to do this in a short amount of time and we are not experts in it.
It seems there are multiple options available, like Google Cloud and Amazon RDS. All we want is to keep using our database but achieve sharding through some other service.
So my questions are:
1. Is it possible to build a fail-safe cluster architecture in less than a week using MongoDB sharding, with a team that has no prior experience in this?
2. If not, do services like Google Cloud SQL and Amazon RDS provide a mechanism to use our database with their sharding features?
Can anyone with expertise in this guide me in the right direction?
I tried MongoDB Atlas and it looks pretty good: https://www.mongodb.com/cloud/atlas
It creates a cluster for you by default.
Maybe you can give it a try:
MongoDB Atlas delivers the world’s leading database for modern
applications as a fully automated cloud service engineered and run by
the same team that builds the database. Proven operational and
security practices are built in, automating time-consuming
administration tasks such as infrastructure provisioning, database
setup, ensuring availability, global distribution, backups, and more.
The easy-to-use UI and API let you spend more time building your
applications and less time managing your database.
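For what it's worth, connecting to an Atlas cluster from application code is straightforward. Below is a minimal PyMongo sketch; the connection string, credentials, and database/collection names are placeholders for whatever Atlas shows you in its UI:

```python
# Minimal sketch of talking to an Atlas cluster with PyMongo.
# The URI below is a placeholder; Atlas generates the real one for you.
# mongodb+srv URIs require the dnspython package to be installed.
from pymongo import MongoClient

client = MongoClient(
    "mongodb+srv://appUser:<password>@cluster0.example.mongodb.net/"
    "?retryWrites=true&w=majority"
)

db = client["mydb"]                       # hypothetical database name
db.users.insert_one({"name": "alice"})    # routed to the cluster for you
print(db.users.find_one({"name": "alice"}))
```

Sharding itself is then configured from the Atlas UI rather than by hand, which is the main time-saver for a team without prior experience.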

Google Compute Engine files for HA LAMP are not there for download?

I found this guide in the Google Cloud Platform documentation:
https://cloud.google.com/developers/articles/high-availability-lamp-stack-on-google-compute-engine
However, the files it asks you to download are not found when the link is clicked.
Does anyone know where they actually are?
Unfortunately we deprecated this sample application as it was focused on migrating applications around maintenance windows.
Now that we have live migration and transparent maintenance windows, developers no longer need to move components of their application around zones due to impending maintenance. Additionally, Cloud SQL now supports the MySQL wire protocol, which significantly reduces the complexity of managing applications for high availability.
In the future we may develop a new application, but it will be greatly simplified, since we can offload load balancing to the Compute Engine load balancer and persistence to Cloud Datastore and Cloud SQL.

What are the pros and cons of DynamoDB with respect to other NoSQL databases?

We use a MongoDB database add-on on Heroku for our SaaS product. Now that Amazon has launched DynamoDB, a cloud database service, I was wondering how that changes the NoSQL offerings landscape.
Specifically for cloud based services or SaaS vendors, how will using DynamoDB be better or worse as compared to say MongoDB? Are there any cost, performance, scalability, reliability, drivers, community etc. benefits of using one versus the other?
For starters, it will be fully managed by Amazon's expert team, so you can bet that it will scale very well with virtually no input from the end user (developer).
Also, since it's built and managed by Amazon, you can assume that they have designed it to work very well with their infrastructure, so performance should be top notch. In addition to being specifically built for their infrastructure, they have chosen to use SSDs as storage, so right from the start disk throughput will be significantly higher than other data stores on AWS that are HDD-backed.
I haven't seen any drivers yet and I think it's too early to tell how the community will react to this, but I suspect that Amazon will provide drivers for all of the most popular languages, the community will likely receive this well, and in turn it will create additional drivers and tools.
Using MongoDB through an add-on for Heroku effectively turns MongoDB into a SaaS product as well.
In reality one would be comparing whatever service a chosen provider offers to what Amazon can offer, instead of comparing one persistence solution to another.
This is very hard to do. Each provider will have varying levels of service at different price points and one could consider the option of running it on their own hardware locally for development purposes a welcome option.
I think the key difference to consider is that MongoDB is software you can install anywhere (including on AWS, another cloud service, or in-house), whereas DynamoDB is a SaaS available exclusively as a hosted service from Amazon (AWS). If you want to retain the option of hosting your application in-house, DynamoDB is not an option. If hosting outside of AWS is not a consideration, then DynamoDB should be your default choice unless very specific features are a higher priority.
There's a table in the following link that summarizes the attributes of DynamoDB and Cassandra:
http://www.datastax.com/dev/blog/amazon-dynamodb
Something DynamoDB needs in order to become more usable is the ability to index attributes other than the primary key.
UPDATE 1 (06/04/2013)
On 04/18/2013, Amazon announced support for Local Secondary Indexes, which made DynamoDB far more useful:
http://aws.amazon.com/about-aws/whats-new/2013/04/18/amazon-dynamodb-announces-local-secondary-indexes/
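To illustrate, here is a hedged sketch of what defining a local secondary index looks like at table-creation time, using boto3 (the Python AWS SDK); the table, attribute, and index names are invented for illustration:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

dynamodb.create_table(
    TableName="GameScores",  # hypothetical table
    AttributeDefinitions=[
        {"AttributeName": "UserId", "AttributeType": "S"},
        {"AttributeName": "GameTitle", "AttributeType": "S"},
        {"AttributeName": "Score", "AttributeType": "N"},
    ],
    KeySchema=[
        {"AttributeName": "UserId", "KeyType": "HASH"},      # partition key
        {"AttributeName": "GameTitle", "KeyType": "RANGE"},  # sort key
    ],
    # The LSI keeps the same partition key but sorts by a different attribute.
    LocalSecondaryIndexes=[
        {
            "IndexName": "ScoreIndex",
            "KeySchema": [
                {"AttributeName": "UserId", "KeyType": "HASH"},
                {"AttributeName": "Score", "KeyType": "RANGE"},
            ],
            "Projection": {"ProjectionType": "ALL"},
        }
    ],
    ProvisionedThroughput={"ReadCapacityUnits": 5, "WriteCapacityUnits": 5},
)
```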
I have to be honest; I was very excited when I heard about the new DynamoDB and did attend the webinar yesterday. However, it's very difficult to make a decision right now, as everything they said was still quite vague; I have no idea which functions will be allowed or usable through their service.
The one thing I do know is that scaling is automatically handled; which is pretty awesome, yet there are still so many unknowns that it's tough to really make a great analysis until all the facts are in and we can start using it.
Thus far I still see Mongo as working much better for me (personally) in the project I've been working on.
Like most DB decisions, it's really going to come down to a project by project decision of what's best for your need.
I anxiously await more information on the product, as for now though it is in beta and I wouldn't jump ship to adopt the latest and greatest only to be a tester :)
I think one of the key differences between DynamoDB and other NoSQL offerings is the provisioned throughput: you pay for a specific throughput level on a table, and provided you keep your data well partitioned, you can always expect that throughput to be met. So as your application load grows you can scale up and keep your performance more or less constant.
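As a rough sketch of what that looks like in practice (boto3 again; the table name and capacity numbers are made up), you provision read and write capacity separately and can raise them as load grows:

```python
import boto3

dynamodb = boto3.client("dynamodb", region_name="us-east-1")

# Raise the provisioned capacity of a hypothetical "Sessions" table as
# application load grows; reads and writes are provisioned independently.
dynamodb.update_table(
    TableName="Sessions",
    ProvisionedThroughput={
        "ReadCapacityUnits": 200,   # target read capacity units
        "WriteCapacityUnits": 100,  # target write capacity units
    },
)
```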
Amazon DynamoDB seems like a pretty decent NoSQL solution. It is fast, and it is pretty easy to use. Other than having an AWS account, there really isn't any setup or maintenance required. The feature set and API are fairly small right now compared to MongoDB/CouchDB/Cassandra, but I would expect that to grow over time as feedback from the developer community is received. Right now, all of the official AWS SDKs include a DynamoDB client.
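For example, basic reads and writes with the Python SDK look roughly like this (table and key names are hypothetical):

```python
import boto3

# boto3 is the official AWS SDK for Python and includes a DynamoDB client.
dynamodb = boto3.resource("dynamodb", region_name="us-east-1")
table = dynamodb.Table("Users")  # hypothetical table keyed on "UserId"

table.put_item(Item={"UserId": "u-123", "email": "alice@example.com"})
resp = table.get_item(Key={"UserId": "u-123"})
print(resp.get("Item"))
```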
Pros
Lightning Fast (uses SSDs internally)
Really (really) reliable. (chances of write failures are lower)
Seamless scaling (no need to do manual sharding)
Works as a web service (no servers, no configuration, no installation)
Easily integrated with other AWS features (you can export a whole table to S3, use EMR, etc.)
Replication is managed internally, so the chance of accidental data loss is negligible.
Cons
Very (very) limited querying.
Scanning is painful (I remember a scan through the Java SDK once ran for 6 hours).
Pre-defined throughput, which means a sudden increase beyond the provisioned level will be throttled (see the sketch after this list).
Throughput is partitioned as the table is sharded internally (which means if you provisioned 1000 units and the table is split into two partitions, but you are reading only the latest data from one partition, your effective read throughput is only 500).
No joins; limited indexing (basically two indexes).
No views, triggers, scripts, or stored procedures.
It's really good as an alternative for session storage in a scalable application. Another good use would be logging/auditing in an extensive system. It is NOT preferable for a feature-rich application with frequent enhancements or changes.
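Regarding the throttling point in the cons list, here is a minimal sketch (boto3; the table name is invented) of retrying a throttled write with exponential backoff. The official SDKs already retry throttled calls internally; this just makes the behavior explicit:

```python
import time

import boto3
from botocore.exceptions import ClientError

table = boto3.resource("dynamodb", region_name="us-east-1").Table("Events")

def put_with_backoff(item, retries=5):
    """Retry throttled writes with exponential backoff.

    Requests that exceed the provisioned (or per-partition) capacity are
    rejected with a ProvisionedThroughputExceededException.
    """
    for attempt in range(retries):
        try:
            return table.put_item(Item=item)
        except ClientError as err:
            if err.response["Error"]["Code"] != "ProvisionedThroughputExceededException":
                raise
            time.sleep(0.1 * 2 ** attempt)  # 0.1s, 0.2s, 0.4s, ...
    raise RuntimeError("still throttled after %d retries" % retries)
```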

Software Configuration Management in the cloud?

I'm a CS student, just exploring the SCM space. While doing my own research I came across many different hosted solutions (GitHub obviously, Lighthouse, YouTrack, TeamCity, etc.) - do you think it is actually reasonable to try to host a (commercial, closed source) project entirely in the cloud?
Let's say I'd host code on GitHub, use Jira or Lighthouse for issue tracking, God knows what other hosted PM solution (Basecamp?) and build using EC2 (I can put Hudson or TeamCity on it and use appropriate EC2 plugins for these products to get more computing power when needed).
Is the EC2 bill going to kill me (compared to self-hosted solutions)? Do you think "the cloud" is still not reliable enough?
This is the way we work at our company. Our version control system (git), agile planning, ticket system/bugtracker, and wiki are hosted at http://www.assembla.com for $49/month for up to 40 users with private repositories (https://www.assembla.com/plans), and we have a micro instance on Amazon AWS EC2 where Jenkins, Nexus, Sonar, and some internal tools run. That instance is free for the first year; after that you should budget around $80/month for the same service.
So it costs $129/month for a full cloud solution for a small company (40 users max): reliable, with a good release train of new features from our service providers, and with a low maintenance footprint for us.
Compared to self-hosted it's not really expensive, considering the following costs:
- price of your server (let's say $1,000, roughly $83/month amortized over a year)
- electricity bills (let's say $30/month for 100% uptime)
- cost of configuration (to get the same quality as Assembla, for example) and maintenance (let's say 0.5 man-days per month at $500/day in France, i.e. $250/month)
The total is roughly $363/month.
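Spelling out that arithmetic (all figures are the estimates above):

```python
# Monthly cost of the self-hosted estimate, in USD.
server = 1000 / 12       # $1,000 server amortized over a year: ~$83/month
electricity = 30         # per month, for 100% uptime
maintenance = 0.5 * 500  # 0.5 man-days/month at $500/day: $250/month

print(round(server + electricity + maintenance))  # -> 363
```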
This may look a bit biased, but it is what we experienced.
Regards,
Xavier
There is no problem using the cloud for hosting, and many large companies already do; I think Netflix recently moved entirely to EC2. Our whole business runs on EC2, and it's been relatively good so far.
The EC2 bill is up to you to manage: cloud is all about granular billing for services, and the more you consume the more you pay (we sell a tool that helps with cost controls: http://LabSlice.com). Your biggest cost will usually be CPU power, so stick to the Micro/Small instances until you've got a handle on costs.
It's interesting that people question the reliability of the cloud, as the underlying premise is actually to provide more reliability than businesses could afford themselves (high scalability, immediate availability of hardware, monitoring, load balancing, etc.).
You can make use of the AWS Free Tier and host your application there. If you exceed the free usage limits, you will be charged for whatever extra you have utilized.
Regarding the reliability of the cloud: every big firm, such as Amazon, Microsoft, IBM, and HP, is moving towards the cloud because they have found it reliable, cost-effective, and green.
Given you're a student, and assuming you're looking to spend little money, a lot of Git and SVN hosting providers offer free hosting for students, or free accounts if you're a small team with minimal storage requirements. Check out Codesion's student offering, for example (disclaimer: I work for Codesion). This plan also comes with Trac/Bugzilla for your PM requirements. I wouldn't be concerned with security and reliability, for the same reason that Simon points out above.
As for CI on EC2: this is probably your best bet, since you pay by the hour each instance is running. I'd recommend using Amazon's API to fire up an instance each time Hudson needs to perform a build, store the results of the build on more permanent storage, and shut the instance down when finished. If you're doing a lot of CI builds, it might be better to just keep the instance running, but this will cost you more, of course.
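As a rough sketch, that start/build/stop cycle could look like this with boto3 (the instance ID is a placeholder, and the build trigger itself is elided):

```python
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

BUILD_AGENT = "i-0123456789abcdef0"  # hypothetical CI agent instance

def run_build():
    ec2.start_instances(InstanceIds=[BUILD_AGENT])
    # Wait until the agent is up before handing it the build job.
    ec2.get_waiter("instance_running").wait(InstanceIds=[BUILD_AGENT])
    try:
        pass  # trigger the Hudson/Jenkins build on the agent here
    finally:
        # Instances bill while running, so stop the agent when done.
        ec2.stop_instances(InstanceIds=[BUILD_AGENT])
```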

Does cloud computing need some regulation?

I was involved with a couple of cloud computing platforms recently.
First of all, please note that I am not trying to criticize any platform.
Cloud computing is a large area, but to make my point simple and understandable, let me use a very simple scenario: data storage services hosted in the cloud.
Take any storage service, like Amazon EC2, SQL Data Services (SDS), or Salesforce.com services.
The goal of all such data storage services is the same: to serve requested data on demand, without worrying about how it is stored, where it is stored, who maintains it, and so on (all the cloud goodies).
Now, my concern is this: ANSI SQL regulated platform vendors to make sure they follow a similar language across all their products. Can't a similar concept be applied across cloud service providers?
Why are there no such initiatives?
Any thoughts are appreciated.
It seems to me like you're worried about vendor lock-in with cloud computing. I may be naive but I would normally choose technologies and then go look for cloud vendors that'd be able to deliver these technologies. And if I was aiming for a "write once run anywhere approach" I'd have to select technology that'd make this as realistic as possible.
With the fairly rapid speed of development, I really think standardization committees would struggle to keep up. ANSI SQL has had 20+ years of history. It seems to me like you're asking for standardization long before we even know what the cloud is up to.
I think this emerging cloud computing initiative is just too young to have standards.
Service providers right now just worry about rushing into the market, rather than interoperability and standards.
Later on, when the situation is more established, some common guidelines may emerge. But there is still a long way to go.
You seem to be asking specifically about cloud storage services, rather than cloud computing in general. So your Amazon example would be S3, not EC2.
I think the field is a little young to be standardising on an API just yet. The services differentiate themselves in ways which rule this out. For example, S3 trades sophistication for scalability/reliability/performance: you can't do a complex SQL LIKE query. You can store and retrieve blobs of data based on a key, and that's about it.
I think as such services become more and more the mainstream way to do things, standards will emerge. Users will want the freedom to switch providers on a whim, move their data around, test against free local storage, etc.
The APIs used are all based on Web Standards already. Making an abstraction layer to make them look the same is fairly trivial.
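For example, a minimal sketch of such an abstraction layer, exposing the same get/put interface over S3 and the local filesystem (class and method names are invented for illustration):

```python
import abc
import pathlib

import boto3

class BlobStore(abc.ABC):
    """One key/value interface over different storage backends."""

    @abc.abstractmethod
    def put(self, key: str, data: bytes) -> None: ...

    @abc.abstractmethod
    def get(self, key: str) -> bytes: ...

class S3Store(BlobStore):
    def __init__(self, bucket: str):
        self.bucket = boto3.resource("s3").Bucket(bucket)

    def put(self, key, data):
        self.bucket.put_object(Key=key, Body=data)

    def get(self, key):
        return self.bucket.Object(key).get()["Body"].read()

class LocalStore(BlobStore):
    """Free local storage for development and testing."""

    def __init__(self, root: str):
        self.root = pathlib.Path(root)

    def put(self, key, data):
        (self.root / key).write_bytes(data)

    def get(self, key):
        return (self.root / key).read_bytes()
```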