Using Sails.js with AWS DynamoDB....not ideal - sails.js

I started working on a small POC and decided to give Sails.js a try :)
Part of the POC we wanted to use DynamoDB since the project will eventually involve high scalability and we're not looking to hire full-time MongoDB expert at this point.
We used the module: https://github.com/gadelkareem/sails-dynamodb
Problem is there is no documentation and the module does not even work...
It seems the sails ORM is not ideal for DynamoDB and requires writing custom DB services. Does anyone have experience with this?
I was very excited to come across Sails but if it won't let us play nice with DynamoDB then it might very well be out as an option to us....
Anyone have experience with this or maybe something I'm missing?

One of the important plus of vogels is excellent documentation.
Sails-dynamodb adapter based on the vogels, but not all features are implemented in sails-dynamodb adapter. For example, vogels has Expression Filters.
Vogels able to create tables. Adapter can't. An adapter needs duplication table schema in sails files and dynamodb shell.
Vogels has some own types, such as uuid type, StringSet, NumberSet, TimeUUID. (Adapter can use it too, if includes Vogels and Joi lib)
Vogels and adapter have the same query (create, update, delete, find) capabilities.
Adapter allows without changing the code switch to another data base. Adapter encapsulates establishment of connection to database.
Conclusion - for most purposes this adapter is suitable for the work and do not need to work directly with the Vogels

Sails comes loaded with an ORM called "Waterline". There are some official waterline plugins such as mongodb, postgresql, mysql and then there are some unofficial ones created by the community. I'd assume right now that Dynamo is in the latter category since I have not come across it before. However, with that being said I would not take this experience as a reason to ditch Sails.js.
Sails.js is built with the intention that all of its components can be swapped out, this means you are not tied to a specific template engine, authentication libraries etc. and including your ORM choice.
Waterline is still being actively developed but it is sat at v0.12.1 as of writing this response. It isn't fully there yet so there will be the odd issues still around!
My recommendation? Take a look at swapping out waterline for a different ORM. Keep the flexibility Sails gives you and change out the component that doesn't meet your criteria. There are still many benefits to Sails you can utilise.
Vogels might be worth checking out: https://github.com/ryanfitz/vogels
Turning off waterline: Is there a way to disable waterline and use a different ORM in sails.js?

Related

Distributed database which allows custom CRDT merging

I‘m rather new to distributed databases, though I have already studied related literature (e.g. CAP theorem, CRDT) and implemented some POC to allow scaling my application horizontally.
Now I however face a challenging problem. In ordere to scale the app horizontally, communication between services is done via a distributed queue. As a background here, I do require a custom CRDT method to keep the data eventually consistent, and I do require my application to work like a cache (remotely related to REDIS).
The challenge is now that I also need to persist the data. That requires me to keep the data within the application cache and database eventually consistent. I‘ve checked Cassandra, I saw a ticket [1] where somebody tried to add functionality for custom CRDT merge functionality (which as I mentioned do require for a reason). That never made it into Cassandra, and seems to have a few issues to resolve.
What are my options, either in form of a concrete distributed database engine allowing custom merging, or an algorithm that could help solve the problem (e.g. in form of a db trigger or something like this).
[1] https://issues.apache.org/jira/browse/CASSANDRA-6412
As far as I know, there are very few databases that allow you to specify your own custom conflict resolution algorithms. Tbh. the only one I really found - disclaimer: I'm not a Microsoft Advocate - is Azure CosmosDB. It has MongoDB-compatible API and can be configured to use master-master replication strategy, where you need to specify your own conflict resolution algorithm (using JavaScript). You can use it to define your own merge operation.
If you'll take a look outside of database-native solutions into application-level ones, there are several tools, like ie. Akka (available in both JVM or .NET version) which enables you to write custom CRDTs inside of distributed-data module. JVM version additionally supports multi-datacenter persistence, which is conceptually closer to how commutative CRDTs work and can be integrated with Cassandra backend.
I've implemented a MerkleClock CRDT at my merkle-crdt repository.
You could use an approach that when you update the database record column, you fetch the column's value and then you merge it with your CRDT of your current state and then when you save, you serialise the CRDT as JSON and store it in the database.

NoSQL Database for Blog / Content Management System? (MongoDB / Cassandra)

My company has been used Oracle for a long time but we would like to look for a NoSQL database as a replacement for faster querying and flexible schema design.
I have tried to use MongoDB which would be the most popular NoSQL database nowadays. I connected it to Spring Data to do some simple queries, which is quite easy to be set up and code simply. Since we are using Spring MVC for web development, Spring Data seems quite suitable for integration.
However, I heard that Cassandra would have better performance in write and read, especially in large scaling system. I am not sure whether it is worth to move to Cassandra and not sure how to measure the performance between MongoDB and Cassandra.
Here are some requirements for my system:
focusing on article fetching
tagging for articles for users to easily search for their favors or related articles
non-distributed system, but have load-balancing and fail-over
Java based, Spring MVC for web development
articles would be stored as XML
probably provide user-defined tables (collections) and fields (keys)
Therefore I would like to raise some questions:
Which Database is the most suitable for my case? You may also raise other databases apart from MongoDB and Cassandra.
If I use Cassandra, which framework would be suitable for integrating to Spring MVC?
Thank you so much in advanced.
I have experience using Spring and Cassandra together. But I always have written my own data access layer.
Using the ORMs out there for Cassandra will not allow you to leverage its full power, and you will, most likely, introduce bugs because your SQL background will make you expect certain behaviours that are just not what Cassandra will give you.
My advice write the code that will access Cassandra yourself and do not be afraid to denormalize A LOT. Think more about how you want to query (or find it) your data than the format in which you want to save it.
I also strongly recommend reading this amazing article: Cassandra Data Modeling Best Practices part 1 part 2
Another DB which might suit your application better is CouchDB (I like using BigCouch). It is another Document based NoSQL database and is in my opinion superior to MongoDB. It offers better solution for scaling and gives emphasis to Availability (just like Cassandra).
I'd like to point you to this question about the difference between CouchDB and MongoDB.
As far as framework goes Play framework has a lot of plugin to work with NoSQL systems, so you might give it a try. You could try playorm which is the last I experimented on.
EDIT : I forgot to mention Kundera as well as an ORM for Cassandra
Choosing between Cassandra and MongoDB depends on type of storage. MongoDB is primarily for document based storage where you get an edge by having various sql like features.
If you require columnar database with high availability and multi dc replication? go for Cassandra.
http://db-engines.com/en/system/Cassandra%3BHBase%3BMongoDB

How to have complete offline functionality in a web app with PostgreSQL database?

I would like to give a web app with a PostgreSQL database 100% offline functionality. In an ideal case the database should be completely replicated in the browser per user, and synchronized when online. So that the same code can be used to talk to both the offline and online database. I know this is possible with PouchDB and CouchDB, but have not found a solution that works with PostgreSQL. Is this at all possible?
Short answer: I don't know of anything like this that currently exists.
However, in theory, this could be made to work...(long answer:)
Write a PostgreSQL backend for levelup (one exists for MySQL: https://github.com/kesla/mysqldown)
Wire up pouch-server to read/write from your PostgreSQL db using pouchdb's existing leveldb adapter (which in turn will have to be configured to use your postgres backend). Congrats, you can now sync data using PouchDB!
Whether an approach like this is practical in reality for your application is a different question you'll have to answer.
You may be wondering, for example, "will I be able to sync an existing complex schema with multiple tables to the client with this approach?" The answer is probably not - the mysqldown implementation of leveldown uses a single MySQL table with three fields: id, key, and value (source), and I imagine any general-purpose PostgreSQL adapter would be similar (nothing says you can't do a special-purpose adapter just for your app though!).
On the other hand, if you were to implement a couchdb-compatible API (or a subset- you may not need attachments, for example) over your existing database schema, there's nothing stopping you from using PouchDB on the client to talk directly to that as if it were an actual CouchDB - just pop in the URL and call replicate()! Implementing the replication protocol might be a fair bit of work, since you'd need to track revisions and so on somewhere - but again, technically not impossible!
There are also implementations of levelup's backend storage that are designed for browsers. See level.js, which could be another way to sync between a server-side Postgres levelup backend and the browser.
TL;DR: There's tons of work being done around Javascript databases right now. Is syncing with Postgres impossible? probably not. Would it be a lot of work? Definitely. Worth it? Who knows, but it would be cool.
Without installing PostgreSQL on the client? No. Obviously you can cache data for offline use, but an entire RDBMS+procedural languages in Javscript, no.

User Profiles in a 3 tiered CouchDB and Laravel app

I'm developing a web application in Laravel 4 that has multiple users, each one with a profile. These profiles may (or may not) have different variables, such as dates, etc. I've looked at some RDBMS solutions for this problem including EAV design, but this method is really expensive in terms of both performance and code, so I discarded it.
The other option was tocreate a huge table with many columns, but I also discarded it since it is pretty useless to have a user with 30 null fields.
So I was thinking about using NoSQL and I ended up with CouchDB for the scalability and master-master replication. I looked into some DBaaS and found Cloudant quite interesting. The real question is: Is it really the option for my case (having users with a profile with different variables for each user)? Or is it doable with a RDBMS (MySQL)? Also, How can I use CouchDB with Laravel in a 3 tiered app? I don't want the user to have access to neither of the database's features, I really think that using CouchDB on the server side is the best option so far, please correct me if I'm wrong.
Having non-homogenous data like you is the typical case where document-oriented databases like CouchDB or MongoDB are better than relational databases.
While CouchDB can be configured to be public-facing and being directly accessible by the clients (there are even proof-of-concepts of whole web applications served solely by CouchDB), this is quite uncommon in practice. The usual scenario is to use CouchDB as a hidden backend service which is used by a webserver running some kind of server-sided web technology like PHP, JSP, ASP or whatever you prefer.

iPhone app with Google App Engine

I've prototyped an iPhone app that uses (internally) SQLite as its data base. The intent was to ultimately have it communicate with a server via PHP, which would use MySQL as the back-end database.
I just discovered Google App Engine, however, but know very little about it. I think it'd be nice to use the Python interface to write to the data store - but I know very little about GQL's capability. I've basically written all the working database code using MySQL, testing internally on the iPhone with SQLite. Will GQL offer the same functionality that SQL can? I read on the site that it doesn't support join queries. Also is it truly relational?
Basically I guess my question is can an app that typically uses SQL backend work just as well with Google's App Engine, with GQL?
I hope that's clear... any guidance is great.
True, Google App Engine is a very cool product, but the datastore is a different beast than a regular mySQL database. That's not to say that what you need can't be done with the GAE datastore; however it may take some reworking on your end.
The most prominent different that you notice right off the start is that GAE uses an object-relational mapping for its data storage scheme. Essentially object graphs are persisted in the database, maintaining there attributes and relationships to other objects. In many cases ORM (object relational mappings) map fairly well on top of a relational database (this is how Hibernate works). The mapping is not perfect though and you will find that you need to make alterations to persist your data. Also, GAE has some unique contraints that complicate things a bit. One contraint that bothers me a lot is not being able to query for attribute paths: e.g. "select ... where dog.owner.name = 'bob' ". It is these rules that force you to read and understand how GAE data store works before you jump in.
I think GAE could work well in your situation. It just may take some time to understand ORM persistence in general, and GAE datastore in specifics.
GQL offers almost no functionality at all; it's only used for SELECT queries, and it only exists to make writing SELECT queries easier for SQL programmers. Behind the scenes, it converts your queries to db.Query objects.
The App Engine datastore isn't a relational database at all. You can do some stuff that looks relational, but my advice for anyone coming from an SQL background is to avoid GQL at all costs to avoid the trap of thinking the datastore is anything at all like an RDBMS, and to forget everything you know about database design. Specifically, if you're normalizing anything, you'll soon wish you hadn't.
I think this article should help you.
Summary: Cloud computing and software development for handheld devices are two very hot technologies that are increasingly being combined to create hybrid solutions. With this article, learn how to connect Google App Engine, Google's cloud computing offering, with the iPhone, Apple's mobile platform. You'll also see how to use the open source library, TouchEngine, to dynamically control application data on the iPhone by connecting to the App Engine cloud and caching that data for offline use.
That's a pretty generic question :)
Short answer: yes. It's going to involve some rethinking of your data model, but yes, changes are you can support it with the GAE Datastore API.
When you create your Python models (think of these as tables), you can certainly define references to other models (so now we have a foreign key). When you select this model, you'll get back the referencing models (pretty much like a join).
It'll most likely work, but it's not a drop in replacement for a mySQL server.