Which NoSQL database should I use for a URL shortener? - key-value

I'm working on a project for uni, that is building a URL shortener. I've studied the different types of NoSQL databases, but I can't figure out which is better for my purpose and why.
I can choose between a key/value db, document-oriented, column-oriented or graph. I'm sure the graph one is not good for my goal.
Do you have any suggestions please?

For a URL shortener, you won't need a document store: the data is too simple.
You won't need a column store: column stores are for sorting and searching across multiple attributes, like finding all Wongs in Hong Kong.
You won't need a graph DB: there's no graph.
You want a key/value DB. Some that you'll want to look at are the old standard Memcached (a cache, so it does not persist data), Redis, Aerospike, and DynamoDB on AWS.
An example of a URL shortener, written in Node.js for Aerospike, can be found in this GitHub repo. It's just a single file, and the technique can be applied to other key-value systems.
https://github.com/aerospike/url-shortener-nodejs
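
To illustrate why a key/value model is enough, here is a minimal Python sketch of the idea. It is not taken from the repo above: the class name `UrlShortener`, the counter-based key generation, and the base-62 alphabet are all assumptions made for illustration, and a plain dict stands in for the key/value database.

```python
import string

# 62 symbols: 0-9, a-z, A-Z (an assumed alphabet, any fixed ordering works)
ALPHABET = string.digits + string.ascii_letters


def encode_base62(n: int) -> str:
    """Encode a non-negative integer as a short base-62 string."""
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))


class UrlShortener:
    """Hypothetical shortener; `store` stands in for a key/value database."""

    def __init__(self):
        self.store = {}    # short key -> long URL (a GET/PUT store is all we need)
        self.counter = 0   # in a real system this would be an atomic counter

    def shorten(self, long_url: str) -> str:
        self.counter += 1
        key = encode_base62(self.counter)
        self.store[key] = long_url   # single PUT by key
        return key

    def resolve(self, key: str):
        return self.store.get(key)   # single GET by key, None if unknown


shortener = UrlShortener()
key = shortener.shorten("https://example.com/some/very/long/path")
print(key, "->", shortener.resolve(key))
```

Every operation is a single read or write by key, which is exactly the access pattern key/value stores are built for.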

Related

InfluxDB and relational metadata?

I am currently working on a project with InfluxDB, where I have to track and store several measurements. My issue: for all my measurements I also need to store some additional relational metadata (for reporting), which only changes once or twice a week. Moreover, I have some config data for my applications.
So my question: Do you have any suggestions or experience with storing time-series and relational data like that? Should I try to store everything (even the relational data) within the InfluxDB or should I use an external relational DB for that?
Looking forward to your help!
Marko

RDB and RDF mapping - how to?

I have 3 relational DBs that I would like to map to a graph model so we can manage ontologies and query across them. The databases must remain as RDBs, and the RDF mapping should allow querying via SPARQL. While I understand the theory, I'm struggling to find a walk-through guide. Any help appreciated.
I think what you are looking for is R2RML, the W3C's RDB to RDF Mapping Language.

Auto complete on Large data set

I'm writing a project where I need to do an autocomplete on a data set that has 5 million objects (the schema is different for different objects).
My first thought was to use SQL, but since the schema keeps changing it will not be fast.
So I thought about MongoDB.
Two questions:
1 - Do you have working sample code that I can use?
2 - Is Mongo the best solution here? Will it be fast? Is there another NoSQL database that I can use instead?
If time is critical and you want the fastest database, then Redis may be the database you are looking for. Here is a link to the Auto complete blog post using Redis.
MongoDB is a great database with many great features, so it may be a good choice as well.
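
The blog post is not reproduced here, so the following is only a sketch of one common approach, not necessarily the one it describes: keep the terms in lexicographic order and read the contiguous run that starts with the prefix. In Redis this is typically done with a sorted set and a lexicographic range query such as ZRANGEBYLEX; the sketch below uses a sorted Python list and `bisect` as a stand-in, and the name `PrefixIndex` is hypothetical.

```python
import bisect


class PrefixIndex:
    """Hypothetical in-memory stand-in for a lexicographically sorted set."""

    def __init__(self, terms):
        # Normalise case and keep terms sorted so that all terms sharing a
        # prefix sit next to each other.
        self.terms = sorted({t.lower() for t in terms})

    def complete(self, prefix: str, limit: int = 10):
        prefix = prefix.lower()
        # Binary search for the first term >= prefix: O(log n).
        start = bisect.bisect_left(self.terms, prefix)
        results = []
        for term in self.terms[start:start + limit]:
            if not term.startswith(prefix):
                break  # left the run of matching terms
            results.append(term)
        return results


index = PrefixIndex(["Redis", "Relational", "Riak", "MongoDB", "Memcached"])
print(index.complete("re"))
```

The lookup cost depends on the logarithm of the data set size plus the number of suggestions returned, which is why this approach stays fast with millions of terms.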

MongoDB to DynamoDB

I have a database currently in Mongo running on an EC2 instance and would like to migrate the data to DynamoDB. Is this possible and what is the most cost effective way to achieve this?
When you ask for a "cost effective way" to migrate data, I assume you are looking for existing technologies that can ease your life. If so, you could do the following:
Export your MongoDB data to a text file, say in CSV or JSON format, using mongoexport.
Upload that file somewhere in S3.
Import this data, in S3, to DynamoDB using AWS Data Pipeline.
Of course, you should design & finalize your DynamoDB table schema before doing all this.
Whenever you are changing databases, you have to be very careful about the way you migrate data. Certain data formats maintain type consistency, while others do not.
Then there are data formats that simply cannot handle your schema. For example, CSV is great at handling data when it is one row per entry, but how do you render an embedded array in CSV? There is no standard way to do it. JSON is good at this, but JSON has its own problems.
The easiest example of this is JSON and DateTime. JSON does not have a specification for storing DateTime values, so they can end up as ISO 8601 dates, or perhaps UNIX epoch timestamps, or really anything a developer can dream up. What about Longs, Doubles, Ints? JSON doesn't discriminate: it has a single number type, and many parsers read every number as a double, which can cause loss of precision for large integers if not deserialized correctly.
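
As a small illustration of both problems, here is a Python sketch. The values are made up for the example; the point is only that a 64-bit integer does not survive a round trip through a double, and that a DateTime has to be turned into some convention chosen by the developer before it can be written as JSON.

```python
import json
from datetime import datetime, timezone

# A 64-bit integer just above 2**53, the largest range in which doubles
# can represent every integer exactly.
big = 2**53 + 1
as_double = float(big)               # what a double-only JSON parser would hold
lost = int(as_double) != big         # precision was lost in the conversion

# datetime has no native JSON representation.
ts = datetime(2020, 1, 2, 3, 4, 5, tzinfo=timezone.utc)
try:
    json.dumps({"created": ts})
    datetime_supported = True
except TypeError:
    datetime_supported = False

# Two of the many possible conventions a developer might pick instead.
iso = json.dumps({"created": ts.isoformat()})          # ISO 8601 string
epoch = json.dumps({"created": int(ts.timestamp())})   # UNIX epoch seconds

print(lost, datetime_supported)
print(iso)
print(epoch)
```

Whichever convention the exporter picks, the importer has to know about it, since nothing in the JSON itself marks the value as a date.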
This makes it very important that you choose the appropriate translation medium. This generally means you have to roll your own solution: loading up the drivers for both databases, reading an entry from one, translating it, and writing it to the other. This is the best way to be absolutely sure that errors are handled properly for your environment, that types are kept consistent, and that the code properly translates the schema from source to destination (if necessary).
What does this all mean for you? It means a lot of leg work for you. It is possible somebody has already rolled something that is broad enough for your case, but I have found in the past that it is best for you to do it yourself.
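
The translation step of such a hand-rolled migration might look roughly like the sketch below. It deliberately contains no driver calls: the function name `to_dynamo_value` and the sample document are hypothetical, and the type choices (Decimal instead of float, ISO 8601 strings for dates) are assumptions that would need to match your own DynamoDB table design and client library.

```python
from datetime import datetime, timezone
from decimal import Decimal


def to_dynamo_value(value):
    """Recursively translate one MongoDB-style value (hypothetical helper)."""
    if value is None or isinstance(value, (bool, str, int)):
        return value                       # passes through unchanged
    if isinstance(value, float):
        return Decimal(str(value))         # assumption: target wants Decimal, not float
    if isinstance(value, datetime):
        return value.isoformat()           # assumption: dates stored as ISO 8601 strings
    if isinstance(value, dict):
        return {k: to_dynamo_value(v) for k, v in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_dynamo_value(v) for v in value]   # embedded arrays are kept
    # Fail loudly instead of silently writing a badly typed value.
    raise TypeError(f"no translation defined for {type(value).__name__}")


# A made-up source document with an embedded array and mixed types.
source_doc = {
    "_id": "abc123",
    "price": 19.99,
    "tags": ["a", "b"],
    "created": datetime(2020, 1, 2, tzinfo=timezone.utc),
    "stock": {"count": 3, "ratio": 0.5},
}
item = to_dynamo_value(source_doc)
print(item)
```

In a real migration this function would sit between a read from the MongoDB driver and a write through the DynamoDB client, with the unknown-type error acting as the signal that the schema mapping needs another rule.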
I know this post is old, but Amazon has since made this possible with AWS DMS. See this document:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MongoDB.html
Some relevant parts:
Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service
You can use AWS DMS to migrate data to an Amazon DynamoDB table. Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. AWS DMS supports using a relational database or MongoDB as a source.

Jira using enterprise architecture by OfBiz

The 'Open For Business' project (OFBiz) is an enterprise framework.
It so happens that Jira uses this, and I was pretty shocked at how much work is involved in pulling data for a particular entity (say an issue/bug in Jira's case).
Imagine getting a list of all the issues: it has to first get all the columns (or properties) to display as table columns, then pull in the values for each. For an enterprise product this sounds like a sub-optimal solution (but I understand how it adds flexibility).
You can read how it's used in Jira in practice: http://confluence.atlassian.com/display/JIRA/Database+Schema
Main site: http://ofbiz.apache.org/docs/entity.html
I'm just confused as to how to list all issues. Meaning, what would the SQL queries look like?
It's one thing to pull a single issue, but to get a list you have to do a lot of work to get the values. I don't think it can be done with a single query using joins, can it?
(Disclaimer: I work for Atlassian, but I'm not on the JIRA team)
OFBiz EE is just an abstraction layer for moving between database tables and fancy maps called GenericValues. It has no influence over the database schema itself. Your real issue here seems to be that JIRA's database schema is complicated.
The reason it's complicated is because it has to support a data model where an issue is an arbitrary collection of arbitrary fields, at some point in an arbitrary workflow. The fields themselves can be defined by third-party plugins. It's very hard to produce a friendly-looking RDBMS schema to fit this kind of dynamic data model, and JIRA tries as best it can.
You can get information directly out of the database if you want (the database schema is documented in the link above), or you can go up a layer or twelve of abstraction and talk through one of JIRA's many APIs.
A good place to ask questions about getting data out of JIRA is the forums on http://forums.atlassian.com/
The entity engine used in Jira is a database abstraction layer (with a very rich and easy-to-use API) that connects your application with one or more datasources. But the databases are still relational, so you can use SQL if you want to. As for the issue info you want to pull, I'd say it wouldn't be very easy with joins alone. I'd recommend you use the procedural language of your RDBMS (e.g. PL/SQL, PL/pgSQL).
SELECT * FROM jiraissue;
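
For the narrower case where the fields to display are known in advance, a single query with joins is possible. The Python sketch below shows the general shape using an in-memory SQLite database. The three tables are a heavily simplified, assumed version of the schema (the real one has many more columns and tables, so check the schema documentation linked above), and the field names 'Team' and 'Severity' and all the rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Simplified, assumed schema: one row per issue, one row per custom field,
-- and one row per (issue, custom field) value.
CREATE TABLE jiraissue (id INTEGER PRIMARY KEY, pkey TEXT, summary TEXT);
CREATE TABLE customfield (id INTEGER PRIMARY KEY, cfname TEXT);
CREATE TABLE customfieldvalue (
    id INTEGER PRIMARY KEY, issue INTEGER, customfield INTEGER, stringvalue TEXT
);
INSERT INTO jiraissue VALUES (1, 'DEMO-1', 'Login fails'), (2, 'DEMO-2', 'Typo in footer');
INSERT INTO customfield VALUES (10, 'Team'), (11, 'Severity');
INSERT INTO customfieldvalue VALUES
    (100, 1, 10, 'Web'), (101, 1, 11, 'High'), (102, 2, 10, 'Docs');
""")

# Pivot the per-field rows into one row per issue. Each displayed field
# needs its own CASE expression, so the field list must be known up front.
rows = conn.execute("""
SELECT i.pkey, i.summary,
       MAX(CASE WHEN f.cfname = 'Team' THEN v.stringvalue END) AS team,
       MAX(CASE WHEN f.cfname = 'Severity' THEN v.stringvalue END) AS severity
FROM jiraissue i
LEFT JOIN customfieldvalue v ON v.issue = i.id
LEFT JOIN customfield f ON f.id = v.customfield
GROUP BY i.id, i.pkey, i.summary
ORDER BY i.id
""").fetchall()

for row in rows:
    print(row)
```

When the set of fields is only known at run time, the query itself has to be generated dynamically, which is where a procedural language or one of the APIs becomes the more practical route.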