I am currently working on a project with InfluxDB, where I have to track and store several measurements. My issue: for all my measurements I also need to store some additional relational metadata (for reporting), which only changes once or twice a week. Moreover, I have some config data for my applications.
So my question: do you have any suggestions or experience with storing time-series and relational data like that? Should I try to store everything (even the relational data) within InfluxDB, or should I use an external relational DB for that?
Looking forward to your help!
Marko
I have a problem where I need to store users' address data, which can come from different vendors in different formats. Once I have the data, I need to do some cleaning and wrangling and run a de-duplication process to get clean, structured data. Once the data is clean, I may have to pick different attributes of the address from different vendors based on some complex logic which is not defined yet. My questions are:
1) Which database should I use, i.e. a NoSQL database (document, key/value, DynamoDB, etc.) or an RDBMS/MPP database like Redshift or Azure SQL Data Warehouse?
2) A NoSQL DB like MongoDB provides schema flexibility, but at the same time the queries or the de-duplication process are not something built in for these databases.
If anyone can guide me on this I shall be very thankful.
Thanks
Atul
I have a database currently in MongoDB running on an EC2 instance and would like to migrate the data to DynamoDB. Is this possible, and what is the most cost-effective way to achieve it?
When you ask for a "cost effective way" to migrate data, I assume you are looking for existing technologies that can ease your life. If so, you could do the following:
Export your MongoDB data to a text file, say in tsv format, using mongoexport.
Upload that file somewhere in S3.
Import the data from S3 into DynamoDB using AWS Data Pipeline.
Of course, you should design & finalize your DynamoDB table schema before doing all this.
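A rough sketch of steps 1 and 2 in Python, assuming the mongoexport binary is installed and boto3 has AWS credentials configured; the database, collection, field list, and bucket names below are placeholders (mongoexport's tabular output type is CSV, which works the same way as TSV once the file is in S3):

import subprocess
import boto3

# 1. Export the MongoDB collection to a flat file (CSV output needs an explicit field list).
subprocess.run(
    [
        "mongoexport",
        "--db", "mydb",
        "--collection", "users",
        "--type", "csv",
        "--fields", "_id,name,email",
        "--out", "users.csv",
    ],
    check=True,
)

# 2. Upload the export to S3, where an AWS Data Pipeline job can import it into DynamoDB.
s3 = boto3.client("s3")
s3.upload_file("users.csv", "my-migration-bucket", "exports/users.csv")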
Whenever you are changing databases, you have to be very careful about the way you migrate data. Certain data formats maintain type consistency, while others do not.
Then there are data formats that simply cannot handle your schema. For example, CSV is great at handling data when it is one row per entry, but how do you render an embedded array in CSV? It really isn't possible. JSON is good at this, but JSON has its own problems.
The easiest example of this is JSON and DateTime. JSON has no specification for storing DateTime values; they can end up as ISO 8601 dates, UNIX epoch timestamps, or really anything a developer can dream up. What about longs, doubles, and ints? JSON has a single number type and doesn't discriminate between them, and consumers typically parse every number as a double, which can cause loss of precision if not deserialized carefully.
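A quick Python illustration of both pitfalls (the field names are made up): serializing a DateTime forces you to pick a convention yourself, and a consumer that parses every JSON number as a double silently loses precision on large integers.

import json
from datetime import datetime, timezone

# json.dumps() rejects datetime objects outright, so you have to choose a
# convention yourself; ISO 8601 is used here.
doc = {"created": datetime.now(timezone.utc).isoformat(), "views": 2**60 + 1}
encoded = json.dumps(doc)

# A consumer that treats every JSON number as an IEEE-754 double (as JavaScript
# does) loses precision on large 64-bit integers:
as_double = json.loads(encoded, parse_int=float)
print(as_double["views"] == doc["views"])  # False: the value was silently rounded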
This makes it very important that you choose the appropriate translation medium. That generally means you have to roll your own solution: load the drivers for both databases, read an entry from one, translate it, and write it to the other. This is the best way to be absolutely sure that errors are handled properly for your environment, that types are kept consistent, and that the code properly translates the schema from source to destination (if necessary).
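For example, a minimal sketch of that roll-your-own approach in Python, assuming pymongo and boto3 are available; the connection string, table name, key mapping, and type conversions are illustrative and would have to match your real schema:

from decimal import Decimal
from datetime import datetime

import boto3
from pymongo import MongoClient

source = MongoClient("mongodb://localhost:27017")["mydb"]["users"]
target = boto3.resource("dynamodb").Table("users")

def translate(doc):
    """Convert BSON types into types the DynamoDB SDK accepts."""
    item = {}
    for key, value in doc.items():
        if key == "_id":
            item["id"] = str(value)          # ObjectId -> string partition key
        elif isinstance(value, datetime):
            item[key] = value.isoformat()    # pick one DateTime convention explicitly
        elif isinstance(value, float):
            item[key] = Decimal(str(value))  # DynamoDB numbers must be Decimal, not float
        else:
            item[key] = value
    return item

# Read from the source, translate each record, and batch-write to the target.
with target.batch_writer() as batch:
    for doc in source.find():
        batch.put_item(Item=translate(doc))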
What does this all mean for you? It means a lot of leg work for you. It is possible somebody has already rolled something that is broad enough for your case, but I have found in the past that it is best for you to do it yourself.
I know this post is old, but Amazon has since made this possible with AWS DMS; check this document:
https://docs.aws.amazon.com/dms/latest/userguide/CHAP_Source.MongoDB.html
Some relevant parts:
Using an Amazon DynamoDB Database as a Target for AWS Database Migration Service
You can use AWS DMS to migrate data to an Amazon DynamoDB table. Amazon DynamoDB is a fully managed NoSQL database service that provides fast and predictable performance with seamless scalability. AWS DMS supports using a relational database or MongoDB as a source.
I'm working on a project for uni: building a URL shortener. I've studied the different types of NoSQL databases, but I can't figure out which is best for my purpose and why.
I can choose between a key/value DB, document-oriented, column-oriented, or graph. I'm sure the graph one is not a good fit for my goal.
Do you have any suggestions please?
For a URL shortener, you won't need a document store: the data is too simple.
You won't need a column store: columns are for sorting and searching on multiple attributes, like finding all Wongs in Hong Kong.
You won't need a graph DB: there's no graph.
You want a key/value DB. Some that you'll want to look at are the old standard Memcached, Redis, Aerospike, and DynamoDB on AWS.
An example of a URL shortener, written in Node.js for Aerospike, can be found in this GitHub repo (it's just a single file), and the technique can be applied to other key/value systems.
https://github.com/aerospike/url-shortener-nodejs
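For comparison, here is a minimal sketch of the same key/value idea in Python with Redis (the redis-py client); the key prefix and short-code scheme are just illustrative choices:

import secrets
import redis

r = redis.Redis(host="localhost", port=6379, decode_responses=True)

def shorten(long_url: str) -> str:
    """Store the long URL under a random short code and return the code."""
    code = secrets.token_urlsafe(5)      # e.g. "f3Zk9Qw"
    r.set(f"url:{code}", long_url)       # one key/value pair per short link
    return code

def resolve(code: str) -> str | None:
    """Look the code back up; returns None if it was never stored."""
    return r.get(f"url:{code}")

code = shorten("https://example.com/some/very/long/path")
print(resolve(code))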
As far as I know, ArangoDB uses MVCC and therefore creates revisions of nodes and edges, which remain for an undefined period of time until the garbage collector removes them.
I would like to implement a graph database schema, and I need to keep the state of this database at specific times. This means I will configure times at which the database management system takes a snapshot of the state (e.g. every week).
So my question in short: is it possible to keep the revisions/versions of nodes/edges in ArangoDB (or maybe with a plugin), along with a timestamp of their creation?
If not, is there another graph database which is able to do this?
I think you can use the arangodump binary (see the ArangoDB client tools manual) to create a snapshot at the desired point in time.
This will save the state of the database (or just the specific collections that contain your graph data) to JSON files, which can be used for auditing or later reloading the data.
arangodump is contained in the ArangoDB distributions.
The data dumped by arangodump will not contain any creation timestamps, but if you need them you can make them part of your data by just filling a "created" attribute in each node / edge when you create it.
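As an illustration, a small Python sketch of that idea using the python-arango driver; the connection details, collection names, and document contents are placeholders, and "edges" is assumed to be an existing edge collection:

from datetime import datetime, timezone
from arango import ArangoClient

client = ArangoClient(hosts="http://localhost:8529")
db = client.db("mygraph", username="root", password="passwd")

nodes = db.collection("nodes")
edges = db.collection("edges")

# Stamp every node and edge with its creation time so later dumps
# (e.g. weekly arangodump snapshots) carry that information with them.
now = datetime.now(timezone.utc).isoformat()

node = nodes.insert({"name": "sensor-42", "created": now})
edges.insert({
    "_from": f"nodes/{node['_key']}",
    "_to": "nodes/some-other-node",   # placeholder target vertex
    "created": now,
})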
I hope this helps.
One thing I have in mind is that datasets in Core Data (or let's say: managed objects) have no ID like the ones known from other databases such as MySQL. Also, they're not kept in a specific, guaranteed order.
What else makes Core Data much more "special" compared to working with a relational database like MySQL? Besides the whole object graph persisting and ORM stuff?
This is a good article explaining the main differences. The big one for me is:
'Core Data cannot operate on data without loading the data into memory'
This alone makes Core Data and MySQL suited to totally different tasks.
The big difference, I would say, is that Core Data is built around an ORM (an object-relational mapping), while MySQL is just a relational database. You could actually host Core Data on MySQL if Apple wanted to let you. Instead, it uses an embedded SQLite store or an XML representation, depending on what you want for your backing store.