How to use ScalarDB in a way that data is stored in DynamoDB Local?

I believe ScalarDB has supported DynamoDB since version 3.0.0. I would like to use ScalarDB with the data stored in DynamoDB Local. The reason is that when we develop as a team, each engineer prepares test data to validate the code implemented in the local environment, and we want to work without mixing in the test data used by other engineers on the same team. We are also concerned that storing this development test data in AWS DynamoDB would incur DynamoDB costs.
Is there any way to use ScalarDB with DynamoDB Local?

From Scalar DB 3.1, we introduced a configuration, "scalar.db.dynamo.endpoint-override", that overrides the endpoint with which the DynamoDB SDK communicates. You can set it to your DynamoDB Local endpoint to use DynamoDB Local.
As of 3.1, the Schema Tool for Scalar DB also supports an "--endpoint-override" option for using DynamoDB Local.
Scalar DB 3.1:
https://github.com/scalar-labs/scalardb/releases/tag/v3.1.0
https://search.maven.org/artifact/com.scalar-labs/scalardb/3.1.0/jar

The CI for Scalar DB on DynamoDB actually uses DynamoDB Local, so I think you can simply run your Scalar DB applications on it.
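For reference, here is a minimal sketch in Java of what that configuration could look like against a locally running DynamoDB Local instance. The property keys other than scalar.db.dynamo.endpoint-override, and the region/credential values, are assumptions for a typical local setup, so check them against the ScalarDB documentation for your version:

```java
import java.util.Properties;

// Minimal sketch: ScalarDB configuration pointing the DynamoDB storage adapter
// at DynamoDB Local instead of the AWS endpoint.
public class DynamoDbLocalConfig {
  public static Properties dynamoLocalProperties() {
    Properties props = new Properties();
    props.setProperty("scalar.db.storage", "dynamo");
    // A region is still required by the SDK, but any value works against DynamoDB Local.
    props.setProperty("scalar.db.contact_points", "ap-northeast-1");
    // DynamoDB Local accepts arbitrary credentials.
    props.setProperty("scalar.db.username", "fakeAccessKeyId");
    props.setProperty("scalar.db.password", "fakeSecretAccessKey");
    // Override the endpoint with the DynamoDB Local default (port 8000).
    props.setProperty("scalar.db.dynamo.endpoint-override", "http://localhost:8000");
    return props;
  }
}
```

The same key/value pairs can also live in a scalardb.properties file, and the Schema Tool's "--endpoint-override" option mentioned above serves the same purpose when creating schemas.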

Related

GCP Dataflow vs Cloud Functions to automate scraping output and file-on-cloud merge into JSON format to insert in DB

I have two sources:
1. A CSV that will be uploaded to a cloud storage service, probably GCP Cloud Storage.
2. The output of a scraping process done with Python.
When a user updates 1) (the cloud-stored file), an event should be triggered to execute 2) (the scraping process), and then some transformation should take place to merge these two sources into one JSON document. Finally, the content of this JSON file should be stored in a DB that is easy to access and low cost. The files the user will update are at most 5 MB, and the updates will take place once weekly.
From what I've read, I can use GCP Cloud Functions to accomplish this whole process or I can use Dataflow too. I've even considered using both. I've also thought of using MongoDB to store the JSON objects of the two sources final merge.
Why should I use Cloud Functions, Dataflow or both? What are your thoughts on the DB? I'm open to different approaches. Thanks.
Regarding the use of Cloud Functions and Dataflow: in your case I would go for Cloud Functions, as you don't have a big volume of data. Dataflow is more complex, more expensive, and you would have to use Apache Beam. If you are comfortable with Python, and taking your scenario into consideration, I would choose Cloud Functions. Easy, convenient...
To trigger a Cloud Function when a Cloud Storage object is updated, you will have to configure the trigger. Pretty easy.
https://cloud.google.com/functions/docs/calling/storage
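If you end up writing the function in Java rather than Python, a background function triggered by a Cloud Storage event looks roughly like the sketch below. It uses the Java Functions Framework; the class name, the GcsEvent POJO, and the merge steps are placeholders, not a definitive implementation:

```java
import com.google.cloud.functions.BackgroundFunction;
import com.google.cloud.functions.Context;
import java.util.logging.Logger;

// Sketch: a background function fired when an object in the configured bucket changes.
public class MergeOnUpload implements BackgroundFunction<MergeOnUpload.GcsEvent> {
  private static final Logger logger = Logger.getLogger(MergeOnUpload.class.getName());

  @Override
  public void accept(GcsEvent event, Context context) {
    logger.info("File updated: gs://" + event.bucket + "/" + event.name);
    // 1. Download the CSV from Cloud Storage.
    // 2. Run (or call) the scraping step.
    // 3. Merge both sources into one JSON document and write it to the database.
  }

  // Minimal shape of the google.storage.object.finalize event payload.
  public static class GcsEvent {
    public String bucket;
    public String name;
  }
}
```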
Regarding the DB: MongoDB is a good option, but if you want something quick and inexpensive, consider Datastore.
As a managed service it will make your life easier, with a lot of native integrations. It also has a very interesting free tier.
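To give an idea of how little setup Datastore needs, persisting one merged record with the google-cloud-datastore Java client could look roughly like this; the kind and property names are made up for illustration:

```java
import com.google.cloud.datastore.Datastore;
import com.google.cloud.datastore.DatastoreOptions;
import com.google.cloud.datastore.Entity;
import com.google.cloud.datastore.Key;

// Sketch: storing one merged record in Datastore without defining any schema up front.
public class DatastoreSketch {
  public static void main(String[] args) {
    Datastore datastore = DatastoreOptions.getDefaultInstance().getService();
    Key key = datastore.newKeyFactory().setKind("MergedRecord").newKey("2021-week-01");
    Entity entity = Entity.newBuilder(key)
        .set("source", "weekly-upload")
        .set("itemCount", 42L)
        .set("payload", "{\"csvRows\": 42, \"scrapedItems\": 17}") // merged JSON as a string
        .build();
    datastore.put(entity);
  }
}
```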

How would you achieve local data persistence in Flutter when remote versions of the same data are returned as nested JSON objects?

When the server stores data in a MongoDB database and is accessed through GraphQL, it would be cool if local/cached versions of the same data could be stored similarly - in some sort of local NoSQL data store.
However, from my research it looks like there aren't that many data persistence options in Flutter, and the best one available is SQFLite. If I use SQFLite, though, I have to wrangle two formats of the same data: the nested-object NoSQL/GraphQL format and the "separate objects joined through relations" format of SQL.
Has anyone dealt with this before? Even if you're not using MongoDB/GraphQL in your remote backend, your API likely still returns nested objects which can't be stored as-is in your local SQL DB and can't be used interchangeably with their locally persisted versions.
So how would you deal with this issue and achieve clean syncing of local and remote data without it turning into a mess?

How do I integrate MongoDB and Hazelcast without mentioning a specific schema?

I am aware of how we create POJO classes (Java) and map them to the schema of the data in MongoDB, and create a connection with Spring Data. But if I don't have a specific schema and I want to have MongoDB as a back end for my cache in Hazelcast, how do I do that? In my use case specifically, I have a cache which needs to keep MongoDB updated with whatever updates it comes across.
Check this out:
https://github.com/hazelcast/hazelcast-code-samples/tree/master/hazelcast-integration/mongodb
Do note that this is sample code meant for reference purposes only; do not copy-paste it into your production system.
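The core of that sample is a MapStore implementation that writes map entries through to MongoDB. Since you don't have a fixed schema, a sketch that stores raw BSON Documents keyed by _id could look like the following; the class, database and collection names and the connection URI are assumptions, and in Hazelcast 3.x the interface lives in com.hazelcast.core rather than com.hazelcast.map:

```java
import com.hazelcast.map.MapStore;
import com.mongodb.client.MongoClient;
import com.mongodb.client.MongoClients;
import com.mongodb.client.MongoCollection;
import com.mongodb.client.model.Filters;
import com.mongodb.client.model.ReplaceOptions;
import org.bson.Document;

import java.util.Collection;
import java.util.HashMap;
import java.util.LinkedList;
import java.util.Map;

// Sketch: a schema-free MapStore that mirrors Hazelcast map entries into MongoDB.
// Values are raw BSON Documents, so no POJO/schema mapping is needed.
public class MongoMapStore implements MapStore<String, Document> {
  private final MongoCollection<Document> collection;

  public MongoMapStore() {
    MongoClient client = MongoClients.create("mongodb://localhost:27017"); // assumed URI
    this.collection = client.getDatabase("cachedb").getCollection("cache");
  }

  @Override
  public void store(String key, Document value) {
    // Upsert so both inserts and updates from the cache reach MongoDB.
    collection.replaceOne(Filters.eq("_id", key),
        new Document(value).append("_id", key),
        new ReplaceOptions().upsert(true));
  }

  @Override
  public void storeAll(Map<String, Document> map) {
    map.forEach(this::store);
  }

  @Override
  public void delete(String key) {
    collection.deleteOne(Filters.eq("_id", key));
  }

  @Override
  public void deleteAll(Collection<String> keys) {
    keys.forEach(this::delete);
  }

  @Override
  public Document load(String key) {
    return collection.find(Filters.eq("_id", key)).first();
  }

  @Override
  public Map<String, Document> loadAll(Collection<String> keys) {
    Map<String, Document> result = new HashMap<>();
    for (String key : keys) {
      Document doc = load(key);
      if (doc != null) {
        result.put(key, doc);
      }
    }
    return result;
  }

  @Override
  public Iterable<String> loadAllKeys() {
    Collection<String> keys = new LinkedList<>();
    for (Document doc : collection.find()) {
      keys.add(doc.getString("_id"));
    }
    return keys;
  }
}
```

You would then register it on the map via a MapStoreConfig (setImplementation, with a write-delay of 0 for write-through behaviour), so Hazelcast keeps MongoDB updated as the cache changes.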

What is the downside of using localDisk as the storage mechanism in a Sails project?

I am trying to decide what database to use in a Sails project. I started with localDisk and it works fine. I wonder why a database like Postgres or Mongo is needed. Could someone explain this to me?
Also, since Waterline abstracts the underlying database, what is the difference between the underlying databases, such as Postgres, Mongo and Redis?
On question #1:
I am quoting from balderdashy about sails-disk
Functions as a persistent object store which works great as a bundled, starter database (with the strict caveat that it is for non-production use only). [Reference]
While databases like MongoDB, PostgreSQL, MySQL, etc. provide the reliability to use them in production, sails-disk tells you not to use it in production. Reason? sails-disk is not designed to handle production-related issues. So you can use sails-disk if you have a very small database and performance is not an issue for you. Otherwise you can't rely on sails-disk.
On question #2:
If you use the Waterline ORM, your queries stay the same regardless of the underlying database. That's the purpose of ORMs (Object-Relational Mapping). But query execution performance depends heavily on your database design and the query load, so you have to choose the database engine based on the scenarios your application will handle.

Is there a way to persist HSQLDB data?

We have all of our unit tests written so that they create and populate tables in HSQL. I want the developers who use this to be able to write queries against this HSQL DB, because 1) by writing queries they can better understand the data model, and those not as familiar with SQL can play with the data before writing the runtime statements, and 2) they don't have access to the test DB for security reasons. Is there a way to persist the test data so that it can be examined and analyzed with an SQL client?
Right now I am jury-rigging it by switching the data source to a different DB (like DB2/MySQL) and then connecting to that DB on my machine so I can play with persistent data. However, it would be easier if HSQL supported persisting the data than to have to explain this workaround to every new developer.
Just to be clear, I need an SQL client to interact with the persisted data, so debugging and checking memory won't be a clean solution. This has more to do with initial development than with debugging/maintenance/testing.
If you use an HSQLDB Server instance for your tests, the data will survive the test run.
If the server uses a jdbc:hsqldb:mem:aname (all-in-memory) URL for its database, then the data will be available while the server is running. Alternatively, the server can use a jdbc:hsqldb:file:filepath URL, and the data is persisted to files.
The latest HSQLDB docs explain the different options. Most of the observations also apply to older (1.8.x) versions. However, the latest version 2.0.1 supports starting a server and creating databases dynamically upon the first connection, which can simplify testing a lot.
http://hsqldb.org/doc/2.0/guide/deployment-chapt.html#N13C3D
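For example, a test setup could start the server with a file-based database so the data outlives the run and can then be browsed with any JDBC-capable SQL client. The sketch below assumes HSQLDB 2.x; the database name, path, and table are made up:

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.Statement;
import org.hsqldb.server.Server;

// Sketch: start an HSQLDB Server backed by files so data written by the tests
// survives the run and can be inspected later with any SQL client.
public class PersistentHsqldbServer {
  public static void main(String[] args) throws Exception {
    Server server = new Server();
    server.setDatabaseName(0, "testdb");
    server.setDatabasePath(0, "file:target/hsqldb/testdb"); // data is persisted under this path
    server.setPort(9001);
    server.start();

    // Tests (and later, any SQL client) connect over the network URL.
    try (Connection conn = DriverManager.getConnection(
             "jdbc:hsqldb:hsql://localhost:9001/testdb", "SA", "");
         Statement stmt = conn.createStatement()) {
      stmt.execute("CREATE TABLE example (id INT PRIMARY KEY, name VARCHAR(50))");
      stmt.execute("INSERT INTO example VALUES (1, 'sample')");
    }
    // Leave the server running, or call server.stop() once you are done inspecting.
  }
}
```

Afterwards, point an SQL client (the DatabaseManagerSwing tool bundled with HSQLDB, or any other JDBC tool) at jdbc:hsqldb:hsql://localhost:9001/testdb to browse the persisted data.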