Limiting database access within Quicksight - postgresql

We have 100 source client databases, all with the same schema. We would like to stream this data into Kafka using Debezium and SQL Server. We want to maintain data separation when the data is brought into Postgres and Quicksight, so we need a firewall mechanism to prevent one client's data from being viewed by other clients in Quicksight.
What ways are there to manage the security of topics in Kafka and of the data in Postgres so that data stays private to each client, while still allowing cross-client queries for some use cases?
E.g., suppose we have a source Sales table which does not have a client code column.
We want to present sales data analytics to clients via a Quicksight portal.
There is also a sales manager who wants to see sales analytics across all clients.

Related

What is the correct way to prevent duplicates being stored in a DB used by a realtime app

I currently have a server that watches some events on the Ethereum blockchain. Because my server is subscribed to those events, it picks them up when they are triggered, does some processing, and fills my database accordingly.
That being said, for scalability purposes, let's say I would now like to have several instances of my server. So now I have 3 servers that watch the Ethereum blockchain for events and fill my DB.
What is the proper/standard way to tackle the fact that all my servers will be pushing the same data to my DB?

Recommendations to store streaming events

We're evaluating possible approaches to persist streaming events (user click events in a web browser from many different users) so that we can build custom user dashboards to analyse those click events later. We're planning to use Kafka as the intermediate layer to ingest the vast amounts of streaming data coming from various user browsers. However, I am curious to know whether Kafka can also serve as a persistent database to store these events, so that we can later build the dashboarding application and have it query the events via some backend web APIs that we design.
Essentially, this is what we're thinking as of now:
Dashboarding frontend --- API ---> backend service ----queries ----> Kafka(stores user click events)
This article mentions that Kafka can be used as a persistent DB that apps can query but it cannot "replace" the traditional databases. I can imagine the huge cost overhead if Kafka is used as a persistent DB but then Kafka tiered storage might be a possible solution to bring the storage costs down?
Overall, to be able to design a custom dashboard to query the ingested event streams, is it advisable to use Kafka as a DB replacement or should we consider integrating Kafka with a traditional SQL/noSQL database or some other type of database? Any recommendations on which persistent DBs go well with Kafka for these types of use-cases?
Yes and no.
RocksDB (or a custom state store) will allow you to "query" Kafka data via KSQL or Kafka Streams, but you wouldn't have a direct query API against Kafka itself. There is also a recent podcast from Confluent discussing GraphQL queries against Kafka and/or a database layer.
Regarding analysis, it would be far better to use tools like Elasticsearch (with Kibana), Apache Pinot, or Druid (along with Apache Superset) for such click-stream analytics and dashboarding, using Kafka as the channel to get data into those systems.
In general, your approach of frontend -> backend -> Kafka -> DB is good, assuming the throughput is at a point that warrants bringing in Kafka.
is it advisable to use Kafka as a DB replacement
No
should we consider integrating Kafka with a traditional SQL/noSQL database or some other type of database?
Yes
Any recommendations on which persistent DBs go well with Kafka for these types of use-cases?
This depends more on the context, constraints, and requirements of your workplace. Expected throughput? What DBs already exist? What programming language is preferred?
You can run OLAP-style dashboard and analytics queries on OLTP databases such as Postgres. Many teams run their analytics on the read replicas.
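As a rough illustration of an analytics-style query running on an ordinary transactional database, here is a minimal sketch. It uses Python's built-in sqlite3 purely as a stand-in for Postgres (or a read replica), and the click_events table, its columns, and the clicks_per_page helper are all made up for the example.

```python
import sqlite3


def clicks_per_page(conn):
    """Return (page, click_count) rows, most-clicked page first."""
    return conn.execute(
        "SELECT page, COUNT(*) AS clicks "
        "FROM click_events GROUP BY page ORDER BY clicks DESC, page"
    ).fetchall()


# In-memory database standing in for the real OLTP store (assumption).
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE click_events (user_id TEXT, page TEXT, clicked_at TEXT)"
)
# Hypothetical sample events, as a Kafka consumer might have written them.
conn.executemany(
    "INSERT INTO click_events VALUES (?, ?, ?)",
    [
        ("u1", "/home", "2024-01-01T10:00:00"),
        ("u2", "/home", "2024-01-01T10:01:00"),
        ("u1", "/pricing", "2024-01-01T10:02:00"),
    ],
)

print(clicks_per_page(conn))
```

The same GROUP BY query would be pointed at a read replica in a real setup, so that dashboard traffic does not compete with writes.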
The blue chip DBs for this would be Elasticsearch, Redash, or BigQuery. The rocket ships are Snowflake and ClickHouse.
Another option is to allow the data science team (if there is one) to ingest the Kafka stream directly into Spark or some other system and do their processing directly on the hose to provide the required dashboards.

Query message store of mirth connect

Can I use Mirth Connect to store millions of HL7v2 messages (pipe delimited) and have our third-party software application query them programmatically at a later point in time?
What's the best way to do that? Is Mirth's REST API capable of querying its message store efficiently?
Unfortunately, according to page 368 of the manual, I need a running Mirth Connect instance to browse the REST API documentation. (If browsing the documentation didn't require a running instance, I wouldn't have asked this question. Is there a Mirth Connect instance available on the internet to play with? Or would somebody be so kind as to post the relevant REST API documentation?)
So far, these are the scenarios I have come up with:
Mirth is an integration engine, and its strength is processing messages. Browsing historical messages can at times be difficult or slow, depending on the storage settings for the channel and on whether you take care to pull additional information out during processing and store it in "custom metadata" fields. The custom metadata fields are not indexed by default, but you can add your own indexes (Mirth supports several back-end databases, including Postgres, MySQL, Oracle, and MSSQL). Searching the message content basically involves doing a full-text search and scanning. Filter options to reduce scan time, apart from the custom metadata you create, are mostly related to the message properties (datetime received, status, etc.) and not the content.
So, I would not recommend it for the use-case you are suggesting.
However, Mirth could definitely be used to convert your messages (batched from files or live) to xml which could be put in a database designed to handle and query large volumes of xml documents. I assume when you say HL7 you mean the ER7 (pipe delimited) format of HL7v2. Mirth automatically does the conversion to xml for those types of messages as they are handled as xml during processing. You could easily create a new parent node that holds both the converted xml and the original message string as children.
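To make the "parent node holding both the converted xml and the original string" idea concrete, here is a minimal sketch in plain Python. It is not Mirth's converter and does not reproduce its exact XML layout: the er7_to_xml function and the StoredMessage, Original, and Converted element names are invented for illustration, and components ("^") and repetitions are deliberately left unsplit.

```python
import xml.etree.ElementTree as ET


def er7_to_xml(message: str) -> ET.Element:
    """Wrap an ER7 message and a naive XML rendering of it in one parent node."""
    wrapper = ET.Element("StoredMessage")  # hypothetical parent node name
    ET.SubElement(wrapper, "Original").text = message
    converted = ET.SubElement(wrapper, "Converted")
    # HL7v2 segments are separated by carriage returns, fields by "|".
    for segment in filter(None, message.replace("\n", "\r").split("\r")):
        fields = segment.split("|")
        name = fields[0]
        seg_el = ET.SubElement(converted, name)
        first = 1
        if name == "MSH":
            # In MSH the field separator itself counts as field 1.
            ET.SubElement(seg_el, "MSH.1").text = "|"
            first = 2
        for index, value in enumerate(fields[1:], start=first):
            if value:  # skip empty fields
                ET.SubElement(seg_el, f"{name}.{index}").text = value
    return wrapper


# Hypothetical two-segment message.
root = er7_to_xml("MSH|^~\\&|SENDER\rPID|1||12345")
print(ET.tostring(root.find("Converted"), encoding="unicode"))
```

The resulting document could then be inserted into whichever XML-capable database you pick, with the original string kept alongside it for auditing.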
If the database you choose has a JDBC driver, Java SDK, or HTTP/REST API, mirth can likely directly insert the converted messages for you as it processes them.
There are two misconceptions here:
An HL7v2 message is triggered by a real-world event, called the trigger event, on the placer (sender) side. It expects some activity to happen on the filler (receiver) side, such as confirming the message or replying with a query response. I.e., HL7v2 supports data flow among systems.
Mirth Connect is an HL7 interface engine aimed at transforming incoming feeds in one format (e.g., HL7v2 in ER7 format) into outgoing feeds in another format (which could be another HL7v2 message, XML, a database, etc.). It does not store anything except a configured portion of messages for audit purposes.
Now, to implement a solution you outlined, Mirth Connect or any other transformation mechanism has to implement two flows: receive, convert if needed and store incoming messages; provide an interface to query those messages.
This can obviously be done with Mirth Connect, but the premise of your initial question, that Mirth is capable of storing millions of records, is incorrect. In fact, it's recommended to keep as few messages as possible to speed up Mirth processing (each processed message is stored in the Mirth internal database several times, depending on configuration). Thus, all transformed messages go into the external public or private message storage, exactly as shown in your diagrams.

How do I limit a Kafka client to only access one customer's data?

I'm evaluating Apache Kafka for publishing some event streams (and commands) between my services running on machines.
However, most of those machines are owned by customers, on their premises, and connected to their networks.
I don't want a machine owned by one customer to have access to another customer's data.
I see that Kafka has an access control module, which looks like it lets you restrict a client's access based on topic.
So, I could create a topic per customer and restrict each customer to just their own topic. This seems like a bad idea that I could regret in the future, because I've seen things that recommend restricting the number of Kafka topics to the 1000s at most.
Another design is to create a partition per customer. However, I don't see a way to restrict access if I do that.
Is there a way out of this quandary?

convert always connected to occasionally connected application

I have an existing client-server 3-tier application with the following stack :
Smart-Client (Win-Forms)
IIS/ASP.NET
Sql server
Some of the data is stored in Entity–attribute–value (EAV) model.
All primary keys are integer identity columns.
Database operations are mostly performed using Stored procedures.
I am tasked with converting this application into an occasionally connected application (OCA)
There should be no issues with installation and resources limitation on the clients.
This is the first such project for me.
I have done some reading about
ms sync framework
Enterprise library / occasionally connected smart clients
SQL server replication
In order to preserve existing code and limit the impact of changes, I am considering installing the 3-tier application on each client and using Sync Framework on the WS to handle synchronization, with one master server against which the clients synchronize.
Does this solution look feasible?
Are there any other resources on converting an always-connected 3-tier application to an occasionally connected application?
Thank you.
Should be feasible. Not much change in your app. You just have to install a local database on your clients.
However, you're using identity columns. Unless you partition your identity values (client 1 is 1-1000, client 2 is 1001-2000, etc.) you will get duplicate IDs when you upload them.
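The range-partitioning idea can be sketched as below. This is only an illustration in plain Python, not Sync Framework or SQL Server code: RANGE_SIZE and id_allocator are hypothetical names, and in SQL Server itself you would more likely give each client's identity column a different seed.

```python
RANGE_SIZE = 1000  # assumed block size, mirroring the 1-1000 / 1001-2000 example


def id_allocator(client_number: int, range_size: int = RANGE_SIZE):
    """Yield IDs from the block reserved for a 1-based client number."""
    start = (client_number - 1) * range_size + 1
    end = client_number * range_size
    yield from range(start, end + 1)


client1 = id_allocator(1)
client2 = id_allocator(2)

# Each offline client draws keys only from its own block, so rows
# uploaded to the master server cannot collide on the primary key.
ids1 = [next(client1) for _ in range(3)]
ids2 = [next(client2) for _ in range(3)]
print(ids1, ids2)
```

A client that exhausts its block needs a new one from the master, so the block size has to be chosen with the expected offline volume in mind.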
have a look at this: Database Sync:SQL Server and SQL Express N-Tier with WCF