Kafka has the concepts of streams and tables. A stream represents a sequence of events as they happen, whereas a table represents the current state.
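For concreteness, here is roughly how that duality looks in the Kafka Streams API (a minimal sketch; the topic name and types are placeholders):

    import org.apache.kafka.streams.StreamsBuilder;
    import org.apache.kafka.streams.kstream.KStream;
    import org.apache.kafka.streams.kstream.KTable;

    public class StreamTableDuality {
        public static void main(String[] args) {
            StreamsBuilder builder = new StreamsBuilder();

            // A stream: the unbounded sequence of events, one record per page view
            KStream<String, String> pageViews = builder.stream("page-views");

            // A table: state derived from the stream - the latest count per key
            KTable<String, Long> viewsPerUser = pageViews
                    .groupByKey()
                    .count();
        }
    }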
Is there a corresponding concept among Azure resources? I have looked at Event Hubs, Event Grid, and Service Bus queues and topics, but can't seem to find an equivalent.
As far as I know there is no equivalent.
I'd say the service that comes closest to Kafka is Azure Event Hubs, especially in terms of real-time event processing, which Azure Service Bus is not designed for. Still, it does not provide the same feature set; for example, there are no materialized views (or at least I could not find anything about them so far), which are what correspond to Kafka tables. There is an article on Kafka integration with Azure Event Hubs where they explain the equivalents as well as the unsupported features:
https://learn.microsoft.com/en-us/azure/event-hubs/event-hubs-for-kafka-ecosystem-overview
Whenever I stumble across an article from Microsoft or other blogs that talk about event streaming (or event sourcing) and materialized views, the materialized-views part is covered by an Azure storage service such as Table Storage or Cosmos DB, in combination with Azure Event Hubs or even Service Bus.
So I am afraid there is no comparable managed cloud service on Azure as a direct substitute for Kafka at the moment. I'd be glad to see another answer that proves me wrong, as I would be interested in that myself.
During my search I came across some interesting GitHub projects that try to address similar problems on the Azure and .NET stack:
https://github.com/Lokad/AzureEventStore
https://github.com/AzureCosmosDB/solution-accelerator
https://github.com/dannormington/azure-cqrs
I want to synchronize data among a set of REST API servers (a Spring Boot based API cluster) periodically. Any instance in the cluster should be able to broadcast new information to all the others.
I don't want to use a DB here. I am trying to find a lightweight library that can be used inside the API for this purpose. Is it possible to use Atomix/Hazelcast/ZooKeeper for this? If so, it would be really helpful if someone could post sample code.
My thanks in advance.
In Hazelcast you can do this through WAN Replication. Note that it is an enterprise feature, so you have to buy a license.
Hazelcast can be used for this use case. Each of the REST instances will create an embedded Hazelcast member within its JVM. Hazelcast members then discover each other and form the cluster. Your REST apps will use the IMap or ReplicatedMap service - a distributed key-value store (IMap can store more data, ReplicatedMap is faster). Once you write data to the IMap, all other instances see it right away.
See the code sample here: https://docs.hazelcast.com/hazelcast/latest/getting-started/get-started-java.html#complete-code-samples
This feature and the Spring integration are open-source.
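For a rough idea of the embedded approach (a minimal sketch assuming Hazelcast 5.x is on the classpath; the map name, key, and value are illustrative):

    import com.hazelcast.core.Hazelcast;
    import com.hazelcast.core.HazelcastInstance;
    import com.hazelcast.map.IMap;

    public class ClusterBroadcast {
        public static void main(String[] args) {
            // Each REST instance starts an embedded member; members discover
            // each other on the network and form a single cluster.
            HazelcastInstance hz = Hazelcast.newHazelcastInstance();

            // A distributed key-value map shared by every instance
            IMap<String, String> shared = hz.getMap("shared-state");

            // A put() on one instance is visible to all other instances
            shared.put("latest-config", "{\"version\":42}");
        }
    }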
I need a solution, preferably something built in (rather than creating my own application), which would help management search through multiple/all topics in Kafka. We are using Confluent Platform. Basically, a user should be able to search for a keyword in a UI, and it should search the current log of multiple/all Kafka topics and return the data. All the topics in our environment use JSON to communicate.
This search would enable us to trace a flow. For example, multiple microservices send data from one system to another, and this flow can be tracked via a correlation ID which is present in all the JSON messages. So if someone searches for this correlation ID, they should be able to see the messages involved in the flow. This search would have more use cases later on.
We need a solution which would have minimal coding involved. We would prefer to use a UI like Kibana.
From initial reading I suspect the solutions below, but I'm not really sure, as I am new to Confluent (I used open-source Apache Kafka earlier):
Sol 1: Use ksqlDB (I need more guidance on how to use it).
Sol 2: Stream all topics' data to Elasticsearch using Kafka Connect with the built-in sink plugin, and put Kibana on top of Elasticsearch (a sketch of such a connector config is below).
Kindly help me find the best alternative.
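For reference, my current understanding is that a minimal connector config for Sol 2 would look roughly like this (untested; the connector class comes from Confluent's Elasticsearch sink plugin, and the topic names and URL are placeholders):

    name=elasticsearch-sink
    connector.class=io.confluent.connect.elasticsearch.ElasticsearchSinkConnector
    tasks.max=1
    # placeholder topic names to index
    topics=orders,payments
    connection.url=http://localhost:9200
    # our messages are schemaless JSON
    value.converter=org.apache.kafka.connect.json.JsonConverter
    value.converter.schemas.enable=false
    # derive document ids from Kafka coordinates and skip schema inference
    key.ignore=true
    schema.ignore=true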
You could use Elastic, sure.
You could also use Splunk.
There is also the pdk tool offered by Pilosa, which creates a distributed index over Kafka events (no affiliation).
Another option would be distributed tracing using interceptors between clients, rather than searching "all topics", which sounds like what you actually need.
I need to connect Salesforce to an external database we have, and constantly keep both the database and Salesforce updated, in as close to real time as we can get. I have tried Google searching possible solutions, but nearly all of them are outdated by over a year. Any ideas?
Thank You!
It is quite difficult to give you a proper answer without knowing your exact scenario.
However, off the top of my head, I would suggest two Salesforce products.
Salesforce Connect
https://www.salesforce.com/products/platform/products/salesforce-connect/
Salesforce Connect allows you to connect to various data sources (for example MySQL, Microsoft SQL Server, or Oracle) and expose the tables/objects of that data source as SObjects. There are limitations, so it would be best to talk to a Certified Architect about such an implementation.
Heroku Connect
https://www.heroku.com/connect
Heroku Connect allows you to connect a Heroku data source with a Salesforce object. The sync is not immediate, but there are quite a few customisations inside the product to make the sync as "live" as possible. There are limitations here too, so again it would be best to talk to a Certified Architect about such an implementation.
Salesforce Connect has limitations. It's good for presenting data via the interface, but if you need to act on the data or report on it, it might not be the best bet.
For a close-to-real-time hand-coded sync, look at the Streaming API or at Salesforce Platform Events.
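As an illustrative sketch of the Platform Events route (not a complete solution): publishing an event comes down to a POST against the event's sObject REST endpoint. The event name Db_Change__e, its custom fields, the instance URL, and the API version are all placeholders here, and the OAuth access token has to be obtained separately:

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;

    public class PlatformEventPublisher {
        public static void main(String[] args) throws Exception {
            // Placeholders: instance URL, API version, event name and custom
            // fields are hypothetical; obtain the token via an OAuth flow.
            String instanceUrl = "https://yourInstance.my.salesforce.com";
            String accessToken = System.getenv("SF_ACCESS_TOKEN");
            String body = "{\"Record_Id__c\":\"001xx000003DGbX\",\"Status__c\":\"Updated\"}";

            HttpRequest request = HttpRequest.newBuilder()
                    .uri(URI.create(instanceUrl + "/services/data/v57.0/sobjects/Db_Change__e/"))
                    .header("Authorization", "Bearer " + accessToken)
                    .header("Content-Type", "application/json")
                    .POST(HttpRequest.BodyPublishers.ofString(body))
                    .build();

            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println(response.statusCode() + " " + response.body());
        }
    }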
If you want to use an ETL tool, my organization has had decent luck with DBAmp, a SQL Server add-on product that is fairly inexpensive compared to a lot of ETL tools ($1625 annually): http://www.forceamp.com/ We're able to replicate the entire SF database offline in SQL with DBAmp, push changes to the offline SQL copy, and upsert changes. It's also a good backup solution via an offline full data copy. We got very good support from them as well when we encountered challenges.
Hope this helps.
Not sure if you are syncing one object or multiple objects, but you have a few options.
You can try the Salesforce-provided feature Salesforce Connect, which allows you to view and update data from your external source in Salesforce, but there are limitations around reporting and other considerations to weigh.
If you make use of Heroku, Heroku Connect is your best bet.
You can also use a middleware/ESB solution like MuleSoft, which can orchestrate keeping data in sync across multiple data sources and do batch loads; but depending on how often changes occur, keep an eye on API limits for inbound calls to Salesforce.
You can roll your own solution using Outbound Messages in workflow (or triggers that invoke an Apex class that calls out, but that is more cumbersome, and you have to do the custom error handling and retry logic that you get for free with Outbound Messages) to send changes from Salesforce to a homegrown service that writes to your database, and have that homegrown solution write back to Salesforce using the SOAP or REST API. That would probably take some time to build. You would also still need to be aware of API limits, depending on how many updates are made on the non-Salesforce side.
You could also create a Canvas app which displays data from your DB in Salesforce as a tab, hooked up via SSO so users are logged in automatically. But again, there would be no reporting or other Salesforce features to take advantage of.
But I really think you should spend some time determining which system is your source of truth, because that will determine how the data should be synced. You should also investigate whether you really need the sync to be real-time or near-real-time, or whether you can manage with something like an hourly true-up on the system that is not the source of truth.
We would like to use OrientDB Graph in an Azure environment. Does anybody have experience using it? We would also like to know whether high availability from OrientDB is required under Azure cloud. Azure already offers high availability for Azure Storage, Azure Drive, and SQL; I understand that they have replication and load balancing built in.
This is super important because we prefer not to get into the business of replication and infrastructure management.
Thanks
You can spin up two or more machines, install OrientDB on them, and then configure them together as a distributed cluster. However, I haven't been able to find any simpler or easier way. I am interested in this topic too.
Azure does have features such as geo-replication, which protects your data against a major data-center incident, but it doesn't provide any performance benefit and will not make your database highly available.
Although the platform is pretty reliable, occasionally Microsoft will reboot servers for updates, so to protect against downtime you can use affinity groups so that, of your two or more servers, one will always be online. This does, however, need to be used in conjunction with database replication and ideally load balancing.
It's also worth noting that OrientDB recommends clusters have an odd number of servers, as this can prevent conflicts when synchronising data after a communication issue between the servers.
I am using it on Amazon, and I had to create a Java project to monitor HTTP requests, inserts, and queries. The queries are very fast, but inserting data takes longer.
I recommend this type of graph database model to decrease query times. Also, OrientDB handles empty fields very well compared to other databases.
If you need help with the Java project, you can reply to this post and I'll help you.
I hope it helps. Good luck.
I am developing a c# web application that will be hosted in Windows Azure and use Table Data Storage (TDS).
I want to architect my application such that I can also (as an option) deploy it to a traditional IIS server with some other NoSQL back end. Basically, I want to give my customers the option to either pay me in a software-as-a-service model, or purchase a license for my application that they can install on a (non-Azure) production server of their own.
How can I best architect my data layer and middle tier to achieve both goals?
I will likely need a Windows Azure Worker Role and an Azure Queue. How complicated is it to replicate these? Can I substitute a custom Windows Service and some other queuing technology?
How can the entities in my data model be written such that I can deploy to Azure TDS, or to some other storage when not deploying to Azure? Would MongoDB or similar be useful for this?
Surely there is a way to architect for Azure without being married to it.
I will likely need a Windows Azure Worker Role and an Azure Queue. How complicated is it to replicate these? Can I substitute a custom Windows Service and some other queuing technology?
Yes - a Windows Service with some other queuing technology would fit this reasonably well - and worker roles have a main/Run loop which is easy to reproduce within a Windows Service.
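The shape of that loop is a long-running poll against a queue abstraction, and it ports to any host. A rough sketch of the pattern (in Java purely for illustration, since the pattern is language-agnostic; the WorkQueue interface is hypothetical and would be backed by an Azure Queue in one deployment and by, say, MSMQ or RabbitMQ in the other):

    import java.util.Optional;

    // Hypothetical queue abstraction: one implementation per queuing technology
    interface WorkQueue {
        Optional<String> poll();
        void delete(String message);
    }

    class Worker implements Runnable {
        private final WorkQueue queue;
        private volatile boolean running = true;

        Worker(WorkQueue queue) {
            this.queue = queue;
        }

        @Override
        public void run() {
            // Mirrors a worker role's Run() loop
            while (running) {
                Optional<String> msg = queue.poll();
                if (msg.isPresent()) {
                    process(msg.get());
                    queue.delete(msg.get());
                } else {
                    try {
                        Thread.sleep(1000); // back off while the queue is empty
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                        return;
                    }
                }
            }
        }

        void stop() { running = false; }

        private void process(String message) {
            // application work goes here
        }
    }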
How can the entities in my data model be written such that I can deploy to Azure TDS, or to some other storage when not deploying to Azure? Would MongoDB or similar be useful for this?
NoSQL is a general term encapsulating lots of different technologies. I think Azure TDS currently belongs to the key-value store family of NoSQL, while MongoDB is more of a document database offering much richer functionality than TDS - see http://en.wikipedia.org/wiki/NoSQL_(concept). For mimicking Azure TDS, I think a variant of something like Redis might work (although I believe Redis itself currently has wider functionality than TDS).
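To illustrate what mimicking TDS might look like (a sketch only, not a recommendation; it assumes the Jedis client and encodes Table Storage's PartitionKey/RowKey pair as a composite Redis key):

    import java.util.Map;
    import redis.clients.jedis.Jedis;

    public class TableLikeStore {
        private final Jedis jedis = new Jedis("localhost", 6379);

        // Encode the compound key as "partitionKey|rowKey" and store the
        // entity's properties as a Redis hash.
        public void upsert(String partitionKey, String rowKey, Map<String, String> props) {
            jedis.hset(partitionKey + "|" + rowKey, props);
        }

        public Map<String, String> get(String partitionKey, String rowKey) {
            return jedis.hgetAll(partitionKey + "|" + rowKey);
        }
    }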
In general, it depends on the shape of your data, but I suspect if you can fit it in Azure TDS, then you'll be able to fit it into your choice of other storage too.
Surely there is a way to architect for Azure without being married to it.
Yes - as you've suggested in your question, you can architect your app so it can work on other technologies instead. In fact, this is quite similar to the traditional challenge of SQL data abstraction. However, I think there are a few places where you'll find TDS pushing you in directions which won't fit well with other stores - e.g. Azure pushes you much more towards data replication; has very specific rules on keys; offers high performance only through very specific mechanisms; and offers limited transaction integrity in very specific situations. These factors may mean that you do indeed have to change some middle-tier layers as well as some data layers in order to get the most out of your app in both its Azure and non-Azure variations.
One other thought - it might be easier to offer your clients a multi-tenant SaaS version on Azure and a single-tenant version hosted on Azure - but this does depend on the clients!
I found a viable solution: I can use EF Code First with SQL Server or SQL CE if I design my entities with the same PartitionKey and RowKey compound key structure that Azure Table Storage requires.
With a little help from Lokad Cloud (http://code.google.com/p/lokad-cloud/) to handle the interaction with Azure Table Storage, I was able to craft a common DataContext that provides CRUD operations against either EF's DbContext or Lokad's TableStorageProvider.
I even found a nice way to manage relationships between entities and lazy-load them properly.
The solution is a bit complex and needs more testing. I will blog about it and post the link here when ready.