I have found four datasets that can be ingested directly into Memgraph at the Awesome Streams site. I've also found a tutorial, How to build a Spotify Recommendation Engine using Kafka and Memgraph.
Is there a public stream of this dataset? I know I can download it, and I know there is already an app, but I'd like a public stream so that I can showcase this at my school without needing to bring my laptop.
Memgraph currently offers four data streams at Awesome Streams. These are the same streams that can be found at https://github.com/memgraph/data-streams. The Spotify data stream is mentioned at https://github.com/memgraph/spotify-song-recommender.
You can open an issue at https://github.com/memgraph/data-streams and ask for a Spotify stream to be included.
I need to create a solution that receives events from a web/desktop application that runs on kiosks. There are hundreds of kiosks spread across the country, and each one generates automatic events from time to time as well as events when something happens.
Although it is a locked-down desktop application, it is built in Angular v8; it runs in a webview.
I was researching scalable and reliable solutions, and Apache Kafka seems to be a great fit. I know there are clients for Node.js, but I couldn't find any option for Angular. Angular runs in the browser, so it must communicate with the backend over HTTP/S.
In the end, I realized the best way to send events from Angular is to create an API that receives messages on an HTTP/S endpoint and publishes them to a Kafka topic. Or is there any adapter for Kafka that exposes topics over REST?
I suppose this approach is way faster than storing the messages in a database. Is that correct?
Thanks in advance.
this approach is way faster than storing the messages in a database. Is that correct?
It can actually be slower. Kafka is asynchronous, so don't expect to get a response in the same time frame as a database read/write. (Again, this would require some API, and it also largely depends on the database used.)
is there any adapter for Kafka that exposes topics over REST?
Yes, the Confluent REST Proxy is an Apache 2.0-licensed product.
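For illustration, here is a minimal sketch of posting an event to the REST Proxy from plain Java, using its documented v2 JSON embedded format. The proxy address, topic name, and payload are assumptions:

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestProxyPost {
    public static void main(String[] args) throws Exception {
        // Hypothetical REST Proxy address and topic name
        String url = "http://localhost:8082/topics/kiosk-events";
        // The v2 embedded format wraps one or more records in a "records" array
        String body = "{\"records\":[{\"value\":{\"kioskId\":\"42\",\"event\":\"heartbeat\"}}]}";

        HttpRequest request = HttpRequest.newBuilder()
                .uri(URI.create(url))
                .header("Content-Type", "application/vnd.kafka.json.v2+json")
                .POST(HttpRequest.BodyPublishers.ofString(body))
                .build();

        HttpResponse<String> response = HttpClient.newHttpClient()
                .send(request, HttpResponse.BodyHandlers.ofString());
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```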
There is also the divolte/divolte-collector project for collecting click data and other browser-driven events.
Otherwise, as you've discovered, create your own API in any language you are comfortable with, and have it use a Kafka producer client.
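If you go the custom-API route, a minimal sketch might look like the following, using only the JDK's built-in HTTP server and the Kafka Java client. The topic name, port, and fire-and-forget error handling are assumptions for illustration:

```java
import com.sun.net.httpserver.HttpServer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.net.InetSocketAddress;
import java.nio.charset.StandardCharsets;
import java.util.Properties;

public class EventGateway {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        KafkaProducer<String, String> producer = new KafkaProducer<>(props);

        // Each kiosk POSTs its event JSON to /events; we forward it to Kafka
        HttpServer server = HttpServer.create(new InetSocketAddress(8080), 0);
        server.createContext("/events", exchange -> {
            String body = new String(exchange.getRequestBody().readAllBytes(), StandardCharsets.UTF_8);
            // Fire-and-forget publish; the kiosk only needs to know the event was accepted
            producer.send(new ProducerRecord<>("kiosk-events", body));
            exchange.sendResponseHeaders(202, -1); // 202 Accepted, no response body
            exchange.close();
        });
        server.start();
    }
}
```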
I am new to Apache Kafka and its streaming services and related APIs, but I was wondering whether there is any formal documentation on where to obtain the initial raw data required for ingestion.
In essence, I want to try my hand at building a rudimentary crypto trading bot, but I was under the impression that HTTP APIs may have more latency than APIs that integrate with Kafka Streams. For example, I know RapidAPI has a library of HTTP APIs that could help pull data, but I was unsure whether there is something similar for ingesting data through Kafka Streams. I am under the impression that the two kinds of data sources differ in some way, but I am also unsure whether that is actually the case.
I tried digging around on Google, but it's not very clear what APIs or source data Kafka Streams consumes, or whether they are the same/similar and just handled differently.
Any insight or documentation would be greatly appreciated. Also, feel free to let me know if my understanding is completely wrong.
any formal documentation on where to obtain the initial raw data required for ingestion?
Kafka accepts binary data. You can feed in serialized data from anywhere, although you are restricted by (configurable) message size limits.
APIs that integrate with Kafka Streams
Kafka Streams is an intra-cluster library; it doesn't integrate with anything but Kafka.
If you want to periodically poll/fetch an HTTP/1 API, you would use a regular HTTP client and a Kafka producer.
The answer is probably similar for streaming HTTP/2 or WebSocket sources, although you would still not be able to use Kafka Streams, and you'd have to deal with batching records into a Kafka producer request.
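As a rough sketch of that poll-and-produce loop in Java (the price-API URL, topic name raw-ticks, and one-second interval are all hypothetical):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.util.Properties;

public class TickerPoller {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        HttpClient http = HttpClient.newHttpClient();
        // Hypothetical price endpoint; substitute whichever exchange API you use
        HttpRequest request = HttpRequest.newBuilder(
                URI.create("https://api.example.com/v1/price?symbol=BTC-USD")).build();

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            while (true) {
                String json = http.send(request, HttpResponse.BodyHandlers.ofString()).body();
                producer.send(new ProducerRecord<>("raw-ticks", "BTC-USD", json));
                Thread.sleep(1_000); // poll once per second
            }
        }
    }
}
```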
Alternatively, you could look for Kafka Connect projects that work with HTTP, or opt for something like Apache NiFi, a broader project with many different "processors" such as GetHTTP and PublishKafka.
Once the data is in Kafka, you are welcome to use Kafka Streams/KSQL to do some processing.
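For instance, once records are flowing, a minimal Kafka Streams topology over the hypothetical raw-ticks topic from the sketch above might look like this (the filter condition and output topic are made up for illustration):

```java
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

import java.util.Properties;

public class TickerFilter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "ticker-filter");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Read raw ticks, keep only BTC quotes, and write them to a second topic
        KStream<String, String> ticks = builder.stream("raw-ticks");
        ticks.filter((symbol, json) -> "BTC-USD".equals(symbol))
             .to("btc-ticks");

        new KafkaStreams(builder.build(), props).start();
    }
}
```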
In my understanding, when I want to send a movie (4 GB) to a Kafka broker, one producer serializes the 4 GB video file and sends it to the broker, and many consumers who want to see that movie consume the file.
I heard Netflix uses Kafka to send and watch movies, and I am curious how they use producers, brokers, and consumers. I use Netflix, and it's really fast; I want to know how they use Kafka (especially producers and consumers).
And as far as I know, when sending a video file, you need to encode it and serialize it to send the data (maybe encoding is serializing in this case?). Did I understand correctly? If I am missing something, could you give me some tips and guidance?
Netflix uses Kafka as part of its centralized data lineage solution. It is not using Kafka to encode or stream video content. You can read more about how Kafka is being used here.
Now, to answer your question about why its video streaming service is so fast: you need to understand how Netflix leverages AWS resources like EC2, S3, and others to create a highly scalable, fault-tolerant microservice architecture.
On top of this, Netflix works with ISPs to localize content through a program called Netflix Open Connect. This allows them to cache content locally, which minimizes latency and saves on compute.
Kafka is a "Streaming Platform" but it's intended for streaming data and it's not designed to stream videos or audio.
While Netflix is using Kafka, it's not to stream videos to users but instead to process events in their backend, see their technology blog. Note that I'm not a Netflix employee nor I have any insider knowledge, it's just based on the information they disclosed publicly on their blog and at conferences.
That said, it's still possible to send a video file using a producer and receive it with a consumer but I don't think it's what you had in mind.
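For completeness, here is a toy sketch of what that could look like: split the file into chunks small enough to stay under the broker's message size limit and key each chunk with a sequence number so a consumer can reassemble them in order. The topic name, file path, and chunk size are all assumptions, and this is nothing like a real video pipeline:

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;
import org.apache.kafka.common.serialization.StringSerializer;

import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.Properties;

public class FileChunkProducer {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        int chunkSize = 512 * 1024; // stay well under the broker's default ~1 MB message limit
        try (KafkaProducer<String, byte[]> producer = new KafkaProducer<>(props);
             InputStream in = Files.newInputStream(Path.of("movie.mp4"))) {
            byte[] buffer = new byte[chunkSize];
            int n, seq = 0;
            while ((n = in.read(buffer)) > 0) {
                // Key each chunk by file name + sequence number so a consumer can reorder and reassemble
                producer.send(new ProducerRecord<>("video-chunks",
                        "movie.mp4/" + seq++, Arrays.copyOf(buffer, n)));
            }
        }
    }
}
```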
I am using Google Cloud Platform. I need to push messages from different sources to a topic, but it should be done using a single asynchronous call.
Batching of Cloud Pub/Sub messages can be implemented using the client libraries. Have a look at the official GCP documentation here for examples in different programming languages.
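As one concrete example, here is a sketch using the Java client library's batching settings; each publish call returns immediately, and the library transparently groups messages into batched requests. The project and topic IDs are placeholders, the thresholds are arbitrary, and depending on your client-library version the Duration class may come from java.time instead of org.threeten.bp:

```java
import com.google.api.gax.batching.BatchingSettings;
import com.google.cloud.pubsub.v1.Publisher;
import com.google.protobuf.ByteString;
import com.google.pubsub.v1.PubsubMessage;
import com.google.pubsub.v1.TopicName;
import org.threeten.bp.Duration;

public class BatchPublisher {
    public static void main(String[] args) throws Exception {
        // Placeholder project and topic IDs
        TopicName topic = TopicName.of("my-project", "my-topic");

        BatchingSettings batching = BatchingSettings.newBuilder()
                .setElementCountThreshold(100L)            // send after 100 messages...
                .setRequestByteThreshold(1024L)            // ...or 1 KB of payload...
                .setDelayThreshold(Duration.ofMillis(100)) // ...or 100 ms, whichever comes first
                .build();

        Publisher publisher = Publisher.newBuilder(topic)
                .setBatchingSettings(batching)
                .build();
        try {
            for (int i = 0; i < 1000; i++) {
                PubsubMessage msg = PubsubMessage.newBuilder()
                        .setData(ByteString.copyFromUtf8("event-" + i))
                        .build();
                publisher.publish(msg); // asynchronous; batched under the hood
            }
        } finally {
            publisher.shutdown(); // flushes any pending batch
        }
    }
}
```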
I am planning to track all user activity on my website, and Kafka seems to be a reliable solution. I cannot work out how to send the generated events to Kafka, i.e. how to make my website a Kafka producer.
Which language is your website written in?
In Java/Scala, the solution is to import the Kafka producer dependencies, create a producer, create messages, and send them.
I hope this example will help:
https://cwiki.apache.org/confluence/display/KAFKA/0.8.0+Producer+Example
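Note that the linked page targets the old 0.8 producer API; with a modern Kafka Java client the same idea looks roughly like this (the topic name and payload are hypothetical):

```java
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

import java.util.Properties;

public class ClickEventProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // Hypothetical topic and payload: one JSON event per user action
            producer.send(new ProducerRecord<>("user-activity",
                    "user-123", "{\"page\":\"/home\",\"action\":\"view\"}"));
        }
    }
}
```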