Can Kafka be used to obtain Google Analytics data and process it through Spark Streaming?

Has anyone tried integrating Google Analytics with Kafka and then processing that data in Spark Streaming?
Thanks!

A little late since this question was asked but...
I created a Python application that demonstrates how to pull metrics from Google Analytics and push them to a Kafka topic. Soon I should have code that demonstrates analyzing that data.
https://github.com/admintome/analytics-intake
Hopefully, that will demonstrate how to accomplish what you asked for.
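For reference, the general shape of that pipeline is a Reporting API query followed by a Kafka produce. Below is a rough sketch (not the repository's code) using google-api-python-client and kafka-python; the view ID, key file, topic, and broker address are all placeholders:

```python
import json
from google.oauth2 import service_account
from googleapiclient.discovery import build
from kafka import KafkaProducer

# Placeholders -- substitute your own GA view, credentials, topic, and brokers.
VIEW_ID = "ga:12345678"
KEY_FILE = "service-account.json"
TOPIC = "ga-metrics"
BROKERS = "localhost:9092"

credentials = service_account.Credentials.from_service_account_file(
    KEY_FILE, scopes=["https://www.googleapis.com/auth/analytics.readonly"])
analytics = build("analyticsreporting", "v4", credentials=credentials)

producer = KafkaProducer(bootstrap_servers=BROKERS,
                         value_serializer=lambda v: json.dumps(v).encode("utf-8"))

# Pull one report (sessions per day for the last week) and publish each row.
response = analytics.reports().batchGet(body={
    "reportRequests": [{
        "viewId": VIEW_ID,
        "dateRanges": [{"startDate": "7daysAgo", "endDate": "today"}],
        "metrics": [{"expression": "ga:sessions"}],
        "dimensions": [{"name": "ga:date"}],
    }]
}).execute()

for report in response.get("reports", []):
    for row in report.get("data", {}).get("rows", []):
        producer.send(TOPIC, {"dimensions": row["dimensions"],
                              "metrics": row["metrics"][0]["values"]})
producer.flush()
```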

There is no API available for Scala to access Google Analytics.
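For the Spark Streaming half of the question, once the metrics are on a Kafka topic they can be consumed like any other Kafka source. A minimal Structured Streaming sketch in PySpark, using the same placeholder topic and broker as above:

```python
from pyspark.sql import SparkSession
from pyspark.sql.functions import col

# Requires the spark-sql-kafka connector package on the classpath.
spark = (SparkSession.builder
         .appName("ga-kafka-streaming")
         .getOrCreate())

# Read the Kafka topic as a streaming DataFrame (topic/broker are placeholders).
stream = (spark.readStream
          .format("kafka")
          .option("kafka.bootstrap.servers", "localhost:9092")
          .option("subscribe", "ga-metrics")
          .load())

# Kafka values arrive as bytes; cast to string before any further parsing.
query = (stream.select(col("value").cast("string").alias("json"))
         .writeStream
         .format("console")
         .start())

query.awaitTermination()
```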

Related

Streaming data into BigQuery using Dart/Flutter

Can anyone point me to any documentation on streaming data into BigQuery using Dart/Flutter?
I'm pretty new to all this and cannot for the life of me find any Flutter specific documentation or precedence.
Can anyone help?
Dart/Flutter isn't one of the languages with a directly maintained client library, but it looks like there's a discovery-based library. The API method you need for streaming data is bigquery.tabledata.insertAll, which appears to be surfaced in the Dart library here:
https://pub.dev/documentation/googleapis/latest/googleapis.bigquery.v2/TabledataResourceApi/insertAll.html
None of the docs in the BigQuery public documentation will contain Dart/Flutter code examples, but you can get more background on streaming data in https://cloud.google.com/bigquery/streaming-data-into-bigquery
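Since the official samples won't show Dart, here is the same streaming insert through the Python client for comparison; insert_rows_json wraps the tabledata.insertAll endpoint, and the table and row fields below are made up for illustration. The Dart insertAll call takes an equivalent request body.

```python
from google.cloud import bigquery

client = bigquery.Client()
table_id = "my-project.my_dataset.events"  # placeholder table

# insert_rows_json wraps the bigquery.tabledata.insertAll endpoint.
rows = [{"user_id": "u123", "event": "page_view", "ts": "2024-01-01T00:00:00Z"}]
errors = client.insert_rows_json(table_id, rows)

print(errors if errors else "rows streamed successfully")
```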
Flutter --> Google Cloud Function --> BigQuery
Your best approach by far will be to use a Google Cloud Function to write to BigQuery. Architecturally, this is the right approach, though I could argue that putting a stream platform (like Google Pub/Sub) in the middle has some additional benefits. In your situation, however, I'd recommend going straight from Cloud Functions to BigQuery for simplicity since you're already going to have a learning curve and probably don't need the stream layer at this point.
There's good documentation for integrating Cloud Functions into Flutter, and it's quite intuitive to use.
Here are the official Flutter docs that talk about this integration path:
https://firebase.flutter.dev/docs/functions/overview/
Here's a good walkthrough on how to actually do it: https://rominirani.com/tutorial-flutter-app-powered-by-google-cloud-functions-3eab0df5f957
On the BigQuery side:
There's a very detailed walkthrough here on writing to BigQuery from Cloud Functions: https://cloud.google.com/solutions/streaming-data-from-cloud-storage-into-bigquery-using-cloud-functions
You can find more guides as well if you search Google.
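As a rough sketch of that path (not taken from the linked walkthrough), an HTTP Cloud Function written in Python could accept a JSON body from the Flutter app and stream it into BigQuery; the table name below is a placeholder:

```python
import functions_framework
from google.cloud import bigquery

client = bigquery.Client()
TABLE_ID = "my-project.my_dataset.events"  # placeholder table

@functions_framework.http
def ingest_event(request):
    """HTTP Cloud Function: stream one JSON row into BigQuery."""
    row = request.get_json(silent=True)
    if not row:
        return ("expected a JSON body", 400)
    errors = client.insert_rows_json(TABLE_ID, [row])  # streaming insert
    if errors:
        return ({"errors": errors}, 500)
    return ("ok", 200)
```

On the Flutter side, calling the function is just the HTTP/callable invocation covered in the FlutterFire docs linked above.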

MessageHub - API for creating topics?

In the bluemix web console, it is possible to create topics. However, is it possible to create topics using a REST API? I couldn't see documentation online - maybe I missed it.
@SHC no worries, it is easy to miss. Try here for a brief mention and a link to the Swagger document for the topic administration API: https://console.ng.bluemix.net/docs/services/MessageHub/messagehub010.html#messagehub037
The docs are currently being restructured to improve discoverability. Feel free to come back if you have further questions.
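For illustration only, a topic-creation request against that administration API generally has the shape below; the admin URL and API key are placeholders, and the exact path, body fields, and auth header should be confirmed against the Swagger document:

```python
import requests

# Placeholders -- take the real values from your Message Hub service credentials.
ADMIN_URL = "https://<kafka_admin_url>"
API_KEY = "<api_key>"

# Assumed endpoint shape -- verify the path and payload against the Swagger doc.
resp = requests.post(
    f"{ADMIN_URL}/admin/topics",
    headers={"X-Auth-Token": API_KEY, "Content-Type": "application/json"},
    json={"name": "my-topic", "partitions": 1},
)
resp.raise_for_status()
print("topic created:", resp.status_code)
```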

Kafka entries to DynamoDB

I want all records coming into a Kafka topic to be inserted into DynamoDB. Is there any open source plugin available which can do this? Or is there a better way to do it within DynamoDB itself?
Please suggest.
Lokesh Narayan
Storing Kafka messages in DynamoDB is a great use case for Kafka Connect. Unfortunately, I don't know of any off-the-shelf sink connectors for DynamoDB. You can see a list here. For now, you'll need to either build your own sink connector (and hopefully open source it!) or build a custom consumer that writes to DynamoDB.
However, I did find this kafka-connect-dynamodb connector. Though it is currently unmaintained, it worked for me.
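If you go the custom-consumer route, a minimal sketch with kafka-python and boto3 might look like this; the topic, broker, table, and region are placeholders, and each message is assumed to be a JSON object that already contains the table's partition key:

```python
import json
import boto3
from kafka import KafkaConsumer

# Placeholders -- topic, brokers, table, and region are illustrative only.
consumer = KafkaConsumer(
    "my-topic",
    bootstrap_servers="localhost:9092",
    group_id="dynamodb-writer",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

table = boto3.resource("dynamodb", region_name="us-east-1").Table("my_table")

# Write each Kafka record as one DynamoDB item.
for message in consumer:
    table.put_item(Item=message.value)
```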

Is there a Google Dataflow MongoDB Source/Sink?

I know Google Dataflow only officially supports Google Cloud Storage files, BigQuery, Avro files, and Pub/Sub as Dataflow I/O out of the box.
But as it has an API for Custom Source and Sink I was wondering, is there some Pipeline I/O implementation for MongoDB?
Right now I will have to either migrate my data to BigQuery or write the whole Pipeline I/O implementation before even being able to know if Google Dataflow is a viable solution to my current problems.
I tried googling and looking at the current SDK issues and didn't see anything related. I even started to wonder if I missed something very basic in the Google Dataflow concepts and docs that completely invalidates this initial idea of using MongoDB as a data source.
Recently a MongoDB connector was added to Apache Beam (incubating). Please see MongoDBIO.
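The Beam Python SDK now also ships a mongodbio module. A minimal read sketch, assuming a reachable MongoDB instance (the URI, database, and collection names are placeholders):

```python
import apache_beam as beam
from apache_beam.io.mongodbio import ReadFromMongoDB
from apache_beam.options.pipeline_options import PipelineOptions

# Placeholders -- point these at your own MongoDB instance.
URI = "mongodb://localhost:27017"
DB = "mydb"
COLLECTION = "events"

with beam.Pipeline(options=PipelineOptions()) as p:
    (p
     | "ReadFromMongo" >> ReadFromMongoDB(uri=URI, db=DB, coll=COLLECTION)
     | "Keys" >> beam.Map(lambda doc: str(doc.get("_id")))
     | "Print" >> beam.Map(print))
```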

How do I ingest corpus data into IBM Watson?

I am building an application where I am trying to use the IBM Watson Question & Answer API.
Currently I only see corpora for Healthcare and Travel, but I would like to ingest a custom dataset suiting my needs. Can anyone please point me in the right direction, to the exact API that does this, or to an existing IBM explorer I can use to upload the data files directly?
Thanks for the help.
At this time you cannot ingest corpus data into IBM Watson. That capability is coming in the future.