Is there a Google Dataflow MongoDB Source/Sink? - mongodb

I know that, out of the box, Google Dataflow only officially supports files in Google Cloud Storage, BigQuery, Avro files, and Pub/Sub as Pipeline I/O.
But since it has an API for Custom Sources and Sinks, I was wondering: is there a Pipeline I/O implementation for MongoDB?
Right now I will have to either migrate my data to BigQuery or write the whole Pipeline I/O implementation before even being able to know if Google Dataflow is a viable solution to my current problems.
I tried googling and looking at the current SDK issues and didn't see anything related. I even started to wonder if I missed something very basic in the Google Dataflow concepts and docs that completely invalidates this initial idea of using MongoDB as a data source.

Recently a MongoDB connector was added to Apache Beam (incubating). Please see MongoDBIO.
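For reference, here's roughly what reading from MongoDB looks like with the Beam Python SDK's mongodbio connector (the Java MongoDbIO class is the original connector; the URI, database, collection, and output path below are just placeholders):

```python
import apache_beam as beam
from apache_beam.io.mongodbio import ReadFromMongoDB

# Minimal sketch: read documents from MongoDB and write their _id fields to text.
# The URI, database, collection, and output path are placeholders.
with beam.Pipeline() as p:
    (p
     | 'ReadFromMongo' >> ReadFromMongoDB(
           uri='mongodb://localhost:27017',
           db='my_db',
           coll='my_collection')
     | 'ExtractId' >> beam.Map(lambda doc: str(doc['_id']))
     | 'Write' >> beam.io.WriteToText('gs://my-bucket/output'))
```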

Related

Streaming data into big query using Dart/Flutter

Can anyone point me to any documentation on streaming data into big query using Dart/Flutter?
I'm pretty new to all this and cannot for the life of me find any Flutter-specific documentation or precedent.
Can anyone help?
Dart/Flutter isn't one of the languages with a directly maintained client library, but it looks like there's a discovery-based library. The API method you need for streaming data is bigquery.tabledata.insertAll, which appears to be surfaced in the Dart library here:
https://pub.dev/documentation/googleapis/latest/googleapis.bigquery.v2/TabledataResourceApi/insertAll.html
None of the docs in the BigQuery public documentation will contain Dart/Flutter code examples, but you can get more background on streaming data in https://cloud.google.com/bigquery/streaming-data-into-bigquery
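For comparison, this is roughly what the same discovery-based tabledata.insertAll call looks like in Python; the Dart library should expose an equivalent method, and the project, dataset, and table names here are placeholders:

```python
from googleapiclient.discovery import build

# Discovery-based BigQuery client (same REST surface the Dart library wraps).
# Assumes Application Default Credentials are available.
bq = build('bigquery', 'v2')

# tabledata.insertAll streams rows into an existing table.
body = {
    'rows': [
        {'insertId': 'row-1', 'json': {'name': 'alice', 'score': 42}},
    ]
}
response = bq.tabledata().insertAll(
    projectId='my-project',
    datasetId='my_dataset',
    tableId='my_table',
    body=body,
).execute()
print(response)
```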
Flutter --> Google Cloud Function --> BigQuery
Your best approach by far will be to use a Google Cloud Function to write to BigQuery. Architecturally, this is the right approach, though I could argue that putting a stream platform (like Google Pub/Sub) in the middle has some additional benefits. In your situation, however, I'd recommend going straight from Cloud Functions to BigQuery for simplicity since you're already going to have a learning curve and probably don't need the stream layer at this point.
There's good documentation for integrating Cloud Functions into Flutter, and it's quite intuitive to use.
Here are the official Flutter docs that talk about this integration path:
https://firebase.flutter.dev/docs/functions/overview/
Here's a good walkthrough on how to actually do it: https://rominirani.com/tutorial-flutter-app-powered-by-google-cloud-functions-3eab0df5f957
On the BigQuery side:
There's a very detailed walkthrough here on writing to BigQuery from Cloud Functions: https://cloud.google.com/solutions/streaming-data-from-cloud-storage-into-bigquery-using-cloud-functions
You can find more guides as well if you search Google.
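To give a rough idea of the Cloud Functions side, here's a minimal Python sketch of an HTTP function that streams a JSON payload from the app into BigQuery (the table name and payload fields are assumptions, not taken from the guides above):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Fully-qualified destination table; this name is a placeholder.
TABLE_ID = 'my-project.my_dataset.events'

def ingest_event(request):
    """HTTP-triggered Cloud Function: stream one row sent by the Flutter app."""
    row = request.get_json(silent=True) or {}
    errors = client.insert_rows_json(TABLE_ID, [row])  # streaming insert
    if errors:
        return (f'Insert failed: {errors}', 500)
    return ('ok', 200)
```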

Is connectivity possible between Google Firebase and MongoDB on AWS?

I have data getting stored in Google Firebase (i.e. the output of the Google Vision API). I need the same data to be stored in MongoDB, which is running on AWS. Is there connectivity possible between Google Cloud and AWS for this data migration?
There is no out-of-the-box solution for what you're trying to accomplish. Currently, Firestore supports exporting its data as documented here, though the format of the export is probably not something MongoDB could import right away. Even if the format were compatible, you would need some kind of data processing pipeline to handle the flow from one side to the other.
Depending on how you're handling the ingestion of Vision API results, you might be able to include code to also send that data to MongoDB. If that's not the case, you might need to design a custom solution for this particular use case.
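If you do control the ingestion code, a dual-write along these lines might work (a Python sketch; the MongoDB connection string, database, and collection names are placeholders, and error handling is omitted):

```python
from google.cloud import firestore
from pymongo import MongoClient

# Placeholders: MongoDB connection string on AWS, plus database/collection names.
mongo = MongoClient('mongodb://user:pass@my-aws-host:27017')
mongo_coll = mongo['vision']['annotations']

fs = firestore.Client()

def store_vision_result(doc_id, annotations):
    """Write the same Vision API result to both Firestore and MongoDB."""
    fs.collection('annotations').document(doc_id).set(annotations)
    mongo_coll.replace_one({'_id': doc_id}, {'_id': doc_id, **annotations}, upsert=True)
```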

How to set up Google Ads as a source for a Cloud Data Fusion pipeline?

I'm trying to ingest data from my Google Ads account into a Cloud Data Fusion pipeline, but I see that only 12 sources are available (BigQuery, Amazon S3, File, Excel, Kafka Consumer, etc.).
Does anybody know if there is a way to connect directly via the API? Or do I need a paid solution to extract the data?
Many thanks!
Are you looking to ingest data from Analytics 360? https://marketingplatform.google.com/about/analytics-360/
Cloud Data Fusion does not have this connector yet, but we will make it available in the future.
An update: you can now go to the Hub in the top right corner and choose your data source, such as Google Ads or Google Analytics.

Is there any way to call the Bing Ads API through a pipeline and load the data into BigQuery through Google Data Fusion?

I'm creating a pipeline in Google Data Fusion that should let me export my Bing Ads data into BigQuery using my Bing Ads developer token. I couldn't find any suitable data source to add to my pipeline in Data Fusion. Is fetching data from API calls even supported in Google Data Fusion, and if it is, how can it be done?
HTTP based sources for Cloud Data Fusion are currently in development and will be released by Q3. Could you elaborate on your use case a little more, so we can make sure that your requirements will be covered by those plugins? For example, are you looking to build a batch or real-time pipeline?
In the meantime, you have the following two, more immediate options/workarounds:
If you are OK with storing the data in a staging area in GCS before loading it into BigQuery, you can use the HTTPToHDFS plugin that is available in the Hub. Use a path that starts with gs:// (e.g. gs://your-bucket/path/to/file).
Alternatively, we also welcome contributions, so you can also build the plugin using the Cloud Data Fusion APIs. We are happy to guide you, and can point you to documentation and samples.

MongoDB + Google Big Query - Normalize Data and Import to BQ

I've done quite a bit of searching, but haven't been able to find anything within this community that fits my problem.
I have a MongoDB collection that I would like to normalize and upload to Google Big Query. Unfortunately, I don't even know where to start with this project.
What would be the best approach to normalize the data? From there, what is recommended when it comes to loading that data to BQ?
I realize I'm not giving much detail here... but any help would be appreciated. Please let me know if I can provide any additional information.
If you're using Python, an easy way is to read the collection in chunks and use pandas' to_gbq method. It's easy and quite fast to implement, but it would be better to have more details.
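A minimal sketch of that approach with pymongo and pandas-gbq (connection details, chunk size, and the destination table are placeholders):

```python
import pandas as pd
from pymongo import MongoClient

# Placeholders for connection details and destination table.
client = MongoClient('mongodb://localhost:27017')
collection = client['my_db']['my_collection']

CHUNK_SIZE = 10_000
buffer = []

for doc in collection.find({}, {'_id': 0}):  # drop the ObjectId field
    buffer.append(doc)
    if len(buffer) >= CHUNK_SIZE:
        pd.DataFrame(buffer).to_gbq('my_dataset.my_table',
                                    project_id='my-project',
                                    if_exists='append')
        buffer = []

if buffer:  # flush the last partial chunk
    pd.DataFrame(buffer).to_gbq('my_dataset.my_table',
                                project_id='my-project',
                                if_exists='append')
```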
In addition to the answer provided by SirJ, you have multiple options to load data into BigQuery, including loading the data from Cloud Storage, your local machine, Dataflow, and more, as mentioned here. Cloud Storage supports data in multiple formats such as CSV, JSON, Avro, Parquet, and more. You also have various options to load data using the Web UI, command line, API, or the client libraries, which support C#, Go, Java, Node.js, PHP, Python, and Ruby.
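For example, loading a staged file from Cloud Storage with the Python client library might look like this (bucket, file, and table names are placeholders):

```python
from google.cloud import bigquery

client = bigquery.Client()

# Placeholders for the staged file in GCS and the destination table.
job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,
)
load_job = client.load_table_from_uri(
    'gs://my-bucket/normalized/*.json',
    'my-project.my_dataset.my_table',
    job_config=job_config,
)
load_job.result()  # wait for the load job to finish
```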