Coremetrics files / Unstructured data in Cloud - google-cloud-storage

I need to load Coremetrics files / unstructured data into Google Cloud.
What options do I have to do that, manual or automated?
What are the options for reading unstructured data?
Thanks for your input.

Coremetrics provides three good ways to do this (depending on your needs):
the Export tool, LIVEmail, and the DDX API (which can also export data). The documentation for each is in the Coremetrics tool under "Help".


How to search text in JSON files that the Google Vision API created from a PDF

Is there any way to search for text in JSON files that the Google Vision API created from a PDF?
The search should happen over Google Cloud Storage only.
Google Cloud Storage is an object-based storage solution that does not provide processing features. In order to perform any processing job over Cloud Storage data you would need a computing/processing solution, and I'd opt for a serverless option such as Cloud Functions.
I found a sample application in the Cloud Functions docs that integrates several APIs with Cloud Functions and Cloud Storage; I think you can use it as a guideline to develop your own setup.
Once you have that setup, you could apply a regex implementation to search for the desired data; how to implement it will depend on the runtime, libraries and technologies that you choose to use.
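As a concrete illustration, here is a minimal sketch of an HTTP-triggered Cloud Function in Python that scans Vision API output files in a bucket and applies a regex. The bucket name, the "output/" prefix, and the assumption that the extracted text sits under responses[].fullTextAnnotation.text are placeholders to adapt to your own files.

```python
# Minimal sketch: HTTP-triggered Cloud Function (Python runtime) that scans
# Vision API output JSON files in a bucket and applies a regex.
# BUCKET_NAME, the "output/" prefix, and the assumed JSON layout
# (responses[].fullTextAnnotation.text) must be adapted to your setup.
import json
import re

from google.cloud import storage

BUCKET_NAME = "my-vision-output-bucket"  # assumed bucket name


def search_text(request):
    """Searches every Vision output JSON file in the bucket for a pattern."""
    pattern = request.args.get("q", "")
    regex = re.compile(pattern, re.IGNORECASE)

    client = storage.Client()
    matches = []
    for blob in client.list_blobs(BUCKET_NAME, prefix="output/"):
        if not blob.name.endswith(".json"):
            continue
        payload = json.loads(blob.download_as_text())
        # Vision's PDF OCR output typically nests the extracted text under
        # responses[].fullTextAnnotation.text; adjust if your files differ.
        for response in payload.get("responses", []):
            text = response.get("fullTextAnnotation", {}).get("text", "")
            if regex.search(text):
                matches.append(blob.name)
                break
    return json.dumps({"pattern": pattern, "matching_files": matches})
```

You could deploy something like this as an HTTP function and call it with a `?q=<pattern>` query parameter; for large buckets you would likely want to narrow the prefix or index the extracted text instead of scanning every file per request.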

Is there any way to call the Bing Ads API through a pipeline and load the data into BigQuery through Google Data Fusion?

I'm creating a pipeline in Google Data Fusion that should let me export my Bing Ads data into BigQuery using my Bing Ads developer token. I couldn't find any suitable data sources to add to my pipeline in Data Fusion. Is fetching data from API calls even supported in Google Data Fusion, and if it is, how can it be done?
HTTP based sources for Cloud Data Fusion are currently in development and will be released by Q3. Could you elaborate on your use case a little more, so we can make sure that your requirements will be covered by those plugins? For example, are you looking to build a batch or real-time pipeline?
In the meantime, you have the following two, more immediate options/workarounds:
If you are OK with storing the data in a staging area in GCS before loading it into BigQuery, you can use the HTTPToHDFS plugin that is available in the Hub; use a path of the form gs://<bucket>/path/to/file (a scripted take on this staging step is sketched after these options).
Alternatively, we welcome contributions, so you could also build the plugin yourself using the Cloud Data Fusion APIs. We are happy to guide you, and can point you to documentation and samples.
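In case a quick script is enough in the meantime, here is a rough sketch (not a Data Fusion pipeline) of the first workaround: pull the report over HTTP and write it to a GCS staging object that a later BigQuery load can pick up. The endpoint URL, bucket and object names are placeholders, and the real Bing Ads API has its own authentication and reporting flow that is not shown here.

```python
# Rough sketch of the staging workaround outside Data Fusion: fetch data from
# an HTTP endpoint and write it to a GCS staging object for a later BigQuery
# load. URL, bucket, and object names below are placeholders; real Bing Ads
# authentication and report retrieval are not covered here.
import requests
from google.cloud import storage

API_URL = "https://example.com/ads/report"   # placeholder endpoint
BUCKET = "my-staging-bucket"                 # assumed staging bucket
OBJECT_NAME = "bing_ads/report.json"         # assumed staging path


def stage_report_to_gcs():
    response = requests.get(API_URL, timeout=60)
    response.raise_for_status()

    client = storage.Client()
    blob = client.bucket(BUCKET).blob(OBJECT_NAME)
    # Store the raw payload; BigQuery can load newline-delimited JSON or CSV.
    blob.upload_from_string(response.text, content_type="application/json")
    return f"gs://{BUCKET}/{OBJECT_NAME}"


if __name__ == "__main__":
    print("Staged report at", stage_report_to_gcs())
```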

MongoDB + Google BigQuery - Normalize Data and Import to BQ

I've done quite a bit of searching, but haven't been able to find anything within this community that fits my problem.
I have a MongoDB collection that I would like to normalize and upload to Google Big Query. Unfortunately, I don't even know where to start with this project.
What would be the best approach to normalize the data? From there, what is recommended when it comes to loading that data to BQ?
I realize I'm not giving much detail here... but any help would be appreciated. Please let me know if I can provide any additional information.
If you're using Python, an easy way is to read the collection in chunks and use pandas' to_gbq method. It's easy and quite fast to implement, but it would be better to have more details.
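For example, a quick sketch of that chunked approach could look like the following, assuming pymongo and pandas-gbq are installed; the collection, project ID and destination table names are placeholders.

```python
# Sketch of reading a MongoDB collection in chunks, flattening the documents,
# and appending them to a BigQuery table with pandas' to_gbq (pandas-gbq).
# Connection string, database/collection, project id, and table are assumed.
import pandas as pd
from pymongo import MongoClient

CHUNK_SIZE = 10_000
PROJECT_ID = "my-gcp-project"          # assumed project id
DESTINATION = "staging.mongo_events"   # assumed dataset.table

client = MongoClient("mongodb://localhost:27017")
collection = client["mydb"]["events"]  # assumed database/collection

chunk = []
for doc in collection.find({}, {"_id": 0}):
    chunk.append(doc)
    if len(chunk) == CHUNK_SIZE:
        # json_normalize flattens nested documents into columns
        df = pd.json_normalize(chunk)
        df.to_gbq(DESTINATION, project_id=PROJECT_ID, if_exists="append")
        chunk = []

if chunk:
    pd.json_normalize(chunk).to_gbq(
        DESTINATION, project_id=PROJECT_ID, if_exists="append"
    )
```

How far json_normalize gets you depends on how nested your documents are; arrays of subdocuments usually need an explicit unnesting step before they fit a flat table.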
In addition to the answer provided by SirJ, you have multiple options to load data into BigQuery, including loading it from Cloud Storage, from a local machine, or with Dataflow, and more, as mentioned here. Cloud Storage supports data in multiple formats such as CSV, JSON, Avro, Parquet and more. You also have various options to load data using the web UI, the command line, the API, or the client libraries, which support C#, Go, Java, Node.js, PHP, Python and Ruby.
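As an illustration of the Cloud Storage route with the Python client library, a minimal load job could look like this; the bucket path, dataset and table names are assumptions.

```python
# Minimal sketch of loading a newline-delimited JSON file from Cloud Storage
# into BigQuery with the Python client library. The source URI and the
# destination table id are placeholders.
from google.cloud import bigquery

client = bigquery.Client()  # uses your default project and credentials

table_id = "my-project.staging.mongo_events"        # assumed destination
uri = "gs://my-staging-bucket/exports/events.json"  # assumed source file

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # let BigQuery infer the schema
)

load_job = client.load_table_from_uri(uri, table_id, job_config=job_config)
load_job.result()  # wait for the job to finish
print(f"Loaded {client.get_table(table_id).num_rows} rows into {table_id}")
```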

Loading data into MarkLogic from MySQL

How do I load data into MarkLogic from a MySQL database?
Also, how do I create a document database and build a search application on top of it? (PDFs would be the source for the document database.)
Thanks & Regards,
Swapneel
There are two main options for loading data from MySQL: you can either export flat CSV files and import the rows as documents using MLCP, or connect directly with MLSAM. You'll have to look and see which best suits your situation.
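If it helps, here is a small Python sketch of the "export flat CSV files" step; the connection settings, table and columns are placeholders, and the resulting file can then be imported with MLCP as delimited_text so that each row becomes a document.

```python
# Sketch of exporting a MySQL table to a flat CSV file, assuming pymysql is
# installed. Connection settings, table name, and columns are placeholders;
# the CSV can then be imported with MLCP (delimited_text input type).
import csv

import pymysql

connection = pymysql.connect(
    host="localhost", user="appuser", password="secret", database="mydb"
)
try:
    with connection.cursor() as cursor, open("customers.csv", "w", newline="") as out:
        cursor.execute("SELECT id, name, email FROM customers")  # assumed table/columns
        writer = csv.writer(out)
        # Header row from the column names, so MLCP can name the elements
        writer.writerow([col[0] for col in cursor.description])
        writer.writerows(cursor.fetchall())
finally:
    connection.close()
```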
In terms of design, I'd recommend looking at the MarkLogic developer website and going through the tutorials; there's a good overview of data modeling.
You will probably want to look at the CPF documentation for information about PDF conversion as well.
Hope that's been useful, Ed

Creating a Metadata Catalog in MarkLogic

I am trying to combine data from multiple sources like RDBMSes, XML files, and web services using MarkLogic. From what I see in the MarkLogic documentation on Metadata Catalog (https://www.marklogic.com/solutions/metadata-catalog/), Data Virtualization (https://www.marklogic.com/solutions/data-virtualization/) and Data Unification, this is very well possible. But I am not able to get hold of any documentation describing how exactly to go about it or which tools to use to achieve this.
Looking for some pointers.
As the second image in the data-virtualization link shows, you need to ingest all data into MarkLogic databases. MarkLogic can then be put in between to become the single entry point for end user applications that need access to that data.
The first link describes the capabilities of MarkLogic to hold all kinds of data. It partly does so by storing them as-is, partly by extracting text and metadata for searching, and partly by conversion (if your needs go beyond what the original format allows).
MarkLogic provides the general-purpose MarkLogic Content Pump (MLCP) tool for this. It allows ingesting zipped or unzipped files, and applying transformations if necessary. If you need to retrieve your data from a different database, you might need a bit more work to get that out. http://developer.marklogic.com holds tutorials, blogs, and tools that should help you get going. Searching the MarkLogic mailing list through http://marklogic.markmail.org/ can provide answers as well.
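As a tiny illustration of programmatic ingestion (an alternative to MLCP for one-off or scripted loads), you could push documents through MarkLogic's REST API; the host, port, credentials and document URI below are placeholders, and the example assumes a reachable REST instance (commonly port 8000) with digest authentication.

```python
# Sketch of inserting a single XML document through MarkLogic's REST API
# (PUT /v1/documents), as an alternative to MLCP for scripted loads.
# Host, port, credentials, document URI, and collection are placeholders.
import requests
from requests.auth import HTTPDigestAuth

ML_HOST = "http://localhost:8000"        # assumed REST API instance
AUTH = HTTPDigestAuth("admin", "admin")  # assumed credentials

document = "<customer><id>1</id><name>Alice</name></customer>"

response = requests.put(
    f"{ML_HOST}/v1/documents",
    params={"uri": "/customers/1.xml", "collection": "customers"},
    data=document,
    headers={"Content-Type": "application/xml"},
    auth=AUTH,
)
response.raise_for_status()
print("Inserted /customers/1.xml:", response.status_code)
```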
HTH!
Combining a lot of data is a very broad topic. Can you describe a couple types of data you'd like to integrate, and what services or queries you would like to build on that data?