Loading data into MarkLogic from MySQL - mysql-workbench

How do I load data into MarkLogic from a MySQL database?
And how do I create a document database and build a search application on top of it? (PDFs would be the source for the document database.)
-Thanks & Regards
Swapneel

There are two main options for loading data from MySQL: you can either export flat CSV files and import the rows as documents using MLCP, or connect directly with MLSAM. You'll have to look and see which best suits your situation.
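For the CSV route, here is a minimal sketch in Python, assuming the mysql-connector-python package and an MLCP installation on the path (the table, column, host, and credential values are placeholders, and you should check the mlcp.sh flags against your MLCP version):

```python
# Export a MySQL table to CSV, then hand the file to MLCP as delimited_text.
import csv
import subprocess

import mysql.connector  # assumes the mysql-connector-python package

conn = mysql.connector.connect(
    host="localhost", user="myuser", password="mypass", database="mydb"
)
cur = conn.cursor()
cur.execute("SELECT id, name, price FROM products")  # placeholder table

with open("products.csv", "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(["id", "name", "price"])  # MLCP derives element names from the header
    writer.writerows(cur)
conn.close()

# Each CSV row becomes one document; -delimited_uri_id picks the URI column.
subprocess.run([
    "mlcp.sh", "import",
    "-host", "localhost", "-port", "8000",
    "-username", "admin", "-password", "admin",
    "-input_file_path", "products.csv",
    "-input_file_type", "delimited_text",
    "-delimited_uri_id", "id",
], check=True)
```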
In terms of design, I'd recommend looking at the MarkLogic developer website and going through the tutorials; there's a good overview of data modeling.
You will probably want to look at the CPF documentation for information about PDF conversion as well.
Hope that's been useful, Ed

Related

MongoDB + Google BigQuery - Normalize Data and Import to BQ

I've done quite a bit of searching, but haven't been able to find anything within this community that fits my problem.
I have a MongoDB collection that I would like to normalize and upload to Google Big Query. Unfortunately, I don't even know where to start with this project.
What would be the best approach to normalize the data? From there, what is recommended when it comes to loading that data to BQ?
I realize I'm not giving much detail here... but any help would be appreciated. Please let me know if I can provide any additional information.
If you're using Python, an easy way is to read the collection in chunks and use pandas' to_gbq method. It's easy and quite fast to implement, but it would be better to have more details.
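A minimal sketch of that chunked approach, assuming pymongo and the pandas-gbq package are installed (database, collection, project, and table names are placeholders):

```python
# Read the collection in chunks, flatten nested fields, and append to BigQuery.
import pandas as pd
from pymongo import MongoClient

collection = MongoClient("mongodb://localhost:27017")["mydb"]["mycollection"]

CHUNK = 10_000
buffer = []
for doc in collection.find({}, {"_id": 0}):
    buffer.append(doc)
    if len(buffer) == CHUNK:
        # json_normalize flattens nested documents into dotted columns
        pd.json_normalize(buffer).to_gbq(
            "mydataset.mytable", project_id="my-gcp-project", if_exists="append"
        )
        buffer = []

if buffer:  # flush the final partial chunk
    pd.json_normalize(buffer).to_gbq(
        "mydataset.mytable", project_id="my-gcp-project", if_exists="append"
    )
```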
In addition to the answer provided by SirJ, you have multiple options for loading data into BigQuery, including from Cloud Storage, your local machine, or Dataflow, as mentioned here. Cloud Storage supports data in multiple formats, such as CSV, JSON, Avro, Parquet, and more. You also have various options for loading data: the web UI, the command line, the API, or the client libraries, which support C#, Go, Java, Node.js, PHP, Python, and Ruby.
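For example, a load from Cloud Storage with the official google-cloud-bigquery Python client might look like this (bucket, dataset, and project names are placeholders):

```python
# Load newline-delimited JSON files from Cloud Storage into a BigQuery table.
from google.cloud import bigquery

client = bigquery.Client(project="my-gcp-project")

job_config = bigquery.LoadJobConfig(
    source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
    autodetect=True,  # let BigQuery infer the schema
)

load_job = client.load_table_from_uri(
    "gs://my-bucket/exported/*.json",
    "mydataset.mytable",
    job_config=job_config,
)
load_job.result()  # block until the load job completes
```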

Creating a Metadata Catalog in MarkLogic

I am trying to combine data from multiple sources, such as RDBMSs, XML files, and web services, using MarkLogic. From what I see in the MarkLogic documentation on Metadata Catalog (https://www.marklogic.com/solutions/metadata-catalog/), Data Virtualization (https://www.marklogic.com/solutions/data-virtualization/), and Data Unification, this is very well possible. But I have not been able to find any documentation describing how exactly to go about it, or which tools to use to achieve this.
Looking for some pointers.
As the second image in the data-virtualization link shows, you need to ingest all data into MarkLogic databases. MarkLogic can then be put in between to become the single entry point for end user applications that need access to that data.
The first link describes the capabilities of MarkLogic to hold all kinds of data. It partly does so by storing them as-is, partly by extracting text and metadata for searching, and partly by conversion (if your needs go beyond what the original format allows).
MarkLogic provides the general-purpose MarkLogic Content Pump (MLCP) tool for ingestion. It allows ingesting zipped or unzipped files, and applying transformations if necessary. If you need to retrieve your data from a different database, you might need a bit more work to get it out. http://developer.marklogic.com holds tutorials, blogs, and tools that should help you get going. Searching the MarkLogic mailing list through http://marklogic.markmail.org/ can provide answers as well.
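For small, scripted loads you can also push documents through MarkLogic's REST API instead of MLCP. A minimal sketch in Python (host, port, and credentials are placeholders; a default REST app server uses digest authentication):

```python
# Insert one JSON document via MarkLogic's /v1/documents endpoint.
import requests
from requests.auth import HTTPDigestAuth

doc = {"title": "Example", "body": "Hello from the REST API"}

resp = requests.put(
    "http://localhost:8000/v1/documents",
    params={"uri": "/ingest/example.json"},
    json=doc,
    auth=HTTPDigestAuth("admin", "admin"),
)
resp.raise_for_status()  # expect 201 on create, 204 on update
```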
HTH!
Combining a lot of data is a very broad topic. Can you describe a couple types of data you'd like to integrate, and what services or queries you would like to build on that data?

RavenDB and Inserting data

I recently started using RavenDB. I am converting a relational database to use RavenDB. I have two simple tables in the relational database:
tbStates
tbCities
I have all US cities linked to a state. How can I go about converting this to NoSQL? Will I have to write a little application to read from the relational database and create the objects? Or are there tools out there I can use to do this?
There is a utility called Smuggler (http://ravendb.net/documentation/smuggler), but I imagine you will have to convert your data to JSON. It may be just as easy to write a console app that reads the tables into objects and then loads them into Raven.
Just to add: I migrated a SQL Server database to RavenDB using the console-application route.
I used EF to quickly pull out the data, converted it to my RavenDB domain, and then added it to RavenDB.
It worked well, as you will most likely want to tweak the domain anyway to work best with RavenDB (for example, I had an Images SQL table that I turned into a List on the document).
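To illustrate the shape of that conversion, here is a minimal sketch of the denormalization step in Python rather than C#/EF (sqlite3 stands in for the relational source; the table and column names are assumptions). The resulting JSON documents could then be stored through a RavenDB client session or imported with Smuggler:

```python
# Denormalize tbStates/tbCities into one document per state with its cities embedded.
import json
import sqlite3

conn = sqlite3.connect("usa.db")  # placeholder for the relational source

for state_id, state_name in conn.execute("SELECT id, name FROM tbStates"):
    cities = [row[0] for row in conn.execute(
        "SELECT name FROM tbCities WHERE state_id = ?", (state_id,)
    )]
    doc = {"Id": f"states/{state_id}", "Name": state_name, "Cities": cities}
    with open(f"state-{state_id}.json", "w") as f:
        json.dump(doc, f, indent=2)

conn.close()
```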
See Ayende's RaccoonBlog project on GitHub (https://github.com/ayende/RaccoonBlog), as he does something similar to move Subtext data to RavenDB. RaccoonBlog is the engine powering his blog and makes for good learning material about how to use RavenDB.

Umbraco: Storing spatial data

I'm researching Umbraco for use as a base in a large CMS project; however, the project calls for the SQL Server 2008 database to store spatial data against content.
Being new to Umbraco, I'm still reading through the documentation and slowly building up an idea of its architecture. However, so far it doesn't look like Umbraco supports the storage of spatial data.
There appear to be only four database datatype options: date, integer, ntext, and nvarchar.
Is it possible to store spatial data to the database?
Update: Further research into how Umbraco works has shown me I was on the wrong track. It seems the way to do this is to store the lat/long data inside the usual XML format Umbraco uses.
Then use the Spatial.net extensions that have been built on top of Lucene.Net, rather than the limited search capabilities Examine exposes.
However this is all still theoretical, I've just not been able to achieve this. If I do before someone answers this question, I'll post my findings here to help others.
You could take a look at how to make user controls (with Visual Studio) in Umbraco.
It is also possible that the versatility of Umbraco 'Document Types' is enough for you.
It is possible to extend Umbraco in any sort of way to get the solution you want. I don't know how you want the spatial data to interact with your frontend, so it is difficult to provide a direct solution.
Although there are ways to store spatial data and perform queries against it using Spatial.net, it's not a very elegant solution.
Instead, I've created an additional table in SQL Server 2008 with the geometry/geography datatype and a reference to the Umbraco content it's connected with.
I've then got an event hook which updates this whenever content is added, updated, or deleted.

Does Amazon have APIs to index and search documents stored in Amazon S3?

I have lots of documents stored on Amazon S3. My questions are:
Does Amazon provide any services/APIs with which I can index the contents of the documents and search them (full-text indexing and searching)?
If it does, could someone please point me to the relevant link in the documentation.
If it does not, could this be achieved with Lucene and Zend Framework? Has anyone implemented this? Can I get some pointers?
UPDATE: I do not intend to save my index on Amazon S3; rather, I am looking to index the contents of the documents on S3 and serve them based on a search.
You can see this question, or this blog post if you want to do pure Lucene, or you can use Solr, which is probably easier. See also this post.
Zend has a PHP port of Lucene, which ties in very well. You can look at the Zend documentation for how to use it.
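To make the Lucene approach concrete, here is a sketch in Python rather than Java or PHP, using boto3 for S3 access and Whoosh as a Lucene-like pure-Python stand-in (bucket name and index directory are placeholders):

```python
# Index the text of S3 objects locally, then serve searches from that index.
import os

import boto3
from whoosh.fields import ID, TEXT, Schema
from whoosh.index import create_in
from whoosh.qparser import QueryParser

os.makedirs("indexdir", exist_ok=True)
ix = create_in("indexdir", Schema(key=ID(stored=True, unique=True), content=TEXT))

s3 = boto3.client("s3")
writer = ix.writer()
for obj in s3.list_objects_v2(Bucket="my-bucket").get("Contents", []):
    body = s3.get_object(Bucket="my-bucket", Key=obj["Key"])["Body"].read()
    writer.add_document(key=obj["Key"], content=body.decode("utf-8", "ignore"))
writer.commit()

# Query the local index; hits carry the S3 key needed to fetch the document.
with ix.searcher() as searcher:
    query = QueryParser("content", ix.schema).parse("full text search")
    for hit in searcher.search(query, limit=10):
        print(hit["key"])
```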