Creating a Metadata Catalog in MarkLogic

I am trying to combine data from multiple sources like RDBMS, XML files and web services using MarkLogic. From what I see in the MarkLogic documentation on Metadata Catalog (https://www.marklogic.com/solutions/metadata-catalog/), Data Virtualization (https://www.marklogic.com/solutions/data-virtualization/) and Data Unification, this looks very much possible. But I have not been able to get hold of any documentation describing how exactly to go about it, or which tools to use to achieve it.
Looking for some pointers.

As the second image in the data-virtualization link shows, you need to ingest all data into MarkLogic databases. MarkLogic can then be put in between to become the single entry point for end user applications that need access to that data.
The first link describes the capabilities of MarkLogic to hold all kinds of data. It partly does so by storing them as-is, partly by extracting text and metadata for searching, and partly by conversion (if your needs go beyond what the original format allows).
For ingestion, MarkLogic provides the general-purpose MarkLogic Content Pump (MLCP) tool. It allows ingesting zipped or unzipped files and applying transformations if necessary. If you need to retrieve your data from a different database first, getting it out may take a bit more work. http://developer.marklogic.com holds tutorials, blogs, and tools that should help you get going. Searching the MarkLogic mailing list through http://marklogic.markmail.org/ can provide answers as well.
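For example, here is a minimal sketch of driving MLCP from a Python script; the host, port, credentials, input path and collection name are placeholders for your own environment, and you could just as well run the same command directly from the shell:

    # Rough sketch: invoke MLCP to ingest a directory of files into MarkLogic.
    # All connection details and paths are placeholders -- see the MLCP
    # documentation for the full list of options.
    import subprocess

    mlcp_command = [
        "mlcp.sh", "import",                      # mlcp.bat on Windows
        "-host", "localhost",
        "-port", "8000",
        "-username", "admin",
        "-password", "admin",
        "-input_file_path", "/data/source-xml",   # directory (or archive) to ingest
        "-input_compressed", "false",
        "-output_collections", "source-a",        # tag the documents for later queries
    ]

    subprocess.run(mlcp_command, check=True)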
HTH!

Combining a lot of data is a very broad topic. Can you describe a couple types of data you'd like to integrate, and what services or queries you would like to build on that data?

Related

MongoDB + Google Big Query - Normalize Data and Import to BQ

I've done quite a bit of searching, but haven't been able to find anything within this community that fits my problem.
I have a MongoDB collection that I would like to normalize and upload to Google Big Query. Unfortunately, I don't even know where to start with this project.
What would be the best approach to normalize the data? From there, what is recommended when it comes to loading that data to BQ?
I realize I'm not giving much detail here... but any help would be appreciated. Please let me know if I can provide any additional information.
If you're using Python, an easy way is to read the collection in chunks and use pandas' to_gbq method. It's easy and quite fast to implement, but it would be better to have more details.
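For example, a rough sketch of that approach with pymongo and pandas (the connection string, collection name and BigQuery table are placeholders, and to_gbq needs the pandas-gbq package installed):

    # Rough sketch: read a MongoDB collection in chunks, flatten each chunk and
    # append it to a BigQuery table. All names below are placeholders.
    import pandas as pd
    from pymongo import MongoClient

    collection = MongoClient("mongodb://localhost:27017")["mydb"]["mycollection"]
    CHUNK_SIZE = 10000

    def upload(docs):
        # json_normalize flattens nested documents into columns
        pd.json_normalize(docs).to_gbq(
            "my_dataset.my_table", project_id="my-gcp-project", if_exists="append"
        )

    chunk = []
    for doc in collection.find({}, {"_id": 0}):   # drop the ObjectId
        chunk.append(doc)
        if len(chunk) == CHUNK_SIZE:
            upload(chunk)
            chunk = []
    if chunk:
        upload(chunk)   # flush the final partial chunk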
In addition to the answer provided by SirJ, you have multiple options for loading data into BigQuery, including loading it from Cloud Storage, from your local machine, via Dataflow and more, as mentioned here. Cloud Storage supports data in multiple formats such as CSV, JSON, Avro, Parquet and more. You also have various options for loading the data: the web UI, the command line, the API, or the client libraries, which support C#, Go, Java, Node.js, PHP, Python and Ruby.
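If you go through Cloud Storage, a minimal sketch with the Python client library (google-cloud-bigquery) could look like this; the bucket, dataset and table names are placeholders:

    # Rough sketch: load newline-delimited JSON files from Cloud Storage into a
    # BigQuery table and let BigQuery infer the schema.
    from google.cloud import bigquery

    client = bigquery.Client(project="my-gcp-project")
    job_config = bigquery.LoadJobConfig(
        source_format=bigquery.SourceFormat.NEWLINE_DELIMITED_JSON,
        autodetect=True,
        write_disposition="WRITE_APPEND",
    )

    load_job = client.load_table_from_uri(
        "gs://my-bucket/export/*.json",
        "my_dataset.my_table",
        job_config=job_config,
    )
    load_job.result()   # wait for the load job to finish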

How to connect ontology and web application

I'm working on a project in which we are using semantic web technologies and creating a web application that allows the user to get recommendations in order to make the right decision (I won't get into the details).
For me and my team, this is our first experience working with ontologies.
We've already created the ontology (we have RDF- and OWL-formatted files, which we keep in Eclipse).
Separately, we've created the web application. My question is how to connect the web page with the OWL/RDF-formatted data; more precisely, how to get input from the web page into the dataset and output back onto the page.
I've found some info (on old forums) saying that EasyRdf can be embedded in a PHP script, but it wasn't clear.
Based on YouTube tutorials, I've downloaded Jena Fuseki, but I don't know what the next step is.
I would be glad to get any advice or suggestions :)
In my view, there is no single way to do this.
I usually set up a back-end application to pre-process this kind of information (build SPARQL queries, execute them and parse the results) and then return it to the front end in a form that side can understand.
So you could have all your data in RDF format stored, for example, in a TDB dataset exposed by Fuseki, and interact with that data through a back-end that consumes, updates and parses the results you find there.
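As a rough Python sketch of that back-end piece, using the SPARQLWrapper library against Fuseki's default port (the dataset name and the query are placeholders for whatever your ontology actually contains):

    # Rough sketch: send a SPARQL query to a Fuseki endpoint and print the rows.
    from SPARQLWrapper import SPARQLWrapper, JSON

    sparql = SPARQLWrapper("http://localhost:3030/ds/sparql")   # "ds" is a placeholder dataset
    sparql.setReturnFormat(JSON)
    sparql.setQuery("""
        SELECT ?s ?p ?o
        WHERE { ?s ?p ?o }
        LIMIT 10
    """)

    results = sparql.query().convert()
    for row in results["results"]["bindings"]:
        print(row["s"]["value"], row["p"]["value"], row["o"]["value"])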
That's my advice; I hope it is useful for you.
Good luck!

Koha RESTful API

I've been looking around the internet for information on the Koha ILS RESTful API, but I haven't found anything concrete. There is this link which talks about its HTTP API: http://wiki.koha-community.org/wiki/Koha_/svc/_HTTP_API but there are no examples, and I'm quite confused by the MARCXML format required.
What I want to do is use this API to create biblio records in a remote Koha ILS system. If I understand correctly, using these services I can create records (probably using a JSON-to-MARC conversion tool), but will I also be able to upload PDF files for each record in Base64 format? It doesn't look like this is possible using this API, although I'm not really sure.
The HTTP API available in Koha is a well-established protocol for searching library catalogs, called SRU. This protocol is meant only for searching, not for updating records.
Secondly, even though SRU 2.0 provides an option for transmitting records in JSON format, most implementations do not support it yet.
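To give you an idea of what an SRU request looks like, here is a minimal sketch using Python's requests library. The base URL is an assumption (Koha's SRU server has to be enabled, and the host, port and database name depend on your installation), and the usable CQL indexes depend on the server's SRU configuration:

    # Rough sketch: an SRU searchRetrieve request asking for MARCXML records.
    import requests

    SRU_BASE = "http://my-koha-server:9998/biblios"   # placeholder endpoint

    params = {
        "version": "1.1",
        "operation": "searchRetrieve",
        "query": 'title="linux"',      # CQL query; available indexes vary per setup
        "recordSchema": "marcxml",
        "maximumRecords": "5",
    }

    response = requests.get(SRU_BASE, params=params, timeout=30)
    response.raise_for_status()
    print(response.text)   # XML searchRetrieveResponse containing MARCXML records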
Coming back to your use case: Koha cannot store PDF documents. It is a process automation tool for a library's physical collections and deals only with metadata records. For storing digital documents you should look at document management solutions such as DSpace, or the smaller and simpler Omeka. DSpace provides its own REST API for searching and also supports the SWORD protocol for uploading documents.

Programmatic export/dump/mass data retrieval (BaaS)

Does anyone have experience with programmatic exports of data in conjunction with BaaS providers such as parse.com or StackMob?
I am aware that both providers (as far as I can tell from the marketing talk) offer a REST API which will allow for queries against the database, not only to be used by mobile clients but also by e.g. custom web apps.
I am also aware that both providers offer a manual export of data (parse.com via their web interface, StackMob via support).
But let's say I would like to dump all data nightly, so that I can import it into a reporting system, for instance. Or maybe simply to have an up-to-date backup.
In this case, I would need a programmatic way to export/replicate the data stored in the backend. Manual exports are not an option for obvious reasons.
The REST APIs offered, however, seem to be designed for specific queries, not for mass reads (performance?). Not to mention the pricing: I assume none of the providers would be happy about a nightly X-gigabyte data export via their REST API, so there will probably be a price tag.
I just couldn't find any specific information on this topic so far, so I was wondering if anyone else has already gone through this. Also, any suggestions on StackMob/Parse alternatives are welcome, especially if related to the data export topic.
Cheers, Alex
Did you see the section of the Parse REST API on Batch operations? Batch operations reduce the number of API calls needed to grab data so that you are not using a call for every row you retrieve. Keep in mind that there is still a limit (the default is 100, but you can set it to a maximum of 1000). That means you are still limited to pulling down 1000 rows per API call.
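As a rough sketch of what paging through a class could look like against the Parse REST API (the class name, application ID and REST key are placeholders):

    # Rough sketch: pull down a whole Parse class 1000 rows at a time using the
    # REST API's limit/skip parameters.
    import requests

    BASE_URL = "https://api.parse.com/1/classes/MyClass"
    HEADERS = {
        "X-Parse-Application-Id": "YOUR_APP_ID",
        "X-Parse-REST-API-Key": "YOUR_REST_KEY",
    }

    all_rows, skip = [], 0
    while True:
        resp = requests.get(
            BASE_URL,
            headers=HEADERS,
            params={"limit": 1000, "skip": skip, "order": "createdAt"},
            timeout=30,
        )
        resp.raise_for_status()
        batch = resp.json()["results"]
        if not batch:
            break
        all_rows.extend(batch)
        skip += len(batch)

    print("Fetched", len(all_rows), "rows")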
I can't comment on StackMob because I haven't used it. At my present job, we are using Parse and we wrote a C# app which compares the data in a Parse class with a SQL table and pulls down any changes.

Umbraco: Storing spatial data

I'm researching Umbraco for use as a base in a large CMS project, however the project calls for the SQL Server 2008 database to store spatial data against content.
Being new to Umbraco, I'm still reading through the documentation and slowly building up an idea of its architecture. However, so far it doesn't look like Umbraco supports the storage of spatial data.
There appear to be only four database datatype options: date, integer, ntext, nvarchar.
Is it possible to store spatial data to the database?
Update: Further research into how Umbraco works has shown me I was on the wrong track. It seems the way to do this is to store the lat/long data inside the usual XML format Umbraco uses.
Then use the Spatial.net extensions that have been built on top of Lucene.Net, rather than the limited search capabilities Examine exposes.
However, this is all still theoretical; I've just not been able to achieve it yet. If I do before someone answers this question, I'll post my findings here to help others.
You could take a look at how to make user controls (with Visual Studio) in Umbraco.
It is also possible that the versatility of Umbraco 'Document Types' is enough for you.
It is possible to extend Umbraco in just about any way to get the solution you want. I don't know how you want the spatial data to interact with your front end, so it is difficult to provide a direct solution.
Although there are ways to store spatial data and perform queries against it using Spatial.net, it's not a very elegant solution.
Instead I've created an additional table in SQL Server 2008 with the geometry/geography datatype and a reference to the Umbraco content it's connected with.
I've then got an event hook which updates this table whenever content is added/updated/deleted.