Need some help understanding Vocabulary of Interlinked Datasets (VoID) in Linked Open Data - metadata

I have been trying to understand VoID in Linked Open Data. It would be great if anyone could help clear up some of my confusion.
Does it need to be stored in a separate file, or can it be included in the RDF dataset itself? If so, how do I query it? (A sample query would be really helpful.)
How is the information in VoID used in real life?

Does it need to be stored in a separate file, or can it be included in the RDF dataset itself? If so, how do I query it? (A sample query would be really helpful.)
In theory no, but for practical purposes yes. In the end the information is encoded in triples, so it doesn't really matter which file you put them in, and you could argue that it is actually best to put the VoID info into the data files and serve those triples alongside your data as meta-information. It is queryable like any other RDF: either load it into a SPARQL endpoint or use a library that can directly load RDF files. This, however, also shows why a separate file makes sense: instead of having to load potentially large data files just to get some dataset metadata, it makes sense to offer the metadata in its own file.
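Since the question asks for a sample query: here is a minimal sketch, assuming the VoID triples live in a Turtle file called void.ttl (the file name and the use of the rdflib library are my assumptions, not something VoID prescribes). The same SELECT works unchanged against a SPARQL endpoint that has the triples loaded.

# Minimal sketch: query VoID metadata with rdflib (file name "void.ttl" is hypothetical).
from rdflib import Graph

g = Graph()
# The VoID description could sit in its own file or inside the data file itself;
# either way it is just triples, so parsing and querying work the same.
g.parse("void.ttl", format="turtle")

query = """
PREFIX void: <http://rdfs.org/ns/void#>
PREFIX dcterms: <http://purl.org/dc/terms/>

SELECT ?dataset ?title ?triples ?dump
WHERE {
  ?dataset a void:Dataset .
  OPTIONAL { ?dataset dcterms:title ?title }
  OPTIONAL { ?dataset void:triples ?triples }
  OPTIONAL { ?dataset void:dataDump ?dump }
}
"""

for row in g.query(query):
    print(row.dataset, row.title, row.triples, row.dump)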
How is the information in VoID used in real life?
VoID is actually used in several scenarios already, but for the most part it is a recommendation and a good idea. The most prominent use case I know of is getting your dataset shown in the LOD Cloud. You currently have to register it with datahub.io and add a VoID file (example from my associations dataset).
Other examples (sadly many defunct nowadays) can be found here: http://semanticweb.org/wiki/VoID.html

Related

How to use VTK to efficiently write time-varying field data on a fixed mesh?

I am working on physics simulation research. In one of my projects I have a large fixed grid that does not vary with time. The fields on the grid, on the other hand, vary with time in the simulation. I need to use VTK to record the field data at each step for visualization (ParaView).
The method I am using is to write a separate *.vtu file to disk at each time step. This basically serves the purpose, but it writes a lot of duplicate data (re-recording the geometry of the mesh at each step), which not only consumes more disk space but also wastes time on encoding and parsing.
I would like a way to write the mesh information only once, with only the new field data written at the remaining steps, while still guaranteeing the same visualization. Please let me know if VTK and ParaView provide such an interface and how to use it.
Using a .pvtu file and referring to the same .vtu as a Piece for each step should do the trick.
See this similar post on the ParaView Discourse, and the .pvtu documentation.
EDIT
This turns out to be a side effect of the format; it is not supported by the writer.
The correct solution is to use another file format ...
Let me provide my own research findings for reference.
As Nico said, with the combination of .pvtu/.vtu files we could theoretically have the geometry structure stored in a separate .vtu file, referenced by a .pvtu file. Setting the NumberOfPieces attribute of the .pvtu file to 1 would result in only one separate .vtu file being constructed.
However, the VTK library does not expose a dedicated operation interface to control the writing process of vtu files. No matter how it is set, as long as the writer's input contains geometry structures, the writer will write geometry information to disk, and this process cannot be skipped through the exposed interface.
That said, it is indeed possible to make multiple .pvtu files point to the same .vtu file by manually editing the Piece node in each .pvtu file, and ParaView can recognize and visualize such a file group properly.
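To make that manual edit concrete, here is a rough sketch, assuming a naming scheme like step_000.pvtu / step_000.vtu (the file names and flat directory layout are assumptions); it rewrites the Source attribute of every Piece node so that all .pvtu files reference one surviving .vtu file:

# Sketch: point every .pvtu's <Piece Source="..."> at a single shared .vtu file.
# File names and layout are assumptions; adapt them to how your writer names its output.
import glob
import xml.etree.ElementTree as ET

SHARED_VTU = "step_000.vtu"  # the one .vtu you keep; the others can then be deleted

for pvtu_path in glob.glob("step_*.pvtu"):
    tree = ET.parse(pvtu_path)
    for piece in tree.getroot().iter("Piece"):
        piece.set("Source", SHARED_VTU)
    tree.write(pvtu_path, xml_declaration=True)

As noted in the conclusions below, this only saves disk space; every .vtu still has to be written (and parsed) first.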
I did not proceed to try adding arrays to the unstructured grid and using pvtu output.
So, I think the conclusion is:
If you don't want to dive into VTK's library code and XML implementation, this approach doesn't make sense.
If you are willing to write the full series of files, delete all but one of the .vtu files, and then point every .pvtu's Piece node at the single surviving .vtu file by editing the .pvtu files, you can save a lot of disk space, but you will not shorten the write, read, and parse times.
If you implement an XML writer by yourself, you can achieve all the requirements in theory, but it requires a lot of coding work.

UE5: import CSV for a data-driven animation

I was wondering whether UE5 can support 50k+ lines of a DB/CSV, as they represent the parameters of the whole animation (coordinates [x, y, z], TimeDelta, Speed, Brake).
Any documentation is very much appreciated
There is no existing functionality in the engine itself for this extremely specific use case. Of course, it can "support" it if you write a custom solution using the many available tools within the engine.
You can use IFileHandle to stream in a file (your csv): link
You can then parse the incoming data to create an FVector3 of your coordinates, a float for your TimeDelta, and so on. For example, FVector::InitFromString may help: link
However, this depends very much on the format of your data. Parsing strings/text into values is not specific to UE4; you can find a lot of info on converting streams of binary/character data into the values you need.
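Since the parsing step is engine-agnostic, here is a minimal illustrative sketch in Python of the row-to-values conversion (the column names and order are assumptions based on the parameters listed in the question); in UE5 you would do the equivalent in C++ with IFileHandle and FVector as described above.

# Sketch: parse animation rows (x, y, z, TimeDelta, Speed, Brake) from a CSV file.
# Column names are assumptions; adapt them to the actual file layout.
import csv
from dataclasses import dataclass

@dataclass
class AnimationSample:
    position: tuple   # (x, y, z)
    time_delta: float
    speed: float
    brake: float

def load_samples(path):
    samples = []
    with open(path, newline="") as f:
        for row in csv.DictReader(f):
            samples.append(AnimationSample(
                position=(float(row["x"]), float(row["y"]), float(row["z"])),
                time_delta=float(row["TimeDelta"]),
                speed=float(row["Speed"]),
                brake=float(row["Brake"]),
            ))
    return samples  # 50k+ rows is easily held in memory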
Applying the animation as the data is read is a separate, quite big, task. Since you provide no details on what the animation data represents, or what you need to apply it to, I cannot really help.
In general though, it can help you a lot to break down your question into 3-4 separate, more specific, questions. In any case though, this is a task that will require a lot of research and work.
And even before that, it might be good to research alternative approaches and consider changing the pipeline, to avoid using such non-standard file structures for animation.

Best way to load static, local data into UITableView

I have some static data that is going to be shipped in my iPhone app's bundle. It is updated very rarely (about twice a year) and the app will not be doing any networking. I'm going to update the data manually when these changes occur.
I want to know what the best way to load this data is. I've already started using an XML file and parsing it as needed, but it's a huge amount of data to do this with. I'm finding it tedious. There are about 120 pages worth of stuff, with images etc. Just not fun.
I've heard about Core Data, but I don't really know if it's going to do what I want. I want to find a way to just create a UITableView controller and a detail view, then somehow bind the data to these controllers. (A teensy amount of example code would be appreciated for this part.)
If anyone has any suggestions, please feel free to leave an answer or comment.
Here's a sample of my XML:
<?xml version="1.0"?>
<jftut>
    <heading name="Introduction">
        <subheading>Jungleflasher Overview</subheading>
        <subheading>Before Using Jungleflasher</subheading>
    </heading>
    <heading name="Which Drive do I have?">
        <image>DriveIdentification.png</image>
    </heading>
    <heading name="Drives">
        <manufacturer name="Samsung">
            <version>MS25</version>
            <version>MS28</version>
        </manufacturer>
        <manufacturer name="Hitachi">
            <version>32 through 59</version>
            <version>78</version>
            <version>79</version>
        </manufacturer>
        <manufacturer name="BenQ">
            <version>VAD6038</version>
        </manufacturer>
        <manufacturer name="LiteOn">
            <version>74850c</version>
            <version>83850c v1</version>
            <version>83850c v2</version>
            <version>93450c</version>
        </manufacturer>
    </heading>
</jftut>
Below each of those bottom nodes will be an article detailing how to perform a task related to them.
If the question needs more detail, just ask :)
Thanks,
Aurum Aquila
I think Core Data will serve your purpose in a flexible way. You said updating is rare, but even if it were not, it wouldn't be tedious with Core Data, where entities are mapped to objects and assigning values automatically updates the database without writing a single line of SQL.
I would like to know what format your data is in.
First of all, store all your local data in an SQLite file. Then you can use Core Data as a wrapper for all your operations.
For reference, you can follow Tutorial 1 & Tutorial 2.

How can I build a generic dataset-handling Perl library?

I want to build a generic Perl module for handling and analysing biomedical character-separated datasets, one that can almost certainly be used on any kind of dataset containing a mixture of categorical (A, B, C, ...), continuous (1.2, 3, 881, ...), and identifier (XXX1, XXX2, ...) variables. The plan is to have people initialize the module and then use some arguments to point to the data file(s), the place where the analysis reports should be placed, and the structure of the data.
By structure of the data I mean which variable is in which position and its name/type. And this is where I need some enlightenment. I am baffled as to how to do this in a clean way. Obviously, having people create a simple schema file, be it XML or some other format, would be the cleanest, but maybe not all people enjoy doing something like this.
The solutions I can think of are:
Create a configuration file in XML or a similar format, with a prespecified structure.
Pass the information during initialization of the module.
Use the first row of the data as headers and try to guess types (ouch)
Surely there must be a "canonical" way of doing this that is also usable and efficient.
This doesn't answer your question directly, but have you checked CPAN? It might have the module you need already. If not, it might have similar modules -- related either to biomedical data or simply to delimited data handling -- that you can mine for good ideas, both concerning formats for metadata and your module's API.
Any of the approaches you've listed could make sense. It all depends on how complex the data structures and their definitions are. What will make something like this useful to people is whether it saves them time and effort. So, your decision will have to be based on which approach best satisfies the need to make:
use of the module easy
reuse of data definitions easy
the data definition language sufficiently expressive to describe all known use cases
the data definition language sufficiently simple that an infrequent user can spend minimal time with the docs before getting real work done.
For example, if I just need to enter the names of the columns and their types (and there are only 4 well defined types), doing this each time in a script isn't too bad. Unless I have 350 columns to deal with in every file.
However, if large, complicated structure definitions are common, then a more modular, reuse-oriented approach is better.
If your data description language is difficult to work with, you can mitigate the issue a bit by providing a configuration tool that allows one to create and edit data schemas.
rx might be worth looking at, as well as the Data::Rx module on the CPAN. It provides schema checking for JSON, but there is nothing inherent in the model that makes it JSON-only.

What is a RESTful resource in the context of large data sets, e.g. weather data?

So I am working on a webservice to access our weather forecast data (10000 locations, 40 parameters each, hourly values for the next 14 days = about 130 million values).
So I read all about RESTful services and their ideology.
So I understand that a URL addresses a resource.
But what is a resource in my case?
The common use case is that you want to get the data for a couple of parameters over a timespan at one or more locations. So clearly giving every value its own URL is not practical and would result in hundreds of requests. I have the feeling that my specific problem doesn't exactly fit the RESTful pattern.
Update: To clarify, there are two usage patterns of the service. 1. Raw data: rows and rows of data for several locations and parameters.
2. Interpreted data: the raw data calculated into symbols (suns & clouds, for example) and other parameters.
There is not one 'forecast'. Different clients have different needs for data.
The reason I think this doesn't fit the REST pattern is that while I can actually have a 'forecast' resource, I still have to submit a lot of request parameters. So a simple GET request on a resource doesn't work; I end up POSTing data all over the place.
So I am working on a webservice to access our weather forecast data (10000 locations, 40 parameters each, hourly values for the next 14 days = about 130 million values). ... But what is a resource in my case?
That depends on the details of your problem domain. Simply having a large amount of data is not a good reason to avoid REST. There are smart ways and dumb ways to model and expose that data.
As you rightly see, your main goal at this point should be to understand what exactly a resource is. Knowing only enough about weather forecasting to follow the Weather Channel, I won't be much help here. It's for domain experts like yourself to make that call.
If you were to explain in a little more detail the major domain concepts you're working with, it might make it a little easier to give specific advice.
For example, one resource might be Forecast. When weatherpeople talk about Forecasts, what words keep coming up? When you think about breaking a forecast down into smaller elements, what words do you use to describe the pieces?
Do this process recursively, and you'll probably be able to make a list of important terms. Don't forget that these terms can describe things or actions. Think about what these terms really mean, what data you can use to model them, how they can be aggregated.
At this point you'll have the makings of something you can start building a RESTful system around - but not before.
Don't forget that a RESTful system is not a data dump wrapped in HTTP - it's a hypertext-driven system.
Also don't forget that media types are the point of contact between your server and its clients. A media type is only limited by your imagination and can model datasets of any size if you're clever about it. It can contain XML, JSON, YAML, binary elements such as a Bloom Filter, or whatever works for the problem.
Firstly, there is no once-and-for-all right answer.
Each valid URL is something that makes sense to query; think of them as the equivalent of providing query forms for people looking for your data - that might help you narrow down the scenarios.
It is a matter of personal taste and possibly the toolkit you use, as to what goes into the basic url path and what parameters are encoded. The debate is a bit like the XML debate over putting values in elements vs attributes. It is not always a rational or logically decided issue nor will everybody be kind in their comments on your decisions.
If you are using a backend like Rails, that implies certain conventions. Even if you're not using Rails, it makes sense to work in the same way unless you have a strong reason to change. That way, people writing clients to talk to Rails-based services will find yours easier to understand and it saves you on documentation time ;-)
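As a purely illustrative sketch of the parameters-in-the-URL option (the host, path, and parameter names are all made up), a forecast query can still be a single GET on a 'forecast' resource, with the selection criteria encoded as query parameters instead of a POST body:

# Sketch: one GET on a hypothetical forecast resource; selection goes into query parameters.
import requests

response = requests.get(
    "https://api.example.com/forecasts",
    params={
        "locations": "10382,10384",             # one or more location ids
        "parameters": "temperature,windSpeed",  # subset of the 40 parameters
        "from": "2015-06-01T00:00Z",
        "to": "2015-06-03T00:00Z",
    },
)
data = response.json()

Whether those criteria belong in the path or in query parameters is exactly the taste/toolkit question discussed above.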
Maybe you can use forecast as the resource and go deeper to fine-grained services with XLink.
Would it be possible to do something like this? Since you have so many parameters, I was thinking you could somehow relate them to a mix of ID/parameter combos to decrease the URL size:
/WeatherForeCastService//day/hour
www.weatherornot.com/today/days/x // (where x is number of days)
www.weatherornot.com/today/9am/hours/h // (where h is number of hours)