Statistics about "Microformat vs HTML+RDFa" adoption

Are there any recent and reliable statistics about "Web use" (webpages using one standard or the other) of these standards?
Or a specific statistic about the scope of vCard (person and/or organization) use?
Only statistics: this question is not about "which is the best idea?" or "how to use it?". I am looking for statistical numbers to compare Microformats adoption with (any kind of) RDFa-in-HTML adoption.
For "counting pages" statistics, we can consider Microdata a kind of RDFa-HTML.
NOTES
Explain context
RDFa Lite is the only W3C Recommendation when we talk about "Microdata vs Microformats", and Microdata has a better mapping to RDFa Lite. HTML5 became a W3C Recommendation on 2014-10-28, and neither one was blessed by the W3C. I understand that schema.org is the best way to adopt (reusing community schemas) RDFa.
On the other hand, Microformats is older and the simplest; so, perhaps, the most used on the Web (is it?).
About "vCard data statistics"
If we need some scope for the statistics, let's use vCard as scope:
Microformats' hCard and h-card are standards for expressing vCards in (any) HTML, and are used for people and organizations.
schema.org's Person and Organization encode vCard information with (standard) RDFa Lite or Microdata.
Other notes
Wikipedia expresses an old (2012) and unconfirmable assertion (no source!), "Microformats such as hCard, however, continue to be published more than schema and others on the web", and Webdatacommons is a mess, with no statistical report.
(edit) now Wikipedia's citation error is fixed.
(edit after #sashoalm comment)
Note for those who disagree that this question is valid.
This question is a software problem, not a "request for an off-site resource"...
PROBLEM: to decide on a library, framework, data model, etc. for a project, we need to use tools that are in use today and will be in the next few years... To make project decisions in software development, we need statistics about user tendency, framework adoption, etc.
PS: here on Stack Overflow there are a lot of discussions about language statistics, which is the same "set of problems". Examples: 1, 2, 3, 4, 5, 6. See also the questions tagged [usage-statistics].

Now I see there are some statistics (!!); the Wikipedia link was lost... I corrected it. It isn't up to date, being from "Winter 2013" (~1.5 to 2 years old collected data), but it shows the reality and the tendencies.
http://webdatacommons.org/structureddata/index.html#toc2
This is the chart from the report (it shows RDFa+HTML dominance):
Interpreting:
Section 5, "Extraction Process", says that "on each page, we run our RDF extractor based on the Anything To Triples (Any23) library", so everything (RDF and Microformats) resulted in "triples" (not only RDF).
The idea of "per domain" statistics is that domains use uniform policies for all pages... But I think this uniformity is false; only a few pages per domain adopt "semantic markup"... It is not less biased than counting URLs, it is only another picture. Anyway, the outcome was a dead heat, ~57% vs 43%.
Only 21% of the "semantic markup URLs" of 2013 were Microformats; all the others were RDFa-HTML (Microdata is also a kind of RDFa).
Using the average of the percentages for Domains (Ds) and URLs (Us), (Ds+Us)/2, the outcome is ~60% for RDFa and ~40% for Microformats.
Before 2013 there was a dominance of Microformats, so the big growth of "RDFa-HTML" since 2011 is evident... The tendency is clear.
If we adopt the arithmetic mean of the "per domain" and "per URL" counts, Microformats and RDFa-HTML are near each other, but with a little less Microformats (and a strong tendency for RDFa-HTML to grow in 2014).
Here is a table for the #sashoalm discussion, showing the percentages and totals:
NOTE1: HTML5 was released only on 2014-10-28, so only around 2015-10 will we be able to check the real (definitive) impact of the new standard on the Web. An important expected impact is that Microdata was not blessed by HTML5, so the only standard is HTML+RDFa (which recommends RDFa Lite)... In the future there will perhaps be less Microdata and more schema.org.
NOTE2: there is a methodological problem in counting web pages containing boilerplate text with some hugely cloned "semantic markup": I think the "next generation" of statistics could use some "per domain analysis" to make URL substatistics (sampling) of diversity (of semantically marked pages). Ideally the boilerplate would be weighted (e.g., count non-clones once and use 1+SQRT(count) for clones).
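The weighting idea can be sketched in Python (an illustration of the suggestion above, not an implemented tool; the function name is mine):

```python
import math

def weighted_page_count(clone_group_sizes):
    """Weigh boilerplate markup: each group of identical ("cloned")
    semantic-markup pages counts as 1 + sqrt(group size) instead of
    the raw group size; unique pages count once."""
    return sum(1 if size == 1 else 1 + math.sqrt(size)
               for size in clone_group_sizes)

# e.g. 2 unique pages plus one boilerplate block cloned across 100 pages:
# contributes 1 + 1 + (1 + 10) = 13 instead of a raw count of 102
print(weighted_page_count([1, 1, 100]))
```

This damps the influence of a single template cloned across thousands of pages while still letting very large clone groups count for more than a single page.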
Conclusion
Today perhaps some people still use Microformats, but there are more pages on the Web using RDFa-HTML (Microdata, RDFa, RDFa Lite, etc.), and the tendency is for it to grow.
If your project is for the next few years, the statistics say to use RDFa.
NOTE
Another interesting count for RDFa is not its use, but the reuse of vocabularies (!). See Linked Open Vocabularies (LOV).

The latest statistics from WebDataCommons are as follows:
Source: http://webdatacommons.org/structureddata/2016-10/stats/stats.html
Number of domains parsed: 34 million pay-level domains
Number of domains with RDFa, Microdata or Microformats: 5.63 million (16.5%)
Popularity of different formats:

Related

What is the correct way to specify DC metadata for a multi-language ePub?

I am developing an ePub. In the content.opf file I have to specify a series of metadata using the DC standard, for example dc:title and dc:creator.
However, my book is a multi-language book, that is, it contains two translations of the same text: English and Russian. The standard reference manual states that I can have multiple dc:language statements. For example:
<dc:language>en</dc:language>
<dc:language>ru</dc:language>
but it does not say how to specify the other metadata for more than one language. Consider, for example, dc:creator. I tried
<dc:creator xml:lang="en">Dario de Judicibus</dc:creator>
<dc:creator xml:lang="ru">Дарио де Юдицибус</dc:creator>
I get an error from the distribution platform's validator, which states that the format of the ePub is not correct. It looks like I cannot use xml:lang in dc:creator even though, in theory, that is an XML attribute that can be used with any XML tag. The same goes for dc:title:
<dc:title xml:lang="en">My Book Title</dc:title>
<dc:title xml:lang="ru">Название Mоей Kниги</dc:title>
Could someone who has had to face the same problem as me, namely writing the OPF for an ePub that contains text in multiple languages, tell me the correct way to do it? In the standards for OPF 3.x I have not been able to find any useful information to establish this.
SOLVED
I verified my code with one of the authors of the W3C specifications for OPF, and he told me that what I wrote is correct, but that some validators are not used to multi-language documents, so the problem is the validator, not the code. I write this in case someone else has the same problem.
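For reference, the metadata confirmed as valid combines the snippets above (the `<metadata>` wrapper and DC namespace declaration are the usual OPF boilerplate, shown here only to make the fragment self-contained):

```xml
<metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
  <dc:language>en</dc:language>
  <dc:language>ru</dc:language>
  <dc:title xml:lang="en">My Book Title</dc:title>
  <dc:title xml:lang="ru">Название Mоей Kниги</dc:title>
  <dc:creator xml:lang="en">Dario de Judicibus</dc:creator>
  <dc:creator xml:lang="ru">Дарио де Юдицибус</dc:creator>
</metadata>
```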

Adding signingCertificateV2 attribute to SignedCms

ContentInfo content = new ContentInfo(new Oid("1.2.840.113549.1.7.5"), Encoding.UTF8.GetBytes(str.ToString()));
SignedCms signedCms = new SignedCms(content, true);
CmsSigner cmsSigner = new CmsSigner(cert);
cmsSigner.IncludeOption = X509IncludeOption.EndCertOnly;
cmsSigner.DigestAlgorithm = new Oid("2.16.840.1.101.3.4.2.1");
cmsSigner.SignerIdentifierType = SubjectIdentifierType.IssuerAndSerialNumber;
// note: att and data are declared here but never used below
Pkcs9AttributeObject att = new Pkcs9AttributeObject();
AsnEncodedData data = new AsnEncodedData(new SHA1Managed().ComputeHash(cert.RawData));
cmsSigner.SignedAttributes.Add(new Pkcs9SigningTime(DateTime.UtcNow));
//cmsSigner.SignedAttributes.Add(new Pkcs9ContentType());
//cmsSigner.SignedAttributes.Add(new Pkcs9MessageDigest());
signedCms.ComputeSignature(cmsSigner);
return Convert.ToBase64String(signedCms.Encode());
I have used this code to sign a document and it is working fine, but my problem is that there is another requirement: to add the "SigningCertificateV2" attribute as a signed attribute...
I have already added ContentType, MessageDigest and SigningTime, but I don't know how to add the "SigningCertificateV2" attribute (1.2.840.113549.1.9.16.2.47).
Can You please help me with this?
(It seems this question's been sitting unanswered for more than a year and a half, so I really hope Asharf managed to comply with the new requirement somehow. There's been more than 300 views though, so hopefully a late answer would still be helpful.)
The types in System.Security.Cryptography.Pkcs do not provide support for the full range of attributes defined by various Cryptographic Message Syntax (CMS) specs like CMS Advanced Electronic Signatures (CAdES) and the Enhanced Security Services (ESS) update, but only for the most commonly used attributes like Pkcs9ContentType, Pkcs9MessageDigest, Pkcs9SigningTime, etc.
Specifically, there's no "strongly-typed" wrapper around the SigningCertificateV2 attribute, defined by the ESS update (RFC 5035). In that case, one has to use the "generic" CryptographicAttributeObject type and construct the ASN.1 encoded data for the attribute "by hand", that is, concoct a raw byte[], typically by using System.Formats.Asn1.AsnWriter (and thoroughly reading the relevant RFC -- that's always a good idea, btw).
Defining a wrapper for the SigningCertificateV2 attribute requires a fair amount of code, as several other related RFC types like ESSCertIDv2 and PolicyInformation also need to be defined.
Luckily, there's a well-known open-source software project that already does exactly that -- it can be used for "inspiration" ;-). That's NuGet, and specifically the NuGet client.
(I know link-only answers are frowned upon on SO, but I guess a link-mostly answer is better than no answer, so here goes...).
Here are the links to the relevant parts in the GitHub repo.
CreateSigningCertificateV2() in AttributeUtility
The SigningCertificateV2 type itself.
The EssCertIdV2 type, used by SigningCertificateV2.
The NuGet client's attribute implementation can't be used directly as a library, but it should provide a nice guideline about how to construct the PKCS attribute.
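To make the ASN.1 layout concrete, here is a minimal pure-Python sketch of the DER encoding of a SigningCertificateV2 value per RFC 5035, in the simplest case: the default SHA-256 hash algorithm is omitted (as DER requires for DEFAULT values), and the optional issuerSerial and policies fields are left out. This illustrates the byte structure only; it is not a drop-in replacement for the C# code above:

```python
import hashlib

def _der_length(n):
    # DER length octets: short form below 0x80, long form otherwise
    if n < 0x80:
        return bytes([n])
    body = n.to_bytes((n.bit_length() + 7) // 8, "big")
    return bytes([0x80 | len(body)]) + body

def _der_sequence(content):
    return b"\x30" + _der_length(len(content)) + content

def _der_octet_string(data):
    return b"\x04" + _der_length(len(data)) + data

def signing_certificate_v2(cert_der):
    """SigningCertificateV2 ::= SEQUENCE { certs SEQUENCE OF ESSCertIDv2 }
    ESSCertIDv2 with the default SHA-256 algorithm and no issuerSerial
    reduces to SEQUENCE { certHash OCTET STRING }."""
    cert_hash = hashlib.sha256(cert_der).digest()
    ess_cert_id_v2 = _der_sequence(_der_octet_string(cert_hash))
    return _der_sequence(_der_sequence(ess_cert_id_v2))
```

The resulting bytes would become the value of a CryptographicAttributeObject carrying OID 1.2.840.113549.1.9.16.2.47, added to CmsSigner.SignedAttributes.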

Add an Intro page to exam in R/exams

I am using R/exams to generate Moodle exams (Thanks Achim and team). I would like to make an introductory page to set the scenario for the exam. Is there a way to do it? (Now, I am generating a schoice with answerlist blank.)
Thanks!
João Marôco
Usually, I wouldn't do this "inside" the exam but "outside". In Moodle you can include a "Description" in the "General Settings" when editing the quiz. This is where I would put all the general information so that students read this before starting with the actual questions.
If you want to include R-generated content (R output, graphics, data, ...) in this description I would usually include this in "Question 1" rather than as a "Question 0" without any actual questions.
The "description" question type could be used for the latter, though. However, it is currently not supported in exams2moodle() (I'll put it on the wishlist). You could manually work around this in the following steps:
Create a string question with the desired content and set the associated expoints to 0.
Generate the Moodle XML output as usual with exams2moodle().
Open the XML file in a text editor or simply within RStudio and replace <question type="shortanswer"> with <question type="description"> for the relevant questions.
In the XML file omit the <answer>...</answer> for the relevant questions.
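Steps 3 and 4 can also be scripted instead of done by hand; a Python sketch (assuming the description question is the first shortanswer question in the file, and that no other `<answer...>` element precedes it):

```python
import re

def make_description_question(xml):
    """Turn the first shortanswer question of a Moodle XML export into a
    description question: change its type and drop its <answer> block."""
    xml = xml.replace('<question type="shortanswer">',
                      '<question type="description">', 1)
    # remove the first <answer>...</answer> block (assumed to belong
    # to the question changed above)
    return re.sub(r"<answer.*?</answer>", "", xml, count=1, flags=re.DOTALL)
```

Read the file produced by exams2moodle(), run it through this function, and write the result back before importing into Moodle.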
Caveat: As you are aware, it is technically possible to share the same data across subsequent exercises within the same exam. If .Rnw exercises are used, all variables from the exercises are created in the global environment (.GlobalEnv) and can easily be accessed anyway. If .Rmd exercises are used, it is necessary to set the envir argument to a dedicated shared environment (e.g., .GlobalEnv or a new.env()) in exams2moodle(..., envir = ...). However, if this is done, then exercises must not be drawn randomly in Moodle, because this would break up the connections between the exercises (i.e., the first replication of Question 1 is not necessarily followed by the first replication of Question 2). Instead you have to put together tests with a fixed selection of exercises (i.e., always the first replication for all questions, or the second replication for all questions, ...).

RESTful urls for restore operation from a trash bin

I've been implementing a RESTful web service which has these operations:
List articles:
GET /articles
Delete articles (which should remove only selected articles to a trash bin):
DELETE /articles
List articles in the trash bin:
GET /trash/articles
I have to implement an operation for restoring "articles" from "/trash/articles" back to "/articles".
And here is the question: how do you usually do it? What URL do I have to use?
I came up with 2 ways of doing it. The first is:
DELETE /trash/articles
But it feels strange, and a user could read it as "delete it permanently, don't restore".
And the second way is
PUT /trash/articles
Which is odder, and a user will be confused about what this operation does.
I'm new to REST, so please advise how you normally do it. I tried to search on Google, but I didn't know how to phrase the question, so I didn't find anything useful.
Another option could be to use "query params" to define a "complementary action/verb" to cover this "special condition" (given that it is not easily covered by the HTTP verbs). This could be done, for example, by:
PUT /trash/articles?restore=true
This would keep the URI path compliant with REST guidelines (referring to a resource, and not encoding "actions" like "restore") and would shift the "extra semantics" of what you want to do (which is a very special situation) to the "query parameter". "Query params" are very commonly used for "filtering" resources in REST, not so much for this kind of situation... but maybe this is a reasonable choice given your requirements.
I would recommend using
PUT /restore/articles
or
PUT /restore/trash/articles
Late answer but, in my opinion, the best way is to change the resource itself.
For instance:
<article is_in_trash="true">
<title>some title</title>
<body>the article body</body>
<date>1990-01-01</date>
</article>
So, in order to remove the article from the trash, you would simply PUT an updated version of the article, where is_in_trash="false".
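A sketch of that client-side update (the endpoint path /articles/{id} and the use of Python are illustrative assumptions; any HTTP client works):

```python
import xml.etree.ElementTree as ET

def restore_article(article_xml):
    """Flip is_in_trash on the article representation; the result would
    then be sent back with PUT /articles/{id} (path is an assumption)."""
    root = ET.fromstring(article_xml)
    root.set("is_in_trash", "false")
    return ET.tostring(root, encoding="unicode")
```

The nice property of this design is that "restore" needs no special verb or URL at all: it is just an ordinary update of the resource's state.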

GitHub advanced search default behaviour

I'm new to GitHub and just want to browse through repos to find interesting ones.
I want, say, all ruby repos with more than 100 followers.
I go to advanced search and try "followers:100",
and get only repos with EXACTLY 100 followers (4 at the moment).
That differs from what I expect to be the default behavior: finding repos with 100 or more followers (more like how Stack Overflow search works).
I am quite frustrated, because I can't get what I expect to be basic search functionality from a very popular site, which makes me think that I'm obviously misunderstanding something very simple (because I think it is not possible for GitHub not to have such a function).
So is there a way for me to get the desired result?
Update January 2013 (source: "A whole new code search")
followers:>100
(intervals are supported: followers:100..150)
Original answer (April 2011)
followers:[100 TO *]
should do what you want: see your query with 100 or more followers.
(Note: the "TO" needs to be in uppercase)
(Source: New and Improved Search)
For example, we can search:
for people with a username fuzzily similar to ‘chacon’
who use Ruby as their primary language,
have at least 5 repos and
at least one follower:
You might also like the Hubscovery application, a simple search interface for GitHub.