Around 2006, Intel engineers came up with the idea of enlarging the CRC-32 look-up table in order to process 4 or 8 bytes at once (maybe the original idea was not theirs, but at least they published source code with an implementation that was up to 4 times faster than previous implementations using a single table of 256 32-bit entries). Their code, known as Slicing-by-8, is available on SourceForge at sourceforge.net/projects/slicing-by-8. Does anyone know whether Slicing-by-8 and its variants (Slicing-by-4, Slicing-by-16) are encumbered by any patent? There are a great many active patents on CRC and CRC-32, and they are written in language that is very hard to understand.
I have not done a patent search on this, so I don't know if the concept was ever patented. However, I do know that the idea well precedes the Intel paper you mention. I first implemented it in zlib in version 1.2.0, which was published in March 2003. The approach was suggested to me by Rodney Brown in an email to gcc-patches@gcc.gnu.org on April 19, 2002 (with no mention or hint of a patent). I don't know when he first came up with it, or whether it was published even earlier by him or someone else.
The SourceForge link you provided states that Slicing-by-8 is BSD-licensed, so you can freely use it without worrying about patent infringement.
UPDATE
From the link within the source code itself, it's pretty clear to me that you are able to use this without issue; just include the copyright info as the license says:
http://www.opensource.org/licenses/bsd-license.html
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.
Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
Also https://www.whitesourcesoftware.com/whitesource-blog/top-10-bsd-license-questions-answered/
Do the BSD Licenses grant patent rights?
The BSD licenses don’t grant any patent rights.
...
The BSD Licenses, on the other hand, just grant a copyright license. While licensing your component, you will have to take care of the patents yourself.
Ergo, Intel doesn't mention anything about patents in the source code, so it's highly unlikely that they are trying to trap people into using code that they made freely available under a permissive license.
And lastly, if someone else's patent did, or does, cover what Intel has released, and that patent was filed after 2004 (the initial date in the copyright notice), then the published code is prior art against it. It might take a lawyer to prove it, but date disputes are usually pretty simple.
Related
I am writing a parser (in C++) and I have a small list of strings (less than 100) where each one represents a valid parser tag. I need to map each such known tag to an enum value for further processing.
As all strings are known at compile time, I have been looking into using a perfect hash function for this purpose.
I am aware of existing tools and algorithms for perfect hash function generation, such as gperf, mph, and cmph. However, all such tools/implementations are under a somewhat restrictive license (such as GPL, LGPL, or MPL), while due to my constraints I am looking for code under a relaxed license suitable for reuse (such as the MIT license), preferably in C/C++ or C#.
Are you aware of any such tool or code?
Yes, here's one that seems to fit your parameters:
https://www.codeproject.com/Articles/989340/Practical-Perfect-Hashing-in-Csharp
Note that it's using a license agreement I'm not particularly familiar with, but it doesn't look like it's GPL-related.
I see a couple of different research groups, and at least one book, that talk about using Coq for designing certified programs. Is there a consensus on what the definition of a certified program is? From what I can tell, all it really means is that the program was proved total and type-correct. Now, the program's type may be something really exotic, such as a list with a proof that it's nonempty, sorted, with all elements >= 5, etc. However, ultimately, is a certified program just one that Coq shows is total and type-safe, where all the interesting questions boil down to what was included in the final type?
Edit 1
Based on wjedynak's answer, I had a look at Xavier Leroy's paper "Formal Verification of a Realistic Compiler", which is linked in the answers below. I think this contains some good information, but the more informative material in this line of research can be found in the paper "Mechanized Semantics for the Clight Subset of the C Language" by Sandrine Blazy and Xavier Leroy. Clight is the language to which the "Formal Verification" paper adds optimizations. In it, Blazy and Leroy present the syntax and semantics of Clight and then discuss the validation of these semantics in section 5, which lists the different strategies used for validating the compiler and, in some sense, provides an overview of strategies for creating a certified program. These are:
Manual reviews
Proving properties of the semantics
Verified translations
Testing executable semantics
Equivalence with alternate semantics
In any case, there are probably points that could be added and I'd certainly like to hear about more.
Going back to my original question of what the definition of a certified program is, it's still a little unclear to me. Wjedynak provides part of an answer, but really the work by Leroy involved creating a compiler in Coq and then, in some sense, certifying it. In theory, this makes it possible to prove things about C programs, since we can now go C -> Coq -> proof. In that sense, there seems to be a workflow where we could:
Write a program in X language
Form a model of the program from step 1 in Coq or some other proof-assistant tool. This could involve creating a model of the programming language in Coq, or it could involve modeling the program directly (i.e. rewriting the program itself in Coq).
Prove some property of the model. Maybe it's a proof about the values; maybe it's a proof of the equivalence of expressions (things like 3 = 1 + 2 or f(x,y) = f(y,x), whatever).
Then, based on these proofs, call the original program certified.
Alternatively, we could create a specification of a program in a proof assistant tool and then prove properties about the specification, but not the program itself.
In any case, I'm still interested in hearing alternative definitions if anyone has them.
I agree that the notion seems vague, but in my understanding a certified program is a program equipped with a proof of its correctness. Now, by using rich and expressive type signatures you can arrange things so that no separate proof is needed, but this is often only a matter of convenience. The real issue is what we mean by correctness: that is a matter of specification. You can take a look at, e.g., Xavier Leroy, "Formal Verification of a Realistic Compiler".
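The "rich type signature" idea can be made concrete in any proof assistant. A minimal Lean 4 sketch (Lean rather than Coq, but the same principle) of a function whose type itself demands a proof:

```lean
-- A "certified" head: the type requires evidence that the list is
-- nonempty, so the function is total and the nil case is impossible.
def certifiedHead (xs : List Nat) (h : xs ≠ []) : Nat :=
  match xs, h with
  | x :: _, _ => x
  | [],     h => absurd rfl h

-- The caller must discharge the proof obligation at the call site.
example : certifiedHead [3, 1, 2] (by decide) = 3 := rfl
```

Here the correctness condition ("only called on nonempty lists") lives in the type, so no separate proof document is needed; richer specifications (sortedness, bounds on elements) work the same way with richer types.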
First note that the phrase "certified" has a slightly French bias: elsewhere the expression "verified" or "proven" is often used.
In any case it is important to ask what that actually means. X. Leroy and CompCert are a very good starting point: it is a big project on C compiler verification, and Leroy is always keen to explain to his audience why verification matters, especially when talking to people from "certification agencies" who usually mean testing, not proving.
Another big verification project is L4.verified, which uses Isabelle/HOL. This part of the exposition explains a bit about what is actually stated and proven, and what the consequences are. Unfortunately, the actual proof is top secret, so it cannot be checked publicly.
A certified program is a program that is paired with a proof that the program satisfies its specification, i.e., a certificate. The key is that there exists a proof object that can be checked independently of the tool that produced the proof.
A verified program has undergone verification, which in this context may typically mean that its specification has been formalized and proven correct in a system like Coq, but the proof is not necessarily certified by an external tool.
This distinction is well attested in the scientific literature and is not specific to Francophones. Xavier Leroy describes it very clearly in Section 2.2 of A formally verified compiler back-end.
My understanding is that "certified" in this sense is, as Makarius pointed out, an English word chosen by Francophones where native speakers might instead have used "formally verified". Coq was developed in France, and has many Francophone developers and users.
As to what "formal verification" means, Wikipedia notes (license: CC BY-SA 3.0) that it:
is the act of proving ... the correctness of intended algorithms underlying a system with respect to a certain formal specification or property, using formal methods of mathematics.
(I realise you would like a much more precise definition than this. I hope to update this answer in future, if I find one.)
Wikipedia especially notes the difference between verification and validation:
Validation: "Are we trying to make the right thing?", i.e., is the product specified to the user's actual needs?
Verification: "Have we made what we were trying to make?", i.e., does the product conform to the specifications?
The landmark paper seL4: Formal Verification of an OS Kernel (Klein, et al., 2009) corroborates this interpretation:
A cynic might say that an implementation proof only shows that the
implementation has precisely the same bugs that the specification
contains. This is true: the proof does not guarantee that the
specification describes the behaviour the user expects. The
difference [in a verified approach compared to a non-verified one]
is the degree of abstraction and the absence of whole classes of bugs.
Which classes of bugs are those? The Agda tutorial gives some idea:
no runtime errors (inevitable errors like I/O errors are handled; others are excluded by design).
no non-productive infinite loops.
It may mean free of runtime errors (numeric overflow, invalid references, ...), which is already good compared to most deployed software, while still weak. The other meaning is: proved correct according to a formalization of the domain; that is, the program does not only have to be formally free of runtime errors, it also has to be proved to do what it is expected to do (which must have been precisely defined).
I'm trying to better understand my computer at the lower levels, and what better way is there than writing stack buffer overflow exploits? I recently came across ROP. I read the paper http://cseweb.ucsd.edu/~hovav/talks/blackhat08.html and it mentioned there is a compiler for ROP code.
What is the name of such a compiler for linux (64bit)?
I was one of the researchers on this project at UCSD and wrote the C-to-exploit-string compiler portion. The specific work you are referring to was SPARC-specific (and further tailored to a known Solaris libc binary). These papers actually give a better overview of what we did (and generalizations and programming approaches):
Our original CCS 2008 paper
Updated, generalized manuscript
For Linux + x64, there have been many tools for ROP attack creation since our research, which you can find generally by searching the web. And most of these are far more useful and user-friendly than our (now relatively old) research-specific tools.
Let me just offer a suggestion that if you want to understand the lower levels of your Linux system and haven't already done so, consider a "stepped" approach with the following:
"Old-School" Stack Injection: Disable non-executable stack protection on your box, and just inject shellcode. Lots of resources here; start with Aleph One's seminal "Smashing The Stack For Fun And Profit" (widely available on the web).
Return-to-Libc: Re-enable non-executable stacks, and try to create a custom payload that jumps into libc (probably execve) to grab a shell.
Once you've got a handle on those, then getting in to ROP will be a lot easier. If you're already there, then power to you!
Which package from CPAN should I use to send mail?
Sometimes the TIMTOWTDI approach is very tiring. For me, especially when it comes to package selection.
So all I want is to send email, potentially HTML email. Between Mail::Sendmail, Mail::Sender, Net::SMTP (by the way, not available in PPM), Mail::SendEasy, and the 80 or so other packages that have 'Mail' in their name, which one should I choose?
And while on this subject: what is your general approach to choosing the "canonical" package for a job, i.e. the package that "everybody is using"? Is there any rating or popularity billboard somewhere?
what is your general approach to choosing the "canonical" package for a job, i.e. the package that "everybody is using"? Is there any rating or popularity billboard somewhere?
When I want to pick which of several CPAN modules to use, the things I look at are
Documentation:
The litmus test for CPAN modules is the first page of the documentation. If there is a messy synopsis, or a synopsis without a simple working example, I guess that the module is probably not a good one. Untidy, messy, or misformatted documentation is also a red flag.
State of repair:
The date of release of the last version of the module tells you whether it is being maintained.
The CPAN testers' reports tell you whether the module is likely to install without a struggle.
The bugs list on rt.cpan.org gives you some idea of how active the author is about maintaining the module.
Also, is there a mailing list for the module? Having a mailing list is a pretty good sign of a good-quality, maintained, stable, documented and popular module.
Author:
What is the name of the module author?
How many other modules has the author released?
What kind of modules has the author released?
The author is a big factor. There are some authors who create things which have excellent quality, like Gisle Aas, Graham Barr, Andy Wardley, or Jan DuBois, and some people who turn out a lot of things which could be described as "experimental", like Damian Conway or Tatsuhiko Miyagawa. Be wary of people who've released a lot of Acme:: (joke) modules. Also, beware of things written by people who only maintain one or two modules. People who have fewer than five modules in total usually don't maintain them.
Other things:
cpanratings.perl.org is often helpful, but take it with a grain of salt.
Apart from that, a lot of it is just trial and error. Download and see if it passes its own tests, see if it even has any tests, write a test script, etc.
Things which often don't give a meaningful ranking:
The top results on Google tend to be ancient Perlmonks or perl.com or Dr. Dobbs' Journal articles, and these often point you towards outdated things.
search.cpan.org's search function puts modules which haven't been updated for ten years on page one, and the latest and greatest on page ten or something.
Beware of "hype":
One more thing I want to say: Be wary of advice on blogs, stackoverflow, Usenet news, etc. - people tend to guide you to whatever module happens to be flavour of the month, rather than a stable, proven solution. The "trendy" modules usually lack documentation, are unstable, have nightmarish dependencies, and quite often yesterday's fashionable modules suddenly fall out of favour and are abandoned, to be replaced by another flavour of the month, leaving you in the lurch if you decide to use them.
Task::Kensho generally makes good recommendations. For sending email it suggests Email::Sender
I'll throw in Email::Stuff. It's a nice wrapper for Email::MIME. You don't need to care about the MIME structure of the mail, the module does it for you.
use Email::Stuff;
use IO::All;    # provides the io() helper used below

Email::Stuff->from('cpan@ali.as')
            ->to('santa@northpole.org')
            ->bcc('bunbun@sluggy.com')
            ->text_body($body)
            ->attach(io('dead_bunbun_faked.gif')->all,
                     filename => 'dead_bunbun_proof.gif')
            ->send;
As for selecting modules,
If you don't need more than the basic features, I suggest looking at MIME::Lite.
use MIME::Lite;

my $msg = MIME::Lite->new(
    From       => 'Your friendly neighbourhood spiderman',
    To         => 'green@goblin.net',
    Cc         => 'info@nemesis.org',
    Bcc        => 'mj@spidey.info',
    'Reply-to' => 'enemies@spidey.info',
    Subject    => 'Please stop',
    Data       => $data,    # email body
);
die 'Could not send mail' unless $msg->send;
You can use Email::Send
http://search.cpan.org/dist/Email-Send/lib/Email/Send.pm
What I prefer is
Mail::Sendmail
MIME::Lite
If you require SSL then include
Net::SMTP::SSL
I'm curious whether anyone understands, knows, or can point me to comprehensive literature or source code on how Google created their Popular Passages feature. If you know of any other application that can do the same, please post your answer too.
If you do not know what I am writing about, here is a link to an example of Popular Passages. When you look at the overview of the book Modelling the Legal Decision Process for Information Technology Applications... by Georgios N. Yannopoulos, you can see something like:
Popular passages
... direction, indeterminate. We have
not settled, because we have not
anticipated, the question which will
be raised by the unenvisaged case when
it occurs; whether some degree of
peace in the park is to be sacrificed
to, or defended against, those
children whose pleasure or interest it
is to use these things. When the
unenvisaged case does arise, we
confront the issues at stake and can
then settle the question by choosing
between the competing interests in the
way which best satisfies us. In
doing... Page 86
Appears in 15 books from 1968-2003
This would be a world fit for
"mechanical" jurisprudence. Plainly
this world is not our world; human
legislators can have no such knowledge
of all the possible combinations of
circumstances which the future may
bring. This inability to anticipate
brings with it a relative
indeterminacy of aim. When we are bold
enough to frame some general rule of
conduct (eg, a rule that no vehicle
may be taken into the park), the
language used in this context fixes
necessary conditions which anything
must satisfy... Page 86
Appears in 8 books from 1968-2000
It must be an intensive pattern-matching process. I can only think of n-gram models, text corpora, and automatic plagiarism detection. But n-grams are probabilistic models for predicting the next item in a sequence, and text corpora (to my knowledge) are created manually. And in this particular case, popular passages can contain a great many words.
I am really lost. If I wanted to create such a feature, how or where should I start? Also, include in your response which programming languages are best suited for this: F# or any other functional language, Perl, Python, Java... (I am becoming an F# fan myself.)
PS: can someone add the tag automatic-plagiarism-detection? I can't.
Read this ACM paper by Kolak and Schilit, the Google researchers who developed Popular Passages. There are also a few relevant slides from this MapReduce course taught by Baldridge and Lease at The University of Texas at Austin.
In the small sample I looked over, it looks like all the passages picked were inline or block quotes. Just a guess, but perhaps Google Books looks for quote marks/differences in formatting and a citation, then uses a parsed version of the bibliography to associate the quote with the source. Hooray for style manuals.
This approach is obviously of no help to detect plagiarism, and is of little help if the corpus isn't in a format that preserves text formatting.
If you know which books cite or reference other books, you don't need to look at all possible books, only at the books that cite each other. If it is a scientific reference, line and page numbers are often included with the quote, or can be found in the bibliography at the end of the book, so maybe Google parses only this information?
Google Scholar certainly has citation information from paper to paper, and maybe from book to book too.