How can I detect the frameworks and/or libraries used in any Source Code Repository/Directory programatically? - github

Suppose I have a source code directory, I want to run a script that would scan the code in the directory and return the languages, frameworks and libraries used in it. I've tried github/linguist, its a great tool which even Github uses to detect the programming languages used in a source code, however I am not able go beyond that and detect the framework exactly.
I even tried tools like it-depends, to fetch the dependencies but, its getting messed up.
Could someone help me out to figure out how I can do this stuff, with an existing tool or if have to make one such tool how should I approach it.
Thanks in Advance

This is, in the general case, impossible. The halting problem precludes any program from being able to compute, in finite time, what other programs may or may not do - including what dependencies it requires to run. Sure, you can make it work for some inputs - but never for all.
So you have to compromise:
which languages do you need to support? it-depends does not try to support Java, for example. Different languages have different ways of calling in dependencies from their source-code. For example, if working with C, you will want to look at #includes.
which build-chains to you need to support? parsing a standard Makefile for C is very different from, say, looking into a Maven pom.xml for Java. Additionally, build-chains can perform arbitrary computation -- and again, due to the halting problem, your dependency-detection program will not be able to "statically" figure out intended behavior. It is entirely possible to link against one library or another one (or none at all) depending on what is detected to exist. What should you output in this case?. For programs that have no documented build process, you simply cannot know their dependencies. Often, the build-process is human-documented but not machine-readable...
what do you consider a library/framework? long-lived libraries can evolve through many different versions, and the fact that one version is required and not another may not be explicit in the source-code. If a code-base depends on behavior found in only a specific, now superseded, version of a library, and no explicit mention of that version is found -- your dependency-detection program will have no way to know about it (unless you code in library-version-specific detection; which is doable, but on a case-by-case basis, and requires deep knowledge of differences between versions).
Therefore the answer to your question is that... it depends (they go into a fair amount of detail regarding limitations). For the specific case of Java + Maven, which is not covered by it-depends, you can use Maven itself, via mvn dependency:tree. Choose a subset of the problem instead of trying to solve it all at once.

Related

ebpf: bpf_prog_load() vs bpf_object__load()

I have not used libbpf in a while. Now, when I'm looking at the source code and examples, it looks to me that all API now is built around bpf_object while before it was based on program FD (at least on the user-facing level). I believe that fd is now hidden in bpf_object or such.
Of course it keeps backward compatibility and I still can use bpf_prog_load for example, however it looks like the preferred way of writing application code using libbpf is by bpf_object API?
Correct me if I'm wrong. Thanks!
Sounds mostly correct to me.
Low-Level Wrappers
If I remember correctly, the functions returning file descriptors in libbpf, mostly defined in tools/lib/bpf/bpf.c, have always been very low-level. This is the case for bpf_load_program() for example, which is no more than a wrapper around the bpf() system call for loading programs. Such functions are still available, but their use may be tedious for complex use cases.
bpf_prog_load()
Some more advanced functions have long been provided. bpf_prog_load(), that you mention, is one of them, but it returns an error code, not a file descriptor. It is still available as one option to load programs with the library.
bpf_object__*()
Although I don't think there are strict guidelines, I believe it is true that most example now use the bpf_object__*() function. One reason is that they provide a more consistent user experience, being organised around the manipulation of an object file to extract all the relevant bytecode and metadata, and then to load and attach the program. One other reason, I think, is that since this model has been favoured over the last releases, these functions have better support for recent eBPF features and the bpf_object__*() functions offer features that the older bpf_prog_load() workflow does not support.
Libbpf Evolves
At last, it's worth mentioning that libbpf's API is currently undergoing some review and will likely be reworked as part of a major v1.0 release. You may want to have a look at the work document linked in the announcement: Some bpf_object__ functions may be deprecated, and similarly there is currently a proposal to:
Deprecate bpf_prog_load() and bpf_prog_load_xattr() in favor of bpf_object__open_{mem, file}() and bpf_object__load() combo.
There is nothing certain yet regarding the v1.0 release, so I wouldn't worry too much about “deprecation” at the moment - I don't expect all functions to be removed just yet. But that's something you may want to consider when building your next applications.

black box function export in MATLAB [duplicate]

If a company works on matlab projects, then how do they provide the client the project? I mean which file do they send to the client as they cannot hand over the client the whole codes and data ?
It would depend on lots of things, such as the nature of the product you are building for the client, your relationship and contractual agreement with them, and whether they need to modify the product in the future.
When I carry out consultancy on MATLAB projects for a company, I usually supply them with MATLAB source code. Part of the contract would typically say that they own the code (and the copyright to the code) that I produce for them, and they can then do pretty much whatever they want with it.
If you have a different relationship, where you continue to own the code and need to prevent them from reading it and/or modifying it, then the issue is really the same as it is for any other language: you rely on a mixture of technological restrictions and legal restrictions, designed to be as restrictive as you need while minimizing inconvenience for the end-user.
For example,
You can obfuscate your code using the command pcode. That will prevent almost everyone who isn't extremely determined from seeing your code and modifying it (there are some loopholes though), but they will still be able to run it within MATLAB. Downsides might be that your code may become unexecutable in a future version of MATLAB, so you may need to support it again to fix that later. To mitigate this, you could specify in your contract or license agreement that only specific versions of MATLAB will be supported.
You can use MATLAB Compiler to produce a standalone library or executable that contains the code in an encrypted form. Downsides might be that they would rather use the code from within MATLAB. An upside would be that unlike the first option it doesn't require MATLAB, so you're not vulnerable to backward-compatibility issues in future.
You can include licence-management code within your MATLAB application. You can either roll your own, perhaps by calling a bit of Java for the cryptography (you will likely not be able make it very secure, unless you're very talented, but you'll probably be able to make something simple and workable), or you can buy third-party C libraries that do it well, and call them from MATLAB.
You can simply put copyright lines in your code saying that you own the copyright, and licence the code to them under specific terms, such as that they may view it, use it, but not modify or redistribute it. If you really want, you could ask them to also sign a non-disclosure agreement requiring them not to discuss the content of the code with third parties.
Although the technological restrictions available are a little different in MATLAB than they would be for a compiled language such as C or Java, at the end of the day those are only ever there to keep honest people honest - anyone determined will be able to get around them eventually, and they may well inconvenience the honest people, annoying them into disliking your product or service.
Better to use a mixture of very light technological restrictions, crystal-clear contract and licensing terms, and trust.
<advert> One of the consultancy services I offer is advice and help in preparing MATLAB code for deployment, including protecting it. If you think you'd benefit from that, please get in touch. </advert>
You can use the Matlab Compiler and compile your codes in to an exe file for windows. This is what is usually expected by a company. Some who have R&D themselves might ask you for the original m-code, or specific functions depending on your relationship/contract with them. I've been asked to give m-code several times, and it says on my contract that I am suppose to give this information to them. (UK based)

Understanding Hadoop Packages and Classes

I have been using CDH and HDP for a while (both in the pseudo-distributed mode) on a VM as well as installing natively on Ubuntu. Although my question is probably relevant to all Projects within the Apache Hadoop Ecosystem, let me ask this specifically in the context of Avro.
What is the best way to go about figuring out what the different packages and the classes within the packages do. I usually end up referring to the Javadoc for the project (Avro in this case) but the overviews for packages and classes end up being awfully inadequate.
For e.g. Take two of the Avro packages: org.apache.avro.specific and org.apache.avro.generic These are used for creating Specific and Generic Readers and Writers (respectively) but I'm not a 100% sure what these are for. I have used the Specific Package for in cases when I have used Avro Code Generation and the Generic ones when I don't want to use code generation. However, I am not sure if that is the only reason for using one vs. the other.
Another example: The Encoder\Decoder Classes are used for low-level SerDe, the DatumReader\DatumWrite for a "medium-level" Serde while most application layer interactions with Avro will probably use Generic\Specific Readers\Writers. Without having struggled through the pain of using these classes, how is a user to know what to use for what?
Is there a better way to get a good overview of each package (clearly the javadoc is not well documented) and the classes within the package?
PS: I have similar questions for essentially all other Hadoop Projects (Hive, HBASE etc.) - the Javadocs seem to be grossly inadequate overall. I just wonder what other developers end up doing to figure these out.
Any inputs would be great.
I download the source code and skim through it to get the idea what it does. If there is javadoc, I read that too. I tend to concentrate on the interfaces that I need and move on from there, that way I put everything into context and it makes it easier to figure out the usage. I use the call hierarchy and the type hierarchy views a lot.
These are very general guidelines, and ultimately it is the time you spend with the project that will make you understand it.
Hadoop ecosystem is quickly growing and changes are introduced on monthly bases. that's why javadoc is not so good. Another reason is that hadoop software tends to lean towards the infrastructure and not towards the end user. People developing tools will spend time learning the APIs and internals while everybody else is kinda supposed to be blissfully ignorant of all those, and just use some high level domain specific language for the tool.

What Perl module to use?

How do I choose between two (or more) Perl modules, when each of them has the method(s) I need?
I'd check
if the API supports my use case.
If both APIs do, choose the module that is easier to work with.
If both APIs are equally easy to work with, choose the one with better documentation/support.
If both are equally well documented/supported, choose the that has less dependencies.
If both have the same amount of dependencies, take the smaller one.
If both are about equal, choose the one with more releases / more active development / more passes in the test matrix (courtesy of Eric Strom)
If both still are equal, roll a dice.
Your preferences may vary so you might want to reorder the list, depending on your preferences.
Well, it really depends on what you are going after, for instance:
if you are going after just that one particular function, then use the slimmer/smaller module.
Check also the date when it was added, and use the newer one, as it will adhere to more current standards.
Some modules also don't support OOP whilst other do, so that might be another factor to consider.
Also consider dependencies of each module, because there is no reason to install a bloated module for one or two functions.
Check authors notes for any bugs, etc.
And you can do research on your own, consult google and perlmonks because the chances are that someone already pondered dilemma between choosing those particular modules.

Suggestions for Adding Plugin Capability?

Is there a general procedure for programming extensibility capability into your code?
I am wondering what the general procedure is for adding extension-type capability to a system you are writing so that functionality can be extended through some kind of plugin API rather than having to modify the core code of a system.
Do such things tend to be dependent on the language the system was written in, or is there a general method for allowing for this?
I've used event-based APIs for plugins in the past. You can insert hooks for plugins by dispatching events and providing access to the application state.
For example, if you were writing a blogging application, you might want to raise an event just before a new post is saved to the database, and provide the post HTML to the plugin to alter as needed.
This is generally something that you'll have to expose yourself, so yes, it will be dependent on the language your system is written in (though often it's possible to write wrappers for other languages as well).
If, for example, you had a program written in C, for Windows, plugins would be written for your program as DLLs. At runtime, you would manually load these DLLs, and expose some interface to them. For example, the DLLs might expose a gimme_the_interface() function which could accept a structure filled with function pointers. These function pointers would allow the DLL to make calls, register callbacks, etc.
If you were in C++, you would use the DLL system, except you would probably pass an object pointer instead of a struct, and the object would implement an interface which provided functionality (accomplishing the same thing as the struct, but less ugly). For Java, you would load class files on-demand instead of DLLs, but the basic idea would be the same.
In all cases, you'll need to define a standard interface between your code and the plugins, so that you can initialize the plugins, and so the plugins can interact with you.
P.S. If you'd like to see a good example of a C++ plugin system, check out the foobar2000 SDK. I haven't used it in quite a while, but it used to be really well done. I assume it still is.
I'm tempted to point you to the Design Patterns book for this generic question :p
Seriously, I think the answer is no. You can't write extensible code by default, it will be both hard to write/extend and awfully inefficient (Mozilla started with the idea of being very extensible, used XPCOM everywhere, and now they realized it was a mistake and started to remove it where it doesn't make sense).
what makes sense to do is to identify the pieces of your system that can be meaningfully extended and support a proper API for these cases (e.g. language support plug-ins in an editor). You'd use the relevant patterns, but the specific implementation depends on your platform/language choice.
IMO, it also helps to use a dynamic language - makes it possible to tweak the core code at run time (when absolutely necessary). I appreciated that Mozilla's extensibility works that way when writing Firefox extensions.
I think there are two aspects to your question:
The design of the system to be extendable (the design patterns, inversion of control and other architectural aspects) (http://www.martinfowler.com/articles/injection.html). And, at least to me, yes these patterns/techniques are platform/language independent and can be seen as a "general procedure".
Now, their implementation is language and platform dependend (for example in C/C++ you have the dynamic library stuff, etc.)
Several 'frameworks' have been developed to give you a programming environment that provides you pluggability/extensibility but as some other people mention, don't get too crazy making everything pluggable.
In the Java world a good specification to look is OSGi (http://en.wikipedia.org/wiki/OSGi) with several implementations the best one IMHO being Equinox (http://www.eclipse.org/equinox/)
Find out what minimum requrements you want to put on a plugin writer. Then make one or more Interfaces that the writer must implement for your code to know when and where to execute the code.
Make an API the writer can use to access some of the functionality in your code.
You could also make a base class the writer must inherit. This will make wiring up the API easier. Then use some kind of reflection to scan a directory, and load the classes you find that matches your requirements.
Some people also make a scripting language for their system, or implements an interpreter for a subset of an existing language. This is also a possible route to go.
Bottom line is: When you get the code to load, only your imagination should be able to stop you.
Good luck.
If you are using a compiled language such as C or C++, it may be a good idea to look at plugin support via scripting languages. Both Python and Lua are excellent languages that are used to script a large number of applications (Civ4 and blender use Python, Supreme Commander uses Lua, etc).
If you are using C++, check out the boost python library. Otherwise, python ships with headers that can be used in C, and does a fairly good job documenting the C/python API. The documentation seemed less complete for Lua, but I may not have been looking hard enough. Either way, you can offer a fairly solid scripting platform without a terrible amount of work. It still isn't trivial, but it provides you with a very good base to work from.