How can I compare serverless with monolithic projects? - github

I need to develop a final project in college and I chose this topic as a goal. I would like to compare the impact of using serverless on the development of an application.
To do this, I thought I'd compare repositories that use monolithic and those that use serverless. At first, the idea would be to use only Python or JavaScript languages.
I would like to know if any of you have any suggestions for software to calculate software metrics or to make it easier to find these types of repositories on GitHub. Currently, to find serverless projects, I'm looking for repositories that contain some serverless.yml file.
The idea would be to make a comparative study between these two types of architecture, calculating the differences and the benefits of using each one of them. For example, how to split code into atomic parts can impact code complexity as well as maintainability over time.
I'm still a little lost on how to proceed, any ideas or suggestions would be most welcome!

Related

Is it better to have 2 simple versions of a software or 1 complicated version?

This is a simple question. Is it better to have 2 versions of code or one? One would be an engineering version that is not V&V tested with lots of functionality the other being a stripped-down customer version that is highly regulated and highly tested.
I have a simple problem, my code base is ballooning as new engineering features are demanded. However, the core customer code is extremely simple. The engineering portions sometimes cause unexpected bugs in the customer code. Also, every engineering bug has to be debugged and treated the same as a customer one which gives longer lead times for releases.
I see it as good and bad to split. First, it would allow engineering functions to be quickly made and released with no need to V&V test the software. The second advantage would be fewer and higher quality releases of customer-facing software. However, it would mean 2(smaller) codebases to maintain.
What is the common solution to this problem? Is there limit to which you decide it is time to break into two versions? Or should I just learn even more in-depth organizational techniques? I already try my best to follow the best organizational tips for the software and follow OOP best practices.
As of now, I would say my code base is about 50% customer software and 50% engineering functionality. However, I just got a new (large) engineering/manufacturing project to add to the software.
Any experience would be appreciated.
TL;DR: Split it.
You can use either of the models that you mention. It's about how you see it grow in the future and how it is presently set up.
If you decide to go the one large code base route, I would suggest that you use the MVC Architecture as it effectively sorts one large code base into smaller more manageable parts. This allows for a layer of abstraction for the engineers where they can easily narrow down a issue to it's own team. This allows for one team to not have to worry about how the other implemented a feature; is: the Graphic team does not need to worry about how the data is stored on your servers.
If you decide to go with the multiple code bases, you add an additional layer of "difficulty". You now have to maintain "twice" as much code. You will also need to make sure that the two sources play nice with each other. The benefit of doing it this way would be that you completely isolate the engineering version from your production version.
In your case, since you mention that the split is currently 50/50 between the 2 versions, I would recommend that you split the customer and the engineering version. This isolates the engineering code that is not tested form the customer code which is properly tested. This does give you "more" code to control but this issue can easily be mitigated by having 2 teams, each responsible for their own version.

Understanding Hadoop Packages and Classes

I have been using CDH and HDP for a while (both in the pseudo-distributed mode) on a VM as well as installing natively on Ubuntu. Although my question is probably relevant to all Projects within the Apache Hadoop Ecosystem, let me ask this specifically in the context of Avro.
What is the best way to go about figuring out what the different packages and the classes within the packages do. I usually end up referring to the Javadoc for the project (Avro in this case) but the overviews for packages and classes end up being awfully inadequate.
For e.g. Take two of the Avro packages: org.apache.avro.specific and org.apache.avro.generic These are used for creating Specific and Generic Readers and Writers (respectively) but I'm not a 100% sure what these are for. I have used the Specific Package for in cases when I have used Avro Code Generation and the Generic ones when I don't want to use code generation. However, I am not sure if that is the only reason for using one vs. the other.
Another example: The Encoder\Decoder Classes are used for low-level SerDe, the DatumReader\DatumWrite for a "medium-level" Serde while most application layer interactions with Avro will probably use Generic\Specific Readers\Writers. Without having struggled through the pain of using these classes, how is a user to know what to use for what?
Is there a better way to get a good overview of each package (clearly the javadoc is not well documented) and the classes within the package?
PS: I have similar questions for essentially all other Hadoop Projects (Hive, HBASE etc.) - the Javadocs seem to be grossly inadequate overall. I just wonder what other developers end up doing to figure these out.
Any inputs would be great.
I download the source code and skim through it to get the idea what it does. If there is javadoc, I read that too. I tend to concentrate on the interfaces that I need and move on from there, that way I put everything into context and it makes it easier to figure out the usage. I use the call hierarchy and the type hierarchy views a lot.
These are very general guidelines, and ultimately it is the time you spend with the project that will make you understand it.
Hadoop ecosystem is quickly growing and changes are introduced on monthly bases. that's why javadoc is not so good. Another reason is that hadoop software tends to lean towards the infrastructure and not towards the end user. People developing tools will spend time learning the APIs and internals while everybody else is kinda supposed to be blissfully ignorant of all those, and just use some high level domain specific language for the tool.

What is the most mature library for building a Data Analytics Pipeline in Java/Scala for Hadoop?

I found many options recently, and interesting in their comparisons primarely by maturity and stability.
Crunch - https://github.com/cloudera/crunch
Scrunch - https://github.com/cloudera/crunch/tree/master/scrunch
Cascading - http://www.cascading.org/
Scalding https://github.com/twitter/scalding
FlumeJava
Scoobi - https://github.com/NICTA/scoobi/
As I'm a developer of Scoobi, don't expect an unbiased answer.
First of all, FlumeJava is an internal google project that provides a (awesomely productive) abstraction ontop of MapReduce (not hadoop though). They released a paper about it, which is what projects like Scoobi and Crunch are based on.
If your only criteria is the maturity -- I guess Cascading is your best bet.
However, if you're looking for the (imho superior) FlumeJava style abstraction, you'll want to pick between (S)crunch and Scoobi.
The biggest difference, superficial as it may be is that crunch is written in Java, with Scala bindings (Scrunch). And Scoobi is written in Scala with Java bindings (scoobij). They're both really solid choices, and you won't go wrong which ever you choose. I'm sure there's quite a similar story with Crunch, but Scoobi is being used in real projects and is under continual development. We're pretty very active in fixing bugs and implementing features.
Anyway, they're both great projects with great people behind them and were both released within days of each other. They provide the same abstraction (with similiar api), so switching between the two won't be an issue in the slightest. My recommendation is to give them both a try, and see what works for you. There' no lock in in either project, so you don't need to commit :)
And if you have any feedback for either project, please be sure to provide it :)
I'm a big Scoobi fan myself and I've used it in production. I like the way it allows you to write type-safe Hadoop programs in a very idiomatic Scala way. If that is not necessarily your thing and you like the Cascading model but are scared off by the huge amount of boilerplate code you'd have to write, Twitter has recently open sourced its own Scala abstraction layer on top of Cascading called Scalding.
Announcement: https://dev.twitter.com/blog/scalding
GitHub: https://github.com/twitter/scalding
I guess it's all a matter of taste at this point since feature-wise most of the frameworks are very close to one another.
Scalding also has the advantage of significant open source projects built atop it, such as Matrix API and Algebird.
Here are some examples:
http://sujitpal.blogspot.com/2012/08/scalding-for-impatient.html
Cascalog was released almost two years before Scalding, and arguably has more advanced features for building robust workflows:
https://github.com/nathanmarz/cascalog/wiki

Common Libraries at a Company

I've noticed in pretty much every company I've worked that they have a common library that is generally shared across a number of projects. More often than not this has been a single companyx-commons project that ends up as a dumping ground for common programs including:
Command Line Parsers
File Utilities
Framework Helpers
etc...
Some of these are well thought out and some duplicate functionality found in Apache commons-lang, commons-io etc..
What are the things you have in your common library and more importantly how do you structure the common libraries to make them easy to improve and incorporate across other projects?
In my experience, the single biggest factor in the success of a common library is user buy-in; users in this case being other developers; and culture of your workplace/team(s) will be a big factor.
Separate libraries (projects/assemblies if you're in .Net) for different application tiers is essential (e.g: there's obviously no point putting UI and data access code together).
Keep things as simple as possible; what you don't put in a common library is often at least as important as what you do. Users of the library won't want to have to think, so usage needs to be super easy.
The golden rule we stuck to was keeping individual functions focused on a single task - do one thing and do it well (or very very well); don't try and provide something that tries to take every possibility into account, the more reusable you think you're making it - the less likely it is to be used. Code Complete (the book) has some excellent content on common libraries.
A good approach to setting/improving a library up is to do regular code reviews and retrospectives; find good candidates that you've already come up with and consider re-factoring them into a library for future projects; a good candidate will be something that more than one developer has had to do on more that one project (for example).
Set-up some sort of simple and clear governance of the libraries - someone who can 'own' a specific library and ensure it's overal quality (such as a senior dev or team lead).
I have so far written most of the common libraries we use at our office.
We have certain button classes that are just slightly more useful to us than the standard buttons
A database management class that does some internal caching and can connect to ODBC, OLEDB, SQL, and Access databases without even the flip of a parameter
Some grid and list controls that are multi threaded so we can add large amounts of data to them without the program slowing and without having to write all the multithreading code every time there is a performance issue with a list box/combo box.
These classes make it easier for all of us to work on each other's code and know how exactly they work since we all use the exact same interfaces throughout our products.
As far as organization goes, all of the DLL's are stored along with their source code on a shared development drive in the office that we all have access to. (We're a pretty small shop)
We split our libraries by function.
Commmon.Ui.dll has base classes for ui elements.
Common.Data.Dll is sort of a wrapper around Enterprise library Data access classes.
Common.Business is a dumping ground for other common classes that don't fit into one of those.
We create other specialized dlls as needs arise.

How do I plan an enterprise level web application?

I'm at a point in my freelance career where I've developed several web applications for small to medium sized businesses that support things such as project management, booking/reservations, and email management.
I like the work but find that eventually my applications get to a point where the overhear for maintenance is very high. I look back at code I wrote 6 months ago and find I have to spend a while just relearning how I originally coded it before I can make a fix or feature additions. I do try to practice using frameworks (I've used Zend Framework before, and am considering Django for my next project)
What techniques or strategies do you use to plan out an application that is capable of handling a lot of users without breaking and still keeping the code clean enough to maintain easily?
If anyone has any books or articles they could recommend, that would be greatly appreciated as well.
Although there are certainly good articles on that topic, none of them is a substitute of real-world experience.
Maintainability is nothing you can plan straight ahead, except on very small projects. It is something you need to take care of during the whole project. In fact, creating loads of classes and infrastructure code in advance can produce code which is even harder to understand than naive spaghetti code.
So my advise is to clean up your existing projects, by continuously refactoring them. Look at the parts which were a pain to change, and strive for simpler solutions that are easier to understand and to adjust. If the code is even too bad for that, consider rewriting it from scratch.
Don't start new projects and expect them to succeed, just because your read some more articles or used a new framework. Instead, identify the failures of your existing projects and fix their specific problems. Whenever you need to change your code, ask yourself how to restructure it to support similar changes in the future. This is what you need to do anyway, because there will be similar changes in the future.
By doing those refactorings you'll stumble across various specific questions you can ask and read articles about. That way you'll learn more than by just asking general questions and reading general articles about maintenance and frameworks.
Start cleaning up your code today. Don't defer it to your future projects.
(The same is true for documentation. Everyone's first docs were very bad. After several months they turn out to be too verbose and filled with unimportant stuff. So complement the documentation with solutions to the problems you really had, because chances are good that next year you'll be confronted with a similar problem. Those experiences will improve your writing style more than any "how to write good" style guide.)
I'd honestly recommend looking at Martin Fowlers Patterns of Enterprise Application Architecture. It discusses a lot of ways to make your application more organized and maintainable. In addition, I would recommend using unit testing to give you better comprehension of your code. Kent Beck's book on Test Driven Development is a great resource for learning how to address change to your code through unit tests.
To improve the maintainability you could:
If you are the sole developer then adopt a coding style and stick to it. That will give you confidence later when navigating through your own code about things you could have possibly done and the things that you absolutely wouldn't. Being confident where to look and what to look for and what not to look for will save you a lot of time.
Always take time to bring documentation up to date. Include the task into development plan; include that time into the plan as part any of change or new feature.
Keep documentation balanced: some high level diagrams, meaningful comments. Best comments tell that cannot be read from the code itself. Like business reasons or "whys" behind certain chunks of code.
Include into the plan the effort to keep code structure, folder names, namespaces, object, variable and routine names up to date and reflective of what they actually do. This will go a long way in improving maintainability. Always call a spade "spade". Avoid large chunks of code, structure it by means available within your language of choice, give chunks meaningful names.
Low coupling and high coherency. Make sure you up to date with techniques of achieving these: design by contract, dependency injection, aspects, design patterns etc.
From task management point of view you should estimate more time and charge higher rate for non-continuous pieces of work. Do not hesitate to make customer aware that you need extra time to do small non-continuous changes spread over time as opposed to bigger continuous projects and ongoing maintenance since the administration and analysis overhead is greater (you need to manage and analyse each change including impact on the existing system separately). One benefit your customer is going to get is greater life expectancy of the system. The other is accurate documentation that will preserve their option to seek someone else's help should they decide to do so. Both protect customer investment and are strong selling points.
Use source control if you don't do that already
Keep a detailed log of everything done for the customer plus any important communication (a simple computer or paper based CMS). Refresh your memory before each assignment.
Keep a log of issues left open, ideas, suggestions per customer; again refresh your memory before beginning an assignment.
Plan ahead how the post-implementation support is going to be conducted, discuss with the customer. Make your systems are easy to maintain. Plan for parameterisation, monitoring tools, in-build sanity checks. Sell post-implementation support to customer as part of the initial contract.
Expand by hiring, even if you need someone just to provide that post-implementation support, do the admin bits.
Recommended reading:
"Code Complete" by Steve Mcconnell
Anything on design patterns are included into the list of recommended reading.
The most important advice I can give having helped grow an old web application into an extremely high available, high demand web application is to encapsulate everything. - in particular
Use good MVC principles and frameworks to separate your view layer from your business logic and data model.
Use a robust persistance layer to not couple your business logic to your data model
Plan for statelessness and asynchronous behaviour.
Here is an excellent article on how eBay tackles these problems
http://www.infoq.com/articles/ebay-scalability-best-practices
Use a framework / MVC system. The more organised and centralized your code is the better.
Try using Memcache. PHP has a built in extension for it, it takes about ten minutes to set up and another twenty to put in your application. You can cache whatever you want to it - I cache all my database records in it - for every application. It does wanders.
I would recommend using a source control system such as Subversion if you aren't already.
You should consider maybe using SharePoint. It's an environment that is already designed to do all you have mentioned, and has many other features you maybe haven't thought about (but maybe you will need in the future :-) )
Here's some information from the official site.
There are 2 different SharePoint environments you can use: Windows Sharepoint Services (WSS) or Microsoft Office Sharepoint Server (MOSS). WSS is free and ships with Windows Server 2003, while MOSS isn't free, but has much more features and covers almost all you enterprise's needs.