Can one prove that a website/service uses a specific open source codebase?

Suppose there is a deployed, public website/web service. They say it is an instance of an open source codebase. Can they prove it to me?
For example, take the useful www.s3auth.com, an HTTP auth gateway for S3 that lets you password-protect static sites hosted on S3 (thanks to @yegor256). Yegor graciously open-sourced the service to assuage privacy concerns:
I made this software open source mostly to guarantee to my users that the server doesn't store their private data anywhere, but rather acts only as a pass-through service. As a result, the software is on GitHub.
Can he somehow prove to users that www.s3auth.com actually uses the exact codebase on Github?

Short answer
In the general case: no, not without a complete, external audit of the software, hardware, and network infrastructure during a specific period of time.
Explanation
Ensuring that software conforms to a specification is not at all a trivial task. Note that even the requirement "actually uses the exact codebase on GitHub" is not clear. It could mean:
- Includes any part of the repository
- Includes a full tag
- Includes a full tag at a point in time
- Includes a full tag at a point in time and uses some of its functionality
- Includes a full tag at a point in time and uses a significant part of its functionality
- Includes a full tag at a point in time, uses a significant part of its functionality, and adds no function that substantially modifies the behavior
- Includes a full tag at a point in time, uses a significant part of its functionality, and adds no function at all
- etc.
An auditor would need to check:
- the code, to ensure that the requirement is met
- the build process, to verify that the deliverable is the expected one and neither adds nor removes parts
- the hardware infrastructure, to ensure the software is deployed unaltered and used as is
- the network infrastructure, to verify that the deployed version is really the same one that users are getting
Any change to the code or infrastructure will invalidate the audit results. The auditor should be external, to ensure independence, and should in turn be audited by a regulatory body that certifies it is capable of performing the process.
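As a small illustration of why even the build check is only partial: with a reproducible build and a published checksum, a user can verify a *downloaded* artifact, yet still learns nothing about what the server actually runs. A minimal sketch, with hypothetical file names and checksum:

```python
import hashlib

# Hypothetical artifact name and checksum, for illustration only. In
# practice the expected value would come from a signed release page or
# a signed git tag built reproducibly from the GitHub repository.
ARTIFACT = "s3auth-1.0.0.war"
PUBLISHED_SHA256 = "<checksum published by the project>"

def sha256_of(path: str) -> str:
    """Stream the file so large artifacts don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(8192), b""):
            digest.update(chunk)
    return digest.hexdigest()

if sha256_of(ARTIFACT) == PUBLISHED_SHA256:
    print("local artifact matches the published build")
else:
    print("local artifact differs from the published build")
```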
Perhaps I am beating around the bush. The point I want to make is that asserting that software meets a specification is difficult, expensive, and sometimes not even useful. Of course, this process is not needed if both parties agree on something simpler.
Audits of this kind exist in the real world, especially in the security world. For example, FIPS 140 and Common Criteria evaluations certify that Hardware Security Modules, or even software packages, meet certain security requirements. I also know of trusted certifiers that attest that a site showed specific content at a point in time (usually applied to e-government).

Related

REST - Should an API client "advance" to the "next" resource like a browser?

In my years specifying and designing REST APIs, I'm increasingly finding that it's very similar to designing a website where the user's journey and the actions and links are story-boarded and critical to the UX.
With my API designs currently, I return links in items and at the bottom of resources. They perform actions, mutate state, or bring back other resources.
But it's as if each link opens in a new tab: the client explores down a new route, and its next options may narrow as it goes.
If this were a website, it wouldn't necessarily be a good design. The user would have to either open links in new tabs or back up the stack all the time to get things done.
Good sites are forward-only, or at least have a way to indicate a branch off the main flow, i.e. links automatically opening in new windows (via the anchor tag's target).
So should a good REST API be designed as if the client discards the current resource and advances to the next and is always advancing forward?
Or do we assume the client is building a map as it goes, like a Roomba exploring my living room?
The thing with the map concept is that the knowledge that one should return to a previous resource, out of the many it might know about, resides in a sentient human; it's a guess. Computers are incapable of guessing, so this needs programming, which implies out-of-band static documentation and breaks REST.
In my years specifying and designing REST APIs, I'm increasingly finding that it's very similar to designing a website
Yes - a good REST API looks a lot like a machine readable web site.
So should a good REST API be designed as if the client discards the current resource and advances to the next and is always advancing forward?
Sort of - the client is permitted to cache representations; so if you present a link, the client may "follow" the link to the cached representation rather than using the server.
That also means that the client may, at its discretion, "hit the back button" to go off and do something else (for example, if the link that it was hoping to find isn't present, it might try to achieve its goal another way). This is part of the motivation for the "stateless" constraint; the server doesn't have to pretend to know the client's currently displayed page to interpret a message.
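To make that concrete, here is a toy hypermedia client sketch. The JSON link shape ({"links": {rel: href}}) and all URLs are invented for illustration; only the entry point is hard-coded, and the client advances by choosing among the links each representation offers:

```python
import requests

# A toy hypermedia client: only the entry point is hard-coded;
# everything else is discovered by following link relations.
ENTRY_POINT = "https://api.example.com/"

def follow(session: requests.Session, url: str, *rels: str) -> dict:
    """Fetch a representation, then advance through the named link rels."""
    doc = session.get(url).json()
    for rel in rels:
        links = doc.get("links", {})
        if rel not in links:
            # The hoped-for link isn't present: "hit the back button"
            # and let the caller try to reach its goal another way.
            raise LookupError(f"no '{rel}' link in this representation")
        doc = session.get(links[rel]).json()
    return doc

with requests.Session() as s:  # the Session reuses connections between requests
    latest_order = follow(s, ENTRY_POINT, "orders", "latest")
```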
Computers are incapable of guessing, so this needs programming, which implies out-of-band static documentation and breaks REST.
Fielding, writing in 2008:
Of course the client has prior knowledge. Every protocol, every media type definition, every URI scheme, and every link relationship type constitutes prior knowledge that the client must know (or learn) in order to make use of that knowledge. REST doesn’t eliminate the need for a clue. What REST does is concentrate that need for prior knowledge into readily standardizable forms. That is the essential distinction between data-oriented and control-oriented integration.
I found this nugget in Fielding's original work.
https://www.ics.uci.edu/~fielding/pubs/dissertation/rest_arch_style.htm
The model application is therefore an engine that moves from one state to the next by examining and choosing from among the alternative state transitions in the current set of representations. Not surprisingly, this exactly matches the user interface of a hypermedia browser. However, the style does not assume that all applications are browsers. In fact, the application details are hidden from the server by the generic connector interface, and thus a user agent could equally be an automated robot performing information retrieval for an indexing service, a personal agent looking for data that matches certain criteria, or a maintenance spider busy patrolling the information for broken references or modified content [39].
It reads like a great REST application would be built to be forward-only, just as a great website should be simple to use even without a back button, including when advancing to a previously seen representation (home and search links always available).
Interestingly, we tend to think carefully about user journeys in web design, and the term "journey" is a common part of our developer vocabulary, but in API design this thinking hasn't yet permeated.

How to implement continuous migration for large website?

I am working on a website of 3,000+ pages that is updated on a daily basis. It's already built on an open source CMS. However, we cannot simply continue to apply hot fixes on a regular basis. We need to replace the entire system, and I anticipate needing to replace it again every 1-2 years. We don't have the staff to work on a replacement system while the current one is being maintained, as that results in duplicated effort. We also cannot have a "code freeze" while we work on the new site.
So, this amounts to changing the tire while driving. Or fixing the wings while flying. Or all sorts of analogies.
This brings me to a concept called "continuous migration." I read this article here: https://www.acquia.com/blog/dont-wait-migrate-drupal-continuous-migration
The writer's suggestion is to use a CDN like Fastly. The idea is that a CDN allows you to switch between a legacy system and a new system on a per-URL basis. In theory, this sounds like an approach that would work. The article claims that you can do this with Varnish, but that Fastly makes the job easier. I don't work much with Varnish, so I can't really verify those claims.
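As I understand it, the switching logic amounts to routing by URL prefix. A rough sketch of the idea (invented backends and prefixes, not actual Fastly/Varnish configuration):

```python
# Route migrated sections to the new CMS, everything else to legacy.
# Backend URLs and prefixes are invented for illustration; in practice
# this decision would live in Varnish VCL or Fastly's edge config.
LEGACY_BACKEND = "https://legacy.example.com"
NEW_BACKEND = "https://new.example.com"

MIGRATED_PREFIXES = ("/blog/", "/products/")

def pick_backend(path: str) -> str:
    if path.startswith(MIGRATED_PREFIXES):
        return NEW_BACKEND
    return LEGACY_BACKEND

assert pick_backend("/blog/post-1") == NEW_BACKEND
assert pick_backend("/about") == LEGACY_BACKEND
```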
I also don't know if this is a good idea, or if there are better alternatives. I looked at Fastly's pricing scheme, and I simply cannot translate it into a specific price point; these cryptic cloud-service pricing plans don't make sense to me. I don't know what kind of bandwidth the website uses, since another agency manages the website's servers.
Can someone help me understand whether using an online CDN would be better than using something like Varnish? Are there free or cheaper solutions? Can someone tell me what this amounts to, approximately, on a monthly or annual basis? Are there other, better ways to roll out a new website on a phased basis for a large site?
Thanks!
I don't think I have exact answers to your questions, but maybe my answer helps a little.
I don't think the CDN itself gives you the advantage; the advantage is that you have more than one system.
Changes to the code
In professional environments I'm used to having three different CMS installations. The first is the development system, usually on my PC. That system is used to develop extensions, fix bugs, and so on, supported by unit tests. The code is committed to a revision control system (like SVN, CVS, or Git). A continuous integration system checks the commits to the RCS. When a feature is implemented (or some bugs are fixed), a named tag is created. This tagged version is then installed on a test system where developers, customers, and users can test the implementation. After a successful test, exactly this tagged version is installed on the production system.
At first sight this looks time-consuming, but it isn't, because most of the steps can be automated. The biggest advantage is that the customer can test the change on a test system, which makes it very unlikely that an error occurs only on your production system. (A precondition is that your systems are built on similar/identical environments.)
Changes to the content
If your code changes the way your content is processed, it is an advantage when your CMS has strong workflow support. Then you can easily add a step to your workflow that decides whether the content is old and has to be migrated for the current document. This way you get a continuous migration of the content; a minimal sketch of such a step follows.
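The sketch below uses invented schema versions and field names; a real step would hook into the CMS's workflow API to decide whether a document needs migrating.

```python
# Lazily migrate a document the first time the workflow touches it.
CURRENT_SCHEMA_VERSION = 3

def migrate_1_to_2(doc: dict) -> dict:
    new = dict(doc)
    new["body"] = new.pop("text", "")  # field renamed in version 2
    new["schema"] = 2
    return new

def migrate_2_to_3(doc: dict) -> dict:
    new = dict(doc)
    new.setdefault("tags", [])  # field introduced in version 3
    new["schema"] = 3
    return new

MIGRATIONS = {1: migrate_1_to_2, 2: migrate_2_to_3}

def ensure_current(doc: dict) -> dict:
    """Workflow step: run pending migrations until the document is up to date."""
    while doc.get("schema", 1) < CURRENT_SCHEMA_VERSION:
        doc = MIGRATIONS[doc.get("schema", 1)](doc)
    return doc
```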
HTH
Varnish is a cache rather than a CDN. It intercepts page requests and delivers a cached version if one exists.
A CDN will serve up contents (images, JS, other resources etc) from an off-server location, typically in the cloud.
Cloud-based solutions' pricing is often very cryptic, as the underlying technology is quite complicated.
I would be careful with continuous migration. I've done both methods in the past (continuous and full migrations) and I have to say, continuous is a pain. It means double the admin time for everything, and assumes your requirements are the same at all points in time.
Unfortunately, I would say you're better off with a proper rebuild on a 1-2 year basis than with a continuous migration, but obviously you know best about that.
I would suggest you maybe also consider a hybrid approach? Build yourself an export tool to keep all of your content in a transferable format like CSV/XML/JSON, so you can just import it into a new system when ready. This means you can incorporate new build requests when you need them in a new system (what's the point of a new system if it does exactly the same as the old one?) and you get to keep all your content. Plus, you don't need to build and maintain two CMSes all the time.
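A minimal sketch of such an export tool, assuming a hypothetical pages table; it works as-is with Python's sqlite3, and the same idea applies to any DB-API connection:

```python
import json
import sqlite3

# Dump CMS content into a neutral, importable format. The table and
# column names here are hypothetical; adapt the query to your schema.
def export_content(connection, out_path: str) -> None:
    rows = connection.execute(
        "SELECT id, title, body, updated_at FROM pages"
    ).fetchall()
    docs = [
        {"id": r[0], "title": r[1], "body": r[2], "updated_at": str(r[3])}
        for r in rows
    ]
    with open(out_path, "w", encoding="utf-8") as f:
        json.dump(docs, f, ensure_ascii=False, indent=2)

# Usage: export_content(sqlite3.connect("cms.db"), "content.json")
```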

Read-access to SAP's DB directly?

We're an SME with SAP implemented. We're trying to use the transactional data in SAP to build another system in PHP for our trucking division, for graphical reports, etc. This is because we don't have in-house ABAP development expertise, and any SAP modifications are expensive.
Presently, I've managed to achieve our objectives with read-only access to our Quality DB2 server and any writes go to another DB2 server. We've found the CPU usage on the SELECT statements to be acceptable and the user is granted access only to specific tables/views.
SAP's Quality DB2 -> PHP -> Different DB2 client
I would like your opinion on whether it is safe to read from production the same way. Implementing all of this again via the RFC connector seems very painful. A master-slave configuration is an option for us, but again it would involve external consultancy.
EDIT
Forgot to mention that our SAP guys don't want to build even reports for another 6 months; they want to leave the system intact. Which is why we're building this in PHP on top.
If you don't have ABAP expertise, get it. It's not that hard, and you'll get a lot of stuff "for granted" (as in "provided by the platform") that you'd otherwise have to implement manually, like user authentication, authority management, and software logistics (moving stuff from the development to the production repository). See these articles for a short (although biased) introduction. If you still need an external PHP application, fine, but you really should give ABAP a try first. For web applications, you might want to look into Web Dynpro ABAP. Using the IGS built-in chart engine with the BusinessGraphics element, you'll get a ton of common chart types for free. You can also integrate PDF forms created with Adobe LiveCycle Designer.
Second, while "any SAP modifications are expensive" might be a good approach, what you're suggesting isn't a modification. That's add-on development, and it's neither expensive nor more complex than any other programming language and/or environment out there. If you can't or don't want to implement your own application entirely using the existing infrastructure, at least use a decent interface - web services, RFC, whatever. From an ABAP point of view, RFC is always the easiest option, but you can use SOAP or REST as well, although you'll have to implement the latter manually. It's not that hard either.
NEVER EVER access the SAP database directly. Just don't. You'll have to implement all the constraints like client dependency or checks for validity dates and cancellation flags for yourself - that's hardly less complex than writing a decent interface, and it's prone to break every time the structure is changed. And if at some point you need to read some of the more complex contents like long texts, you're screwed - period. Not to mention that most internal or external auditors (if that happens to be an issue with your company and/or legal requirements) don't like direct database access to a system as critical as this one, which again can cause lots of trouble from people you really don't want to mess with. It's just not worth it.

How to maintain multiple components for multiple clients with multiple features?

Basically, my project is product-based.
We developed the product, signed up multiple clients, and deploy the application based on each client's needs.
We decided to package new features and project-dependent modules as components.
Now my application has a large number of customers.
Every customer needs different features based on the components.
But we have centralized components for all clients; we move a component's additional features to a client-specific folder and deploy.
My problem is that I am unable to maintain the component features for multiple clients.
My component feature code has grown, and I am unable to track which client has which features.
Is there any solution for maintaining multiple component features for multiple clients?
I've worked for a couple of companies in a similar space - product software but very heavily customised.
Essentially there is a decision the company needs to make: are you a product company (that is, you ship broadly the same thing to every client), or are you a bespoke company? At the moment it sounds like they're caught between two stools, wanting the economies of being a product company with the ability to meet specific client demands the way a bespoke software company can.
Assuming the company wants to be a product software company, unless there are specific technical reasons why you can't, you need to move to a single code base with the modifications for each customer being handled through customisable options (i.e. flags saying how this particular situation is being handled, whether this feature is available and so on).
These can be set at run time (so they can be changed as the client wants; think of the options in Word or Excel) or at build time (so code is included/excluded when you do the build), but the key thing is that every client's build has to be pulled from the same code base. A sketch of the run-time variant follows.
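A minimal sketch, with invented client names and option keys: one code base, behaviour varying only through per-client configuration read at run time.

```python
# Product-wide defaults, overridden per client by configuration.
DEFAULTS = {"invoice_rounding": "bankers", "show_beta_reports": False}

CLIENT_CONFIG = {
    "acme":   {"show_beta_reports": True},
    "globex": {"invoice_rounding": "up"},
}

def option(client: str, key: str):
    """Fall back to the product-wide default when a client has no override."""
    return CLIENT_CONFIG.get(client, {}).get(key, DEFAULTS[key])

assert option("acme", "show_beta_reports") is True
assert option("globex", "show_beta_reports") is False
```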
But this needs to be agreed with the business as it limits what they can sell - every change they sell has to fit into an overall vision which can be accommodated by the single product.
The alternative is that you're essentially producing bespoke software for each client (that is coded specifically for what they want) but using many common libraries. That's fine and allows you to produce something which is exactly what they want but in the end it is going to be more work and the business needs to understand and cost for that.
We actually do a bit of both - there is a server product which is identical for all clients, and then web and mobile clients which are specific to them (in the case of mobile you can't have lots of dead code on the device - the web stuff is historic and will be moving to a standard product for all clients).
Good luck though, it's a difficult problem with no easy solution.
You are essentially talking about software product lines (SPLs): variations from a common base. Since you already package your features as components, you need a specialized tool to manage such variations.
You can then build a complete custom application based on a configuration that is unique to any given customer. Easier said than done, of course.
A model-driven software development (MDSD) approach can help a lot with this task. One such system that can support this development setup is ABSE, an emerging MDSD approach that, among other things, can implement a software product line (info at http://www.abse.info; disclaimer: I am the ABSE project lead). There is no product yet, though; an alpha preview is coming.
Again, I know some companies that, using MDSD coupled with code generation, have achieved what I understand you want: products that are half pre-packaged, half custom.

How to manage multiple clients with slightly different business rules? [closed]

We have written a software package for a particular niche industry. This package has been pretty successful, to the extent that we have signed up several different clients in the industry, who use us as a hosted solution provider, and many others are knocking on our doors. If we achieve the kind of success that we're aiming for, we will have literally hundreds of clients, each with their own web site hosted on our servers.
Trouble is, each client comes in with their own little customizations and tweaks that they need for their own local circumstances and conditions, often (but not always) based on local state or even county legislation or bureaucracy. So while probably 90-95% of the system is the same across all clients, we're going to have to build and support these little customizations.
Moreover, the system is still very much a work in progress. There are enhancements and bug fixes happening continually on the core system that need to be applied across all clients.
We are writing code in .NET (ASP, C#), MS-SQL 2005 is our DB server, and we're using SourceGear Vault as our source control system. I have worked with branching in Vault before, and it's great if you only need to keep 2 or 3 branches synchronized - but we're looking at maintaining hundreds of branches, which is just unthinkable.
My question is: How do you recommend we manage all this?
I expect answers will be addressing things like object architecture, web server architecture, source control management, developer teams etc. I have a few ideas of my own, but I have no real experience in managing something like this, and I'd really appreciate hearing from people who have done this sort of thing before.
Thanks!
I would recommend against maintaining separate code branches per customer. It is a nightmare to keep working code in sync with your core.
I do recommend implementing the Strategy pattern and covering your "customer customizations" with automated tests (e.g., unit and functional) whenever you change your core.
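A minimal sketch of that idea, with invented rule names; the point is that the core selects a strategy from the customer's configuration instead of branching the code base.

```python
from abc import ABC, abstractmethod

class DiscountStrategy(ABC):
    @abstractmethod
    def discount(self, order_total: float) -> float: ...

class StandardDiscount(DiscountStrategy):
    def discount(self, order_total: float) -> float:
        return order_total * 0.05

class VolumeDiscount(DiscountStrategy):
    def discount(self, order_total: float) -> float:
        return order_total * (0.10 if order_total > 1000 else 0.05)

STRATEGIES = {"standard": StandardDiscount, "volume": VolumeDiscount}

def strategy_for(customer_config: dict) -> DiscountStrategy:
    """Each customer's config names the strategy it should get."""
    return STRATEGIES[customer_config.get("discount", "standard")]()

# Each strategy class gets its own unit tests, run whenever the core changes.
assert strategy_for({"discount": "volume"}).discount(2000) == 200.0
```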
UPDATE:
I recommend that, before you get too many customers, you establish a system for creating and updating each of their websites. How involved you get is going to be balanced by your current revenue stream, of course, but you should have an end in mind.
For example, when you have just signed up Customer X (hopefully entirely via the web), their website should be created within XX minutes and an email sent to the customer stating it's ready.
You definitely want to setup a Continuous Integration (CI) environment. TeamCity is a great tool, and free.
With this in place, you'll be able to check your updates in a staging environment and can then apply those patches across your production instances.
Bottom Line: Once you get over a handful of customers, you need to start thinking about automating your operations and your deployment as yet another application to itself.
UPDATE: This post highlights the negative effects of branching per customer.
Our software has very similar requirements and I've picked up a few things over the years.
First of all, such customizations will cost you both in the short and long-term. If you have control over it, place some checks and balances such that sales & marketing do not over-zealously sell customizations.
I agree with the other posters that say NOT to use source control to manage this. It should be built into the project architecture wherever possible. When I first began working for my current employer, source control was being used for this and it quickly became a nightmare.
We use a separate database for each client, mainly because for many of our clients, the law or the client themselves require it due to privacy concerns, etc...
I would say that the business-logic differences have probably been the least difficult part of the experience for us (your mileage may vary depending on the nature of the customizations required). For us, most variations in business logic can be broken down into a set of configuration values, which we store in an XML file that is modified upon deployment (if machine-specific) or stored in a client-specific folder and kept in source control (explained below). The business logic obtains these values at runtime and adjusts its execution appropriately. You can use this in concert with various strategy and factory patterns as well; configuration fields can contain the names of strategies, and so on. Also, unit testing can be used to verify that you haven't broken things for other clients when you make changes. Currently, adding most new clients to the system involves simply mixing and matching the appropriate configuration values (as far as business logic is concerned).
More of a problem for us is managing the content of the site itself including the pages/style sheets/text strings/images, all of which our clients often want customized. The current approach that I've taken for this is to create a folder tree for each client that mirrors the main site - this tree is rooted at a folder named "custom" that is located in the main site folder and deployed with the site. Content placed in the client-specific set of folders either overrides or merges with the default content (depending on file type). At runtime the correct file is chosen based on the current context (user, language, etc...). The site can be made to serve multiple clients this way. Efficiency may also be a concern - you can use caching, etc... to make it faster (I use a custom VirtualPathProvider). The largest problem we run into is the burden of visually testing all of these pages when we need to make changes. Basically, to be 100% sure you haven't broken something in a client's custom setup when you have changed a shared stylesheet, image, etc... you would have to visually inspect every single page after any significant design change. I've developed some "feel" over time as to what changes can be comfortably made without breaking things, but it's still not a foolproof system by any means.
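A minimal sketch of that override lookup (hypothetical folder layout; the real thing sits behind a custom VirtualPathProvider):

```python
import os

# Client overrides live under a "custom" tree that mirrors the main site.
SITE_ROOT = "/var/www/site"

def resolve(client: str, relative_path: str) -> str:
    """Client-specific content wins; otherwise serve the default file."""
    custom = os.path.join(SITE_ROOT, "custom", client, relative_path)
    if os.path.exists(custom):
        return custom
    return os.path.join(SITE_ROOT, relative_path)

# resolve("acme", "css/main.css") -> the acme stylesheet if one exists,
# otherwise the shared default.
```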
In my case I also have no control, other than offering my opinion, over which visual/code customizations are sold, so MANY more of them than I would like have been sold and implemented.
This is not something that you want to solve with source control management, but within the architecture of your application.
I would come up with some sort of plugin-like architecture. Which plugins to use for which website then becomes a configuration issue, not a source-control issue.
This allows you to use branches, etc. for what they are intended for: parallel development of code between (or maybe even across) releases. Each plugin becomes a separate project (or subproject) within your source-control system. This also allows you to combine all plugins and your main application into one Visual Studio solution, to help with dependency analysis, etc.
Loosely coupling the various components in your application is the best way to go.
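A minimal sketch of configuration-driven plugin selection, with invented module names; each plugin module is assumed to expose a register(app) entry point:

```python
import importlib

# Which plugins a site gets is configuration data, not a branch.
SITE_PLUGINS = {
    "acme.example.com":   ["plugins.billing", "plugins.reports"],
    "globex.example.com": ["plugins.billing"],
}

def load_plugins(site: str, app) -> None:
    for name in SITE_PLUGINS.get(site, []):
        module = importlib.import_module(name)
        module.register(app)  # hand the plugin a hook into the application
```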
As mentioned before, source control does not sound like a good solution to your problem. To me it sounds better to have a single code base using a multi-tenant architecture. This way you get a lot of benefits in terms of managing your application, load on the service, scalability, etc.
Our product uses this approach. We have a lot of core functionality that is the same for all clients, plus custom modules used by one or more clients. At the core, the "customization" is a simple workflow engine that uses different workflows for different clients, so each client gets the core functionality, its own workflow(s), and an extended set of modules that are either client-specific or generalized for more than one client.
Here's something to get you started on multi-tenancy architecture:
Multi-Tenant Data Architecture
SaaS database tenancy patterns
Without more info, such as the types of client-specific customization, one can only guess how deep or superficial the changes are. Some simple/standard approaches to consider:
- If you can keep a central config specifying the uniqueness from client to client
- If you can centralize the business rules in one class or group of classes
- If you can store the business rules in the database and pull them out based on the client
- If the business rules can all be DB/SQL based (each client having their own DB)
Overall, hard-coding differences based on client name/ID is very problematic, and keeping different code bases per client is costly (think of the complete testing/retesting time required for the 90% that doesn't change). I think more info is required to answer properly (give some specifics).
Layer the application. One of those layers contains customizations and should be able to be pulled out at any time without affecting the rest of the system. Application- and DB-level "triggers" (quoted because they may or may not employ actual DB triggers) that call customer-specific code or are parameterized with customer keys are very helpful.
Core should never be customized, but you must layer it in somewhere, even if it is simplistic web filtering.
What we have is a core database that contains the functionality all clients get. Then each client has a separate database that contains the customizations for that client. This is expensive in terms of maintenance. The other problem is that when two clients ask for similar functionality, it is often done differently by the two separate teams. Little is currently done to share customizations between clients or to fold common ones back into the core application. Each client has their own application portal, so we don't have to worry about a change to one client affecting some other client.
Right now we are looking at changing to a process using a rules engine, but there is some concern that the performance won't be there for the number of records we need to be able to process. However, in your circumstances, this might be a viable alternative.
I've used some applications that offered the following customizations:
- Web pages were configurable: we could drag fields out of view and position them where we wanted, with our own names for the field labels.
- Add our own views or stored procedures and use them in data grids (along with an update proc) and reports. Each client would need their own database.
- Custom mapping of Excel files to import data into the system.
- Add our own calculated fields.
- Ability to run custom scripts on forms during various events.
- Identify our own custom fields.
If your clients are larger companies, you're almost certainly going to need your own SDK, APIs, etc.