How to determine the minimum JRE version and system requirements for my Java application - eclipse

I have written an application in Java using the Eclipse IDE, and I now need to know the minimum JRE version required to run it. I know that certain methods are only available in later JREs, but I was wondering what the easiest way would be to find the highest version my application depends on, so any suggestions would be appreciated.
Also, whilst I am on the topic of requirements, I would appreciate any advice or methods for determining the minimum system requirements for my software in general, i.e. the minimum amount of RAM and so on.
Thanks in advance

Method 1: For minimum JRE version, that's going to be tough. The easiest way is to simply require the same version that you're building against, or later, e.g. JRE 6.x.x or higher.
Method 2: Install multiple JDKs, make them available in Eclipse, and change the version you're building against, running your app's test suite each time and making sure all the tests pass. The earliest JDK version with which all your tests pass is the lowest JRE your app can run against. Simply having your app compile successfully isn't enough, because older versions of the JRE/JDK might have bugs that still allow successful compilation but don't allow correct program execution.
Method 3: Always require the latest version on the client side. Oracle is constantly patching security holes, so if you have that kind of control over the client machines, requiring the latest version may ultimately be best.
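For Method 1, it helps to know which release your build actually targets. Every .class file starts with a header whose major version identifies the target (50 = Java 6, 52 = Java 8, and so on), so a short script can report the highest one in a JAR. This is a Python sketch and the function names are hypothetical. It only reflects the compiler target, not which library methods you call, which is why the testing in Method 2 is still needed. Multi-release JARs (Java 9+) keep extra classes under META-INF/versions/ and would need to be filtered out.

```python
"""Sketch: report the Java release a JAR's classes were compiled for.

Assumption: only the class-file header is read; this says nothing about
which library APIs the code calls at run time.
"""
import struct
import zipfile

CAFEBABE = 0xCAFEBABE  # magic number at the start of every class file


def class_major_version(data):
    """Return the major version from the first 8 bytes of a .class file."""
    magic, _minor, major = struct.unpack(">IHH", data[:8])
    if magic != CAFEBABE:
        raise ValueError("not a Java class file")
    return major


def java_release(major):
    """Map a class-file major version to the Java release that introduced it."""
    if major >= 49:  # 49 = Java 5, 50 = Java 6, 51 = Java 7, 52 = Java 8, ...
        return str(major - 44)
    return {45: "1.0/1.1", 46: "1.2", 47: "1.3", 48: "1.4"}.get(major, "unknown")


def highest_target(jar_path):
    """Highest class-file major version found among the .class entries of a JAR."""
    highest = 0
    with zipfile.ZipFile(jar_path) as jar:
        for name in jar.namelist():
            if name.endswith(".class"):
                with jar.open(name) as f:
                    highest = max(highest, class_major_version(f.read(8)))
    return highest
```

For example, if highest_target("app.jar") returns 50, java_release(50) gives "6", so the classes need at least a Java 6 runtime.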
As far as RAM goes, that's easy. When the JVM starts, it sets a maximum heap size (configurable with -Xmx; the default depends on the JVM version and the machine, and newer JVMs typically default to a quarter of physical memory). That is a hard limit: if your application needs more heap than that, it throws an OutOfMemoryError. Profile your app over time, tweaking the JVM's memory settings, and find the minimum amount of RAM your app needs to run both (a) with acceptable performance and (b) without throwing an OutOfMemoryError, and you're done.
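The "tweak the memory settings until it breaks" step can be automated with a binary search over -Xmx values. This Python sketch assumes a hypothetical app.jar that runs a representative, scripted workload and exits with status 0 on success, and that memory needs are monotonic (if it works with N MB it works with more). Keep in mind that -Xmx caps only the Java heap; thread stacks, class metadata and the JVM itself add to the total RAM the process needs.

```python
"""Sketch: find the smallest -Xmx heap with which a scripted workload still completes.

Assumptions: `app.jar` is hypothetical, runs a representative workload and exits
with status 0 on success; memory needs are monotonic in heap size.
"""
import subprocess


def runs_ok_with_heap(heap_mb, jar="app.jar", timeout_s=600):
    """Run the workload under -Xmx<heap_mb>m; True if it finished cleanly."""
    try:
        result = subprocess.run(["java", f"-Xmx{heap_mb}m", "-jar", jar],
                                capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return False  # far too slow (e.g. thrashing in GC) counts as a failure
    return result.returncode == 0 and "OutOfMemoryError" not in result.stderr


def minimum_heap_mb(runs_ok, low=16, high=4096):
    """Binary search for the smallest heap size in [low, high] where runs_ok(size) is True."""
    if not runs_ok(high):
        raise ValueError(f"workload fails even with {high} MB")
    while low < high:
        mid = (low + high) // 2
        if runs_ok(mid):
            high = mid       # mid works: the answer is mid or smaller
        else:
            low = mid + 1    # mid fails: the answer is larger
    return low
```

Calling minimum_heap_mb(runs_ok_with_heap) gives the crash threshold only; to cover "acceptable performance" as well, you could also time each run inside runs_ok_with_heap, and it is wise to add some headroom to whatever number comes out.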
Ref: How to configure JVM options and memory?
For other requirements, such as CPU, things get a little fuzzier. There are a lot of CPUs out there, and the throughput a given system achieves can vary based not just on CPU speed but on the speed of the hard drive, the amount of RAM installed, the speed of the network interface (if you're writing a network app), and other things. For requirements like that, you'll want to test on a variety of systems, draw a line somewhere, and say, "You can expect acceptable performance if you have hardware that is at least as powerful as X, Y, Z".
The other thing you could do is build in a benchmark, or some kind of performance logging, and have that performance data sent back to you. Lots of apps do this. You know that "May we send anonymous usage data back to the mothership?" question you get when installing some software? Well, common among that data are system-specific details such as RAM, CPU, hard drive model, and other hardware details (whatever data you determine is relevant to your app), along with performance logging data. By taking that kind of approach, what you get is a lot of performance data from lots of different system configurations without needing to have a huge number of differently configured machines in-house.
You can do the same thing for program crashes and bugs - have the stack traces, system info, and other relevant data dumped to a log file that is sent back to you - but of course, only if your users have said it's okay to send that data back to you.
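Here is a minimal Python sketch of that opt-in approach. The idea carries over to any language. It times named operations, gathers a few system details, and appends crash stack traces to a local log, all only when the user has consented. The Telemetry class, its JSON layout and the log path are hypothetical, and actually uploading the data to your server is deliberately left out.

```python
"""Sketch: opt-in performance timings and crash logs with basic system details.

Assumptions: the class, JSON layout and log path are hypothetical; uploading
the report is left out.
"""
import json
import os
import platform
import time
import traceback
from contextlib import contextmanager


def system_info():
    """Basic hardware/OS details to send along with performance or crash data."""
    info = {
        "os": platform.system(),
        "os_release": platform.release(),
        "machine": platform.machine(),
        "cpu_count": os.cpu_count(),
    }
    try:  # physical RAM via POSIX sysconf names (Linux and most Unixes)
        info["ram_bytes"] = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):  # e.g. os.sysconf doesn't exist on Windows
        info["ram_bytes"] = None
    return info


class Telemetry:
    """Hypothetical opt-in collector: does nothing unless consent is True."""

    def __init__(self, consent, log_path):
        self.consent = consent
        self.log_path = log_path
        self.timings = {}  # operation name -> list of durations in seconds

    @contextmanager
    def timed(self, name):
        if not self.consent:
            yield
            return
        start = time.perf_counter()
        try:
            yield
        finally:
            self.timings.setdefault(name, []).append(time.perf_counter() - start)

    def report(self):
        """Payload you might upload with the user's permission (None without consent)."""
        if not self.consent:
            return None
        return {"system": system_info(), "timings": self.timings}

    def write_crash(self, exc):
        """Append the stack trace and system details as one JSON line; True if written."""
        if not self.consent:
            return False
        entry = {
            "system": system_info(),
            "traceback": "".join(traceback.format_exception(type(exc), exc, exc.__traceback__)),
        }
        with open(self.log_path, "a", encoding="utf-8") as f:
            f.write(json.dumps(entry) + "\n")
        return True
```

Wrapping slow operations in `with telemetry.timed("load_project"):` and calling write_crash from your top-level exception handler would be the typical way to use it.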

Related

How does my operating system get information about disk size, RAM size, CPU frequency, etc

I can see information about my hard disk, RAM and CPU from my OS, but I've never given my OS this information.
How does my OS know it?
Is there some place in the hard disk or CPU or RAM that stores this kind of information?
Is there some standard about the format of this kind of information?
SMBIOS (formerly known as DMI) contains much of this information. SMBIOS is a data structure/API that is part of the BIOS/UEFI firmware and contains information such as the brand and model of the computer.
The rest is gathered by the OS querying hardware directly.
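As an illustration, on Linux the kernel parses the SMBIOS tables at boot and exposes several fields as files under /sys/class/dmi/id (the dmidecode tool reads the same tables). This Python sketch reads a handful of them. The field list is a sample, not exhaustive, and some attributes (such as serial numbers) are readable only by root.

```python
"""Sketch: read SMBIOS/DMI fields that Linux exposes under /sys/class/dmi/id.

Assumption: a Linux system; other operating systems expose SMBIOS differently.
"""
import os

# A few of the exposed attributes; others (e.g. product_serial) need root.
DMI_FIELDS = ("sys_vendor", "product_name", "product_version",
              "board_vendor", "board_name", "bios_vendor", "bios_version", "bios_date")


def read_dmi(base="/sys/class/dmi/id"):
    """Return {field: value} for every readable field in DMI_FIELDS."""
    info = {}
    for field in DMI_FIELDS:
        try:
            with open(os.path.join(base, field), encoding="utf-8", errors="replace") as f:
                info[field] = f.read().strip()
        except OSError:  # missing on this machine (e.g. some VMs) or not permitted
            continue
    return info
```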
Answer grabbed from superuser by Mokubai.
You don't need to tell it, because each device already knows how to identify itself (or has a way to do so).
If you think of every device as being accessed via address and data lines (in some cases only data lines), you come to the realisation that you need some kind of "protocol" over those lines that determines just how you talk to the device.
In amongst that protocol you have commands that say "read this", "send that" or "put this over there". It is also relatively easy to have a command that says "identify yourself" which, rather than reading a block of disk or memory or painting a pixel a particular colour, returns a premade string or set of strings that tell the driver or operating system what the device is. Using a series of identity commands, you could discover a device's type, its capabilities and which driver might be able to work with it.
You don't need to tell a device what it is, because it already knows. And you don't need to tell the operating system what it is because it can ask the device itself.
You don't tell people what they're called and how they talk, you ask them.
Each device has its own protocol for these messages, and devices don't store details of other devices, because doing so would be insane and nearly useless given that you can remove any device at any time. Your hard drive doesn't need to store information about your memory or graphics card, apart from the drivers that the operating system uses to talk to them.
The PC UEFI specification defines a core set of system specifications that every computer has, allowing the processor to be powered up and a program stored in an EEPROM to begin the absolute basic system probing necessary to determine the processor, set up the RAM, find a disk and display, and thus continue to boot the computer.
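PCI devices are a concrete example of "ask the device itself": each one reports vendor, device and class IDs from its own configuration space, and Linux shows them under /sys/bus/pci/devices. This Python sketch lists them and decodes a few common base classes. Turning the numeric IDs into product names needs a lookup table such as the pci.ids database that lspci uses, which is not included here.

```python
"""Sketch: list the vendor/device/class IDs that PCI devices report about themselves.

Assumption: Linux sysfs; mapping IDs to product names (pci.ids) is not included.
"""
import os

# Base class = top byte of the 24-bit PCI class code (a few common ones only).
BASE_CLASSES = {0x01: "mass storage", 0x02: "network", 0x03: "display",
                0x04: "multimedia", 0x06: "bridge", 0x0C: "serial bus"}


def read_hex(path):
    """Parse a sysfs file such as 'vendor' containing e.g. '0x8086'; None if unreadable."""
    try:
        with open(path) as f:
            return int(f.read().strip(), 16)
    except (OSError, ValueError):
        return None


def list_pci_devices(base="/sys/bus/pci/devices"):
    """Return (address, vendor_id, device_id, kind) for each PCI device."""
    devices = []
    if not os.path.isdir(base):
        return devices
    for addr in sorted(os.listdir(base)):
        vendor = read_hex(os.path.join(base, addr, "vendor"))
        device = read_hex(os.path.join(base, addr, "device"))
        pci_class = read_hex(os.path.join(base, addr, "class"))
        kind = "unknown" if pci_class is None else BASE_CLASSES.get(pci_class >> 16, "other")
        devices.append((addr, vendor, device, kind))
    return devices
```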
From there the UEFI system would hand over to the operating system which would have more detailed probing and identification procedures, but it all starts at the most basic "I have a processor, what is around me?" situation.

What are the limitations of the flask built-in web server

I'm a newbie in web server administration. I've read multiple times that flask built-in web server is not designed for "production", and must be used only for tests and debug...
But what if my app reaches only a thousand users who occasionally send data to the server?
If it works, when will I have to bother with configuring a more sophisticated web server? (I am looking for approximate metrics.)
In a nutshell, I would love to find out what the built-in web server can do (with approximate thresholds) and what it cannot.
Thanks a lot !
There isn't one right answer to this question, but here are some things to keep in mind:
With the right amount of horizontal scaling, it is quite possible you could keep scaling out use of the debug server forever. When exactly you would need to start scaling (or switch to using a "real" web server) would also depend on the environment you are hosting in, the expectations of the users, etc.
The main issue you would probably run into is concurrency. Older versions of Flask's development server were single-threaded by default (Flask 1.0 and later enable threading by default), which means requests are handled one at a time, serially. If you are serving more than one request at once (including favicons and static items like images, CSS and JavaScript files), the requests will take longer, and if any single request happens to take a long time (say, 20 seconds), your entire application is unresponsive for that time. You can bump the thread count (or have requests handled in other processes), which might alleviate some of these issues, but the server can still be slow under a "high" load. What counts as a "high" load depends on your application and on the maximum response time you consider acceptable.
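To see the serial-versus-threaded difference, this Python sketch serves the same WSGI app either one request at a time or one thread per request, using only the standard library. slow_app is a hypothetical stand-in for a Flask app (a Flask object is itself a WSGI callable), and wsgiref is used only because it ships with Python; it is not a production server either.

```python
"""Sketch: the same WSGI app served one request at a time vs. one thread per request.

Assumptions: `slow_app` is a hypothetical stand-in for a Flask app; wsgiref is
for illustration only, not production use.
"""
import threading
import time
from socketserver import ThreadingMixIn
from wsgiref.simple_server import WSGIRequestHandler, WSGIServer, make_server


class ThreadingWSGIServer(ThreadingMixIn, WSGIServer):
    """WSGIServer that handles each request in its own thread."""
    daemon_threads = True


class QuietHandler(WSGIRequestHandler):
    def log_message(self, format, *args):  # suppress per-request log lines
        pass


def slow_app(environ, start_response):
    # The query string is the number of seconds to "work", e.g. /?0.5
    time.sleep(float(environ.get("QUERY_STRING") or 0))
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"done\n"]


def start_server(app, threaded, port=0):
    """Serve in a background thread; port=0 lets the OS pick a free port."""
    server_class = ThreadingWSGIServer if threaded else WSGIServer
    httpd = make_server("127.0.0.1", port, app,
                        server_class=server_class, handler_class=QuietHandler)
    threading.Thread(target=httpd.serve_forever, daemon=True).start()
    return httpd
```

Firing two simultaneous /?0.5 requests at start_server(slow_app, threaded=False) should take about one second in total, because the second waits for the first; with threaded=True they overlap and finish in roughly half a second.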
Another issue is security: if you are concerned at ALL about security (and not just the security of the data in the application itself, but the security of the box that will be running it as well) then you should not use the development server. It is not ready to withstand any sort of attack.
Finally, the development server could just fail outright. It is not designed to be used as a long-running process (days, weeks, months), and so it has not been well tested to work in this capacity.
So, yes, it has limitations. Yes, you could still conceivably use it in production. And yes, I would still recommend using a "real" web server. If you don't like the idea of installing something like Apache or Nginx, you can go with a solution that is just as easy as "run a Python script" by using one of the standalone WSGI servers, which are designed for production and can be started with something as simple as running python run_app.py on the command line. You typically just need a 4-5 line Python script that imports and creates the server object, points it at your Flask app, and runs it.
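As a sketch of such a script, here is one using waitress, a pure-Python WSGI server, chosen here as an example; the answer does not name a specific server. It assumes waitress is installed (pip install waitress) and that your Flask object can be found as "module:attribute", like the hypothetical "myproject:app" in the gunicorn example.

```python
"""run_app.py: sketch of serving a Flask app with a standalone WSGI server.

Assumptions: waitress is installed, and the Flask object is reachable as
"module:attribute" (e.g. the hypothetical "myproject:app").
"""
import importlib


def load_app(spec):
    """Import "module:attribute" and return the attribute (default name: app)."""
    module_name, _, attr = spec.partition(":")
    return getattr(importlib.import_module(module_name), attr or "app")


def main(app, host="127.0.0.1", port=8080, threads=4):
    from waitress import serve  # imported lazily so this file loads without waitress
    # waitress handles requests on a pool of `threads` worker threads
    serve(app, host=host, port=port, threads=threads)
```

The script would end with main(load_app("myproject:app")); use host="0.0.0.0" if other machines need to reach it.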
gunicorn could be run with only the following on the command line, no extra script needed:
gunicorn myproject:app
...where "myproject" is the Python module that contains the Flask app object. Keep in mind that one of the developers of gunicorn would probably recommend against running it this way without a reverse proxy such as Nginx in front of it. See https://serverfault.com/questions/331256/why-do-i-need-nginx-and-something-like-gunicorn.
The OP has long since moved on, but for those who encounter this question in the future, I would just add that setting up an Apache server, even on a laptop, is free and pretty easy. It can be configured for as few or as many features as you want just by uncommenting or commenting out lines in the config file. There might be an even easier GUI method for doing that nowadays, but just editing the configs is simple.

How should I benchmark a system to determine the overall best architecture choice?

This is a bit of an open ended question, but I'm looking for an open ended answer. I'm looking for a resource that can help explain how to benchmark different systems, but more importantly how to analyze the data and make intelligent choices based on the results.
In my specific case, I have a four-server setup, including MongoDB, that serves as the backend for an iOS game. All servers are running Ubuntu 11.10. I've read numerous articles that make suggestions like "if CPU utilization is high, make this change." As a newcomer to backend architecture, I have no concept of what "high CPU utilization" is.
I am using Mongo's monitoring service (MMS), and I am gathering some information about it, but I don't know how to make choices or identify bottlenecks. Other servers serve requests from the game client to mongo and back, but I'm not quite sure how I should be benchmarking or logging important information from them. I'm also using Amazon's EC2 to host all of my instances, which also provides some information.
So, some questions:
What statistics are important to log on a backend setup? (CPU, RAM, etc)
What is a good way to monitor those statistics?
How do I analyze the statistics? (RAM usage is high/read requests are low, etc)
What tips should I know before trying to create a stress-test or benchmarking script for my architecture?
Again, if there is a resource that answers many of these questions, I don't need an explanation here, I was just unable to find one on my own.
If more details regarding my setup are helpful, I can provide those as well.
Thanks!
I like to think of performance testing as a mini-project that is undertaken because there is a real-world need. Start with the problem to be solved: is the concern that users will have a poor gaming experience if the response time is too slow? Or is the concern that too much money will be spent on unnecessary server hardware?
In short, what is driving the need for the performance testing? This exercise is sometimes called "establishing the problem to be solved." It is about the goal to be achieved: if there is no goal, why go through all the work of testing the performance? Establishing the problem to be solved will eventually drive what to measure and how to measure it.
After the problem is established, the next step is to write down the questions that must be answered to know when the goal is met. For example, if the goal is to ensure the response times are low enough to provide a good gaming experience, some questions that come to mind are:
What is the maximum response time before the gaming experience becomes unacceptably bad?
What is the maximum response time that is indistinguishable from zero? That is, if 200 ms response time feels the same to a user as a 1 ms response time, then the lower bound for response time is 200 ms.
What client hardware must be considered? For example, if the game only runs on iOS 5 devices, then testing an original iPhone is not necessary because the original iPhone cannot run iOS 5.
These are just a few questions I came up with as examples. A full, thoughtful list might look quite different.
After writing down the questions, the next step is to decide which metrics will answer them. You have probably come across a lot of metrics already: response time, transactions per second, RAM usage, CPU utilization, and so on.
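As a small example of turning raw measurements into such metrics, this Python sketch computes throughput and response-time percentiles from per-request timings. Percentiles are used rather than an average because a few slow requests are easy to hide in a mean. It assumes your load tool gives you one response time per request in seconds plus the wall-clock duration of the run; the p95 value is what you would compare against the "maximum acceptable response time" question above.

```python
"""Sketch: turn raw response times into throughput and percentile metrics.

Assumption: `latencies_s` are per-request response times in seconds and
`wall_clock_s` is the duration of the whole run.
"""
import statistics


def summarize(latencies_s, wall_clock_s):
    if len(latencies_s) < 2:
        raise ValueError("need at least two samples")
    # 99 cut points; cuts[49] is the 50th percentile, cuts[94] the 95th
    cuts = statistics.quantiles(latencies_s, n=100, method="inclusive")
    return {
        "requests": len(latencies_s),
        "throughput_rps": len(latencies_s) / wall_clock_s,
        "p50_ms": cuts[49] * 1000,
        "p95_ms": cuts[94] * 1000,
        "max_ms": max(latencies_s) * 1000,
    }
```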
After choosing some appropriate metrics, write some test scenarios. These are the plain English descriptions of the tests. For example, a test scenario might involve simulating a certain number of games simultaneously with specific devices or specific versions of iOS for a particular combination of game settings on a particular level of the game.
Once the scenarios are written, consider writing the test scripts for whatever tool is simulating the server workloads. Then run the scripts to establish a baseline for the selected metrics.
After a baseline is established, change parameters and chart the results. For example, if one of the selected metrics is CPU utilization versus the number of TCP packets entering the server per second, make a graph to find out how utilization changes as packets per second go from 0 to 10,000.
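The baseline-then-sweep loop can be sketched in Python: run a fixed number of requests at increasing concurrency, record throughput and latency at each level, and write a CSV you can chart. Here the first level serves as the baseline. send_request is a hypothetical function you supply that performs one request against your backend. Server-side metrics such as CPU utilization still have to come from your monitoring (MMS, EC2 metrics, etc.), sampled over the same time windows.

```python
"""Sketch: establish a baseline, then sweep one independent variable (concurrency).

Assumption: `send_request` is a hypothetical callable that performs one request
against the backend and returns when the response arrives.
"""
import csv
import statistics
import sys
import time
from concurrent.futures import ThreadPoolExecutor


def run_level(send_request, concurrency, requests_per_worker):
    """Run `concurrency` workers in parallel; return (latencies, wall-clock seconds)."""
    def worker(_):
        latencies = []
        for _ in range(requests_per_worker):
            start = time.perf_counter()
            send_request()
            latencies.append(time.perf_counter() - start)
        return latencies

    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        per_worker = list(pool.map(worker, range(concurrency)))
    wall = time.perf_counter() - start
    return [t for batch in per_worker for t in batch], wall


def sweep(send_request, levels, requests_per_worker=50):
    """One row per level: (concurrency, throughput rps, median ms, max ms)."""
    rows = []
    for concurrency in levels:
        latencies, wall = run_level(send_request, concurrency, requests_per_worker)
        rows.append((concurrency, len(latencies) / wall,
                     statistics.median(latencies) * 1000, max(latencies) * 1000))
    return rows


def write_csv(rows, out=sys.stdout):
    writer = csv.writer(out)
    writer.writerow(["concurrency", "throughput_rps", "median_ms", "max_ms"])
    writer.writerows(rows)
```

A call such as write_csv(sweep(send_request, [1, 2, 4, 8, 16])) produces a table for charting. The point where throughput stops rising while latency keeps climbing is usually where a bottleneck starts to show.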
In general, observe what happens to performance as the independent variables of the experiment are adjusted. Use this hard data to answer the questions created earlier in the process.
I did a Google search on "software performance testing methodology" and found a couple of good links:
Check out this white paper Performance Testing Methodology by Johann du Plessis
Have a look at the Methodology section of this Wikipedia article.

How is Accurev Performance?

How is performance in the current version (4.7) of Accurev?
time to check out per 100 MB, per GB?
time to commit per number of files or MB?
responsiveness of the GUI with 100+ streams?
I just had a demo of Accurev, and the streams look like a lightweight way to model workflow around code/projects. I've heard people praising Accurev for the streams back end and complaining about performance. Accurev appears to have worked on the performance, but I'd like to get some real world data to make sure it isn't a case of demos-well-runs-less-well.
Does anyone have Accurev performance anecdotes or (even better) data from testing?
I don't have any numbers but I can tell you where we have noticed performance issues.
Our builds typically use 30-40K files from source control. In my workspace currently there are over 66K files including build intermediate and output files, over 15GB in size. To keep AccuRev working responsively we aggressively use the ignore elements so AccuRev ignores any intermediate files such as *.obj. In addition we use the time stamp optimization. In general running an update is quick, but the project sizes are typically 5-10 people so normally only a couple of dozen files come down if you update daily. Even if someone made changes that touched lots of files speed is not an issue. On the other hand a full populate of all 30K+ files is slow. I don't have a time since I seldom do this and on the rare occasion I do, I run the populate when I'm going to lunch or a meeting. I expect it could be as much as 10 minutes. In general source files come down very quickly, but we have some large binary files, 10-20MB, that take a couple of seconds each.
If the exclude rules and ignore elements are not configured correctly, AccuRev can take a couple of minutes to run an update for workspaces of this size. When I hear other developers complaining about the speed, I know something is misconfigured, and we get it straightened out.
A year or so ago, one of the projects updated Boost with 25K+ files and also added Firefox to the repository (I forget the size, but it made Boost look small). They also added ICU, wrote a lot of software and modified countless files. In all, I recall there were approximately 250K+ files sitting in a stream. I unfortunately decided that all their good code should be promoted to the root so all projects could share it. This turned out to be a little beyond what AccuRev could handle well: it was a multi-hour process getting all the changes promoted. As I recall, once Firefox was promoted the rest went smoothly; perhaps a single transaction with over 100K files was the issue?
I recently updated Boost and so had to keep and promote 25K+ files. It took a minute or two, which is not unreasonable considering the number of files and the size of the binaries.
As for the number of streams, we have over 800 streams and workspaces, and performance here is not an issue. In general I find the large number of streams hard to navigate, so I run a filtered view of just my workspaces and just the streams I'm interested in. However, when I need to look at the unfiltered list to find something, performance is fine.
As a final note, AccuRev support is terrific: we call them the voice in the sky. Every now and again we shoot ourselves in the foot using AccuRev and wind up clueless about how to fix things. Almost always we did something dumb and then tried something dumber to fix it. Eventually we place a support request, and the next thing we know they are walking us through the steps to righteousness, either on the phone or in a GoToMeeting session. I've even contacted them for trivial things that I just don't have time to figure out because I'm having a hectic day, and they kindly walk me through it rather than telling me to RTFM.
Edit 2014: We can now get acceptable X-Windows performance by using the commercial version of RealVNC.
Original comment: This answer applies to any version of Accurev, not just 4.7. Firstly, GUI performance might be OK if you can use the web client. If you can't use the web client and you want GUI performance, then you'd better be using Windows, or have all your developers in one place, i.e. where the Accurev server is located.

Try to run the GUI on X-Windows over a WAN? Forget it: our experience has been dozens of seconds or even minutes for basic point-and-click operations. This is over a fairly good WAN about 800 miles away, with an almost optimal ping time. This is not a failing of Accurev but of X-Windows, and you'll likely have similar problems with other X applications over a WAN, so avoid basic X if you possibly can. Currently we cannot, and our WAN users are forcibly relegated to the command line only.

The basic problem is that Accurev is centralized and you can't increase the speed of light. I believe you can get around WAN latency by running Accurev Replication Servers, but that still does not properly address the problem if you have remote developers in single-person offices over VPN. It is ironic that the replication servers somewhat turn this centralized VCS into a form of DVCS.

If you don't have replication servers, a horrible but somewhat workable workaround is to use a delta-synchronization tool such as rsync to sync your source tree between your local machine, where you can run the GUI (i.e. the GUI running directly on your Windows or Linux laptop), and the machine where you're actually working (e.g. a UNIX machine 1,000 miles away). Another option is to use something like VNC, which works better over a WAN than X, connecting to a virtual desktop at the Accurev server's location and using X from there. At my workplace, more than one team has resorted to using Mercurial on the side and promoting to Accurev only when it's strictly necessary. As Stephen Nutt points out above, other necessary work is to use time-stamp optimization and ignores.
We also have our Accurev admins (yes, it requires you to employ people to babysit it) complain when we need to include large numbers of files, despite the fact that they form a core part of our product and MUST be included and version-controlled. Draw your own conclusions.

How to limit the effect of client modifications to production systems

Our shop has developed a few WEB/SMS/DB solutions for a dozen client installations. The applications have some real-time performance requirements and are just good enough to function properly. The problem is that the clients (owners of the production servers) are using the same server/database for customizations that cause performance problems in the applications we created and deployed.
A few examples of clients' customizations:
Adding large tables with many text datatypes for the columns that get cast to other data types in the queries
No primary keys, indexes, or FK constraints
Use of external scripts that use count(*) from table where id = x, in a loop from the script, to determine how to construct more queries later in the same script. (no bulk actions that the planner can optimize or just do everything in a single pass)
All new code files on the server are created/owned by root, with 0777 permissions
The clients don't take suggestions or criticism well. If we just go ahead and try to port/change the scripts ourselves, the old code can come back, clobbering any changes we make! Or, with our limited knowledge of their use cases, we break functionality while trying to optimize their changes.
My question is this: how can we limit the resources available to queries/applications other than the ones we create and deploy? Are there any pragmatic options in scenarios like this? We prided ourselves on having an OSS solution, but it seems that it's become a liability.
We use PG 8.3 running on a range of Linux distros. The clients prefer PHP, but shell scripts, Perl, Python, and PL/pgSQL are all used on the system in one form or another.
This problem started about two minutes after the first client was given full access to the first computer, and it hasn't gone away since. Anytime someone's priority is getting business-oriented work done quickly, they will be sloppy about it and screw things up for everyone. That's just how things work, because proper design and implementation are harder than cheap hacks. You're not going to solve this problem; all you can do is figure out how to make it easier for the client to work with you than against you. If you do it right, it will look like excellent service rather than nagging.
First off, the database side. There's no way to control query resources in PostgreSQL. The main difficulty is that tools like "nice" control CPU usage, but if the database doesn't fit in RAM, it may very well be I/O usage that is killing you. See this developer message summarizing the issues here.
Now, if in fact it's CPU the clients are burning through, you can use two techniques to improve that situation:
Install a C function that changes the process priority (example 1, example 2) and make sure whenever they run something it gets called first (maybe put it into their psql config file, there are other ways).
Write a script that looks for postmaster processes spawned by their userid and renice them, make it run often in cron or as a daemon.
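A minimal Python sketch of that renice script, for Linux. It assumes backend process titles of the form "postgres: <user> <database> <host> <activity>" as they appear in /proc/<pid>/cmdline, and that it runs as root or as the postgres OS user, which is needed to change another process's priority. As noted above, nice only affects CPU scheduling, not I/O.

```python
"""Sketch: renice PostgreSQL backends that belong to a given database user.

Assumptions: Linux /proc, backend titles like
"postgres: <user> <database> <host> <activity>", and root or postgres privileges.
"""
import os


def backend_user(title):
    """Database user from a backend's process title, or None if it isn't one.

    Auxiliary processes (e.g. "postgres: writer process") also match the
    pattern, but only matter if a role happens to share that name.
    """
    parts = title.split()
    if len(parts) >= 3 and parts[0] == "postgres:":
        return parts[1]
    return None


def find_backends(db_user, proc="/proc"):
    """PIDs of backends whose title names `db_user`."""
    pids = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue
        try:
            with open(os.path.join(proc, entry, "cmdline"), "rb") as f:
                raw = f.read()
        except OSError:  # process exited or is not readable
            continue
        title = raw.replace(b"\0", b" ").decode("utf-8", "replace").strip()
        if backend_user(title) == db_user:
            pids.append(int(entry))
    return sorted(pids)


def renice(pids, niceness=10):
    for pid in pids:
        try:
            os.setpriority(os.PRIO_PROCESS, pid, niceness)
        except (ProcessLookupError, PermissionError):
            pass  # backend already exited, or we lack the privileges
```

Running renice(find_backends("clientuser")) from cron every minute or so would cover the "make it run often" part; "clientuser" is a placeholder for the client's database role.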
It sounds like your problem isn't the particular query processes they're running, but rather other modifications they're making to the larger structure. There's only one way to cope with that: you have to treat the client like they're an intruder and use the approaches of that portion of the computer security field to detect when they screw things up. Seriously! Install an intrusion detection system like Tripwire on the server (there are better tools, that's just the classic example), and have it alert you when they touch anything. New file that's 0777? Should jump right out of a proper IDS report.
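To make the idea concrete, here is a tiny Tripwire-style check in Python: snapshot hashes, permissions and owners under your install directory, then report new, removed, modified and world-writable files. It is illustrative only and no replacement for a real IDS such as Tripwire or AIDE. It assumes the baseline JSON is stored somewhere the client can't edit, since an on-box baseline proves little when the client has root.

```python
"""Sketch: a tiny Tripwire-style check for new, changed and world-writable files.

Assumptions: the baseline JSON lives off-box or out of the client's reach; the
watched directory is your application's install path.
"""
import hashlib
import json
import os
import stat


def snapshot(root):
    """Map each regular file under root to its SHA-256, permission bits and owner."""
    files = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.lstat(path)
                if not stat.S_ISREG(st.st_mode):  # skip symlinks, sockets, etc.
                    continue
                with open(path, "rb") as f:
                    digest = hashlib.sha256(f.read()).hexdigest()
            except OSError:
                continue
            files[path] = {"sha256": digest, "mode": stat.S_IMODE(st.st_mode), "uid": st.st_uid}
    return files


def compare(baseline, current):
    """Human-readable findings, e.g. 'NEW /path' or 'WORLD-WRITABLE /path (0o777)'."""
    findings = [f"NEW {p}" for p in sorted(current.keys() - baseline.keys())]
    findings += [f"REMOVED {p}" for p in sorted(baseline.keys() - current.keys())]
    for p in sorted(baseline.keys() & current.keys()):
        if baseline[p]["sha256"] != current[p]["sha256"]:
            findings.append(f"MODIFIED {p}")
        elif baseline[p]["mode"] != current[p]["mode"] or baseline[p]["uid"] != current[p]["uid"]:
            findings.append(f"PERMS/OWNER {p}")
    for p, meta in sorted(current.items()):
        if meta["mode"] & stat.S_IWOTH:
            findings.append(f"WORLD-WRITABLE {p} ({oct(meta['mode'])})")
    return findings


def save(snap, path):
    with open(path, "w", encoding="utf-8") as f:
        json.dump(snap, f)


def load(path):
    with open(path, encoding="utf-8") as f:
        return json.load(f)
```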
On the database side, you can't usefully detect database modifications directly. Instead, do a pg_dump of the schema into a file every day (pg_dumpall -g and pg_dump -s), then diff that against the last one you delivered and, again, alert yourself when it has changed. If you manage this well, contact with the client turns into "we noticed you changed something on the server... what is it you're trying to accomplish with that?", which makes you look like you're really paying attention to them. That can turn into a sales opportunity, and they may stop fiddling with things as much just knowing you're going to catch it immediately.
The other thing you should start doing immediately is to put as much as you can on each client box under version control. You should be able to log in to each system, run the appropriate status/diff tool for the install, and see what's changed. Get that mailed to you regularly too. Again, this works best if combined with something that dumps the schema as one component of what it manages. Not enough people apply serious version control to the code that lives in the database.
That's the main set of technical approaches useful here. The rest of what you've got is a classic consulting client-management problem, far more a people problem than a computer one. Cheer up, it could be worse: FSM help you if you give them ODBC access and they discover they can write their own queries in Access or something similarly simple.
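The daily dump-and-diff can be sketched in Python with pg_dumpall -g (globals such as roles) and pg_dump -s (schema only), plus difflib for the comparison. It assumes both tools are on PATH and can connect without a password prompt, for example when run as the postgres OS user. The database name passed in is your own, and mailing the diff is left to cron or your alerting tool. Comment lines are dropped before diffing on the assumption that dump headers may vary between runs.

```python
"""Sketch: dump globals and schema, then diff against the delivered version.

Assumptions: pg_dumpall/pg_dump are on PATH and connect without prompting;
alerting on a non-empty diff is left to cron or monitoring.
"""
import difflib
import subprocess


def _run(*cmd):
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


def dump_schema(dbname):
    """Globals (roles, tablespaces) plus the schema of one database, as SQL text."""
    return _run("pg_dumpall", "-g") + _run("pg_dump", "-s", dbname)


def strip_comments(sql):
    """Drop '--' comment lines so dump header noise doesn't show up as a change."""
    return [line for line in sql.splitlines(keepends=True) if not line.startswith("--")]


def schema_diff(delivered_sql, current_sql):
    """Unified diff of delivered vs. current schema; empty string if unchanged."""
    return "".join(difflib.unified_diff(strip_comments(delivered_sql),
                                        strip_comments(current_sql),
                                        fromfile="delivered", tofile="current"))
```

A nightly job could compare the delivered dump file against dump_schema() for the client's database and send the output whenever schema_diff returns a non-empty string.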