EC2 Linux MarkLogic 9 start service failed

I launched an instance running Red Hat Linux (64-bit), installed the JDK successfully, then copied the MarkLogic 9 installation package to the instance over SSH and completed the install. When I start the MarkLogic service, I get the following messages. (P.S.: this is my first time installing MarkLogic.)
Instance is not managed
Waiting for device mounted to come online : /dev/xvdf
Volume /dev/sdf has failed to attach - aborting
Warning: ec2-startup did not complete successfully
Check the error logs for details
Starting MarkLogic: [FAILED]
And here is the log output:
2017-11-27 11:16:39 ERROR [HandleAwsError # awserr.go.48] [instanceID=i-06sdwwa33d24d232df [HealthCheck] error when calling AWS APIs. error details - NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors

Using the Source of Infinite Wisdom, I googled "Install MarkLogic ec2 aws".
Not far down I found [https://docs.marklogic.com/guide/ec2.pdf][1]
It's a good document to read.
If you choose to ignore the (literally all-caps) "STOP: Before you do Anything!" warning on the first page, you can read further and find that ML needs a data volume, and that using the root volume is A Bad Idea (it's too small, will crash your system when it fills up, and will vanish if your instance terminates). So if you choose not to use the recommended CloudFormation script for your first experience, you will need to manually create and attach a data volume, among other things.
[1]: https://docs.marklogic.com/guide/ec2.pdf
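If you do go the manual route described above, a minimal sketch of creating and attaching a data volume with Python and boto3 might look like the following (the region, availability zone, and size are placeholder assumptions; the instance ID is taken from your log). You would still need to create a filesystem on the device and mount it where MarkLogic expects its data, which the EC2 guide covers:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

    # Create a data volume in the same AZ as the instance (size is an example).
    vol = ec2.create_volume(AvailabilityZone="us-east-1a", Size=100, VolumeType="gp2")

    # Wait until the volume is available before attaching it.
    ec2.get_waiter("volume_available").wait(VolumeIds=[vol["VolumeId"]])

    # Attach it as /dev/sdf, the device the MarkLogic startup script was waiting for.
    ec2.attach_volume(VolumeId=vol["VolumeId"],
                      InstanceId="i-06sdwwa33d24d232df",
                      Device="/dev/sdf")

The CloudFormation templates do all of this (and more) for you, which is a large part of why the docs push you toward them.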

The size and compute power of the host systems running ML are orthogonal to the deployment and orchestration methods.
They are entirely different issues. Yes, you should start with the sample CloudFormation scripts... but not because of size and performance; rather because
they were built to make a successful first-time experience as painless as possible. You would have had your ML server up and running in less time than it took to post a question to Stack Overflow asking why it wasn't.
Totally unrelated, except for the built-in set of instance types for the AMIs (1).
What configurations are possible vs. recommended vs. supported is largely dependent on workload and performance expectations.
MarkLogic can and does run on resource-constrained systems. Whether and how well it works requires the same methodology to answer for micro and mega systems alike: workload, data size and format, query and data processing code used, performance requirements, working set, hardware, software, VM, networking, storage... While designed to support large enterprise workloads well,
there are also very constrained platforms and workloads in use in production systems. A typical low-end laptop can run ML fine for some use cases, while others may need a cluster of a dozen or a hundred high-end monsters.
(1) "Supported instance types" with the Marketplace AMIs:
yes, these do NOT include entry-level EC2 instance types, last I looked.
The rationale is similar to why the standard scripts make it hard to abuse the root volume as a data volume: not because it cannot be done,
but rather as an attempt to provide the best chance of a successful first-time experience for the targeted market segment, constrained by having only one chance to do it and knowing nothing at all about the intended use... an educated guess coupled with a lot of testing and support history about how people get things wrong no matter how much you lead them.
While "micro" systems can be made to work successfully in some specialized use cases, they usually don't do as well, as easily, or as reliably, or handle as large a variety of whatever-you-throw-at-them, without careful workload-specific tuning and sophisticated application code.
Similarly, there is a reason the docs make it as clear as humanly possible, even annoyingly so, that you should start with the CloudFormation templates,
short of refusing to run without them.
Can ML run on Platform X with Y memory, Z hypervisor, on Docker or VMware or VirtualBox or a Brand Acme RAID controller?
Very likely, with some definition of "run" and configured for those exact constraints.
Very unlikely for arbitrary definitions of "run" with no thought or effort to match the deployment to the environment.
Will it be easy for someone who's never done it before to set up, and will it run "my program" at "my required speeds" out of the box with no problems, no optimizations, performance analysis, data refactoring, or custom queries?
For a reasonably large set of initial use cases, at least for a reasonable and quick POC: very likely, if you follow the installation guide, with perhaps a few parameter adjustments.
Is that the best it can do? Absolutely not.
But it's very close, given absolutely no knowledge of the user's actual application, technical experience, workloads, budget, IT staff, dev and QA team, requirements, business policies, future needs, or phase of the moon.
My recommendation: read the EC2 docs.
Do what they say.
Try it out with a realistic set of data and applications for your use.
Test, measure, experiment, learn.
THEN and ONLY THEN worry about whether it will work on a t2.micro or an m4.16xlarge (or clusters thereof).
That is the beginning, not the end.
The end is never: you can and should treat continual analysis and improvement of IT configurations as part of ongoing operating procedures.
Minimizing cost is a systemic problem with many dimensions.
On AWS it's FREE to change; it's EXPENSIVE not to plan for change.
Change is cheap.
Experimentation is cheap.
Choose instance types, storage, networking, etc. last, not first.
Consider total cost of ownership (TCO). Question your requirements: do you NEED that dev system running Sunday at 3am? Can QA tolerate occasional failures in exchange for 90% cost savings? Can you avoid over-commitment with auto scaling?
Do you need five 9's, or are three 9's enough? Can ingest be offloaded to non-production systems with cheaper storage? Can a middle tier be added, or removed, to move work to the most cost-effective components? Is labor or IT more costly?
Instance type is actually one of the least relevant components of TCO.
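As one illustration of the "dev system at 3am" point, here is a minimal sketch (Python/boto3; the tag key, tag value, and region are assumptions) of stopping tagged dev instances, which could be scheduled to run outside working hours:

    import boto3

    ec2 = boto3.client("ec2", region_name="us-east-1")  # assumed region

    # Find running instances tagged as dev machines (tag name/value are assumptions).
    reservations = ec2.describe_instances(
        Filters=[{"Name": "tag:Environment", "Values": ["dev"]},
                 {"Name": "instance-state-name", "Values": ["running"]}]
    )["Reservations"]
    ids = [i["InstanceId"] for r in reservations for i in r["Instances"]]

    # Stopped instances accrue storage charges only, not compute charges.
    if ids:
        ec2.stop_instances(InstanceIds=ids)

Run from cron or a scheduler (with a matching start script for the morning), this is the kind of cheap experiment that changes the cost picture far more than the choice of instance type does.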

Related

Postgres auto tuning

As we all know, Postgres performance depends highly on config parameters. E.g., if I have an SSD drive or more RAM, I need to tell Postgres that by changing the relevant config parameter.
I wonder if there is any tool (for Linux) which can suggest the best Postgres configuration for the current hardware?
I'm aware of websites (e.g. pgtune) where I can enter the server spec and they suggest a config.
However, every piece of hardware is different (e.g. I might have a better RAID controller, or some processes that consume more RAM, etc.). My wish would be for Postgres to do self-tuning, analysing query execution time, available resources, etc.
I understand there is no such mechanism, so maybe there is some tool/script I can run which can do this job for me (checking e.g. sequential/random disk read, available memory, etc.) and tell me what to change in the config.
There are parameters that you can tweak to get better performance from PostgreSQL.
This article gives a good overview of that.
There are a few scripts that can do that. One that is mentioned in the Postgres wiki is this one.
To get a better idea of what further tuning your database needs, you have to log its requests and performance; after analysing those logs you can tune more parameters. For this there is the pgBadger log analyzer.
After using the database in production, you get a better idea of what requirements you have and how to approach them, rather than making changes based only on OS or hardware configuration.
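As a rough illustration of what such a script does, here is a minimal sketch (Python, Linux-only; the percentages are common rule-of-thumb heuristics in the spirit of pgtune, not recommendations for your specific workload):

    import os

    # Total physical RAM in MB (Linux).
    ram_mb = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES") // (1024 * 1024)

    suggestions = {
        "shared_buffers": f"{ram_mb // 4}MB",            # ~25% of RAM
        "effective_cache_size": f"{ram_mb * 3 // 4}MB",  # ~75% of RAM
        "work_mem": f"{max(4, ram_mb // 64)}MB",         # rough per-operation sort budget
        "random_page_cost": "1.1",                       # assumes SSD storage
    }

    for name, value in suggestions.items():
        print(f"{name} = {value}")

A real tool would also account for max_connections, the workload mix, and the log analysis mentioned above before settling on values.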

How does on-demand pruning of the kernel work in the context of a unikernel?

In a unikernel computing environment, the kernel can sometimes be pruned to target the needs of a specific user application.
How does this process work? Is it manual, or can it be automated?
There is no 'pruning' of the kernel. To be clear, unikernel implementations don't use Linux - they have written their own kernels (whether they want to admit that or not).
There are well over 10 different unikernel implementations out there today, and some are very focused on providing the absolute bare minimum needed to work. Others, such as Nanos, which I work on, are in the 'POSIX' category, meaning if it runs on Linux it'll probably run there.
Whenever you see references to people talking about 'only including what you need' and picking certain libraries to use -- most of these systems are not much more than a hello world, which is simple enough to do in ~20 lines of assembly. When you start trying to run real-world applications, especially ones that you use every day (did you write your own database? did you write your own webserver?), and compare things like performance, you start running into actual operating system components that you can't just pick and choose.
Having said that, supporting the set of syscalls that Linux has is only a portion of the battle. There is a lot of code that goes into a general-purpose operating system like Linux (north of 20M LOC, and that's a 'small' kernel) or Windows or macOS that is not readily recognizable by its end users. The easy things to point at are picking which network driver to use depending on which hypervisor you want to run on (Xen/KVM/Hyper-V).
The harder, less clear things are making choices between options like APIC or x2APIC. How many levels do your page tables have? Do you support SMP or not? If so, how?
You can't just 'prune' this type of stuff away. It needs to be consciously added, and even if you did have a Linux that you were 'unikernelizing' you wouldn't be able to prune it because it just touches too much code. As a thought exercise, try removing support for multiple processes (no unikernel supports this). Now you are touching multiple schedulers, shared memory, message passing, user privileges (are there users?), etc. (the list goes on forever).
Your question is actually a very common question and it highlights a misunderstanding of how these things work in real life.
So to answer your question: there is no work that I'm aware of where people are trying to automatically 'prune' the kernel, and even in the projects that exist where you can select functionality via a config file or similar, I would not expect to see much progress, for the aforementioned reasons.

Why does the Kubernetes revisionHistoryLimit default to 2?

Kubernetes' .spec.revisionHistoryLimit is used to keep track of changes to the Deployment definition. Since these definitions are small YAML files, it doesn't seem like much (given the typical resources of a modern cloud server) to keep hundreds or more of them around if necessary.
According to the documentation, it is going to be set to 2 (instead of unlimited). Why is this?
Keeping it unlimited would eventually clog up etcd, and IIUC etcd isn't designed for big-data usage. Also, the Kubernetes control plane syncs and downloads this data regularly, which means there would be a lot of unnecessary data to process. Just imagine a service with daily deployments running for over a year.
And setting this to a high-ish number like 100 seems very arbitrary to me. Why not 1000 or 9001? Besides that, I cannot imagine anyone wanting to roll something back by a hundred versions.
Anyway, we are only talking about a default setting, so you can set it to a very high number if your use case requires it.
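For example, a minimal sketch of raising the limit on an existing Deployment with the Kubernetes Python client (the Deployment name, namespace, and the value 100 are assumptions):

    from kubernetes import client, config

    config.load_kube_config()  # or load_incluster_config() when running in a pod
    apps = client.AppsV1Api()

    # Keep more old ReplicaSets around than the default allows.
    apps.patch_namespaced_deployment(
        name="my-app",            # hypothetical Deployment name
        namespace="default",
        body={"spec": {"revisionHistoryLimit": 100}},
    )

Setting the same field directly in the Deployment manifest works just as well; the default only applies when you leave it unset.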

Parallel processing input/output, queries, and indexes AS400

IBM V6.1
When using System i Navigator and you click System Values, the following is displayed.
By default, "Do not allow parallel processing" is selected.
What will the impact on program processing be when you choose multiple processes? We have a lot of RPG IV programs and SQL queries being executed, and I think it will increase performance.
Basically, I want to turn this on in the production environment, but I'm not sure if I will break anything by doing this: for example, input or output of different programs running in parallel, or data getting out of sequence.
I did do some research:
https://publib.boulder.ibm.com/iseries/v5r2/ic2924/index.htm?info/rzakz/rzakzqqrydegree.htm
I understand each option, but I do not know the risk of changing it from the default to multiple processes.
First off, in order to get the most out of *MAX and *OPTIMIZE, you'd need a system with more than one core (enabled for IBM i / DB2) along with the DB2 Symmetric Multiprocessing (SMP) (57xx-SS1 option 26) licensed program installed, thus allowing the system to use SMP for queries and index builds.
For *IO, the system can use multiple tasks via simultaneous multithreading (SMT) even on a single-core POWER5 or higher box. SMT is enabled via the Processor multitasking (QPRCMLTTSK) system value.
You're unlikely to "break" anything by changing the value, as long as your applications don't make bad assumptions about result set ordering. For example, CPYxxxIMPF makes use of SQL behind the scenes; with anything but *NONE you might end up with the rows in your DB2 table in a different order from the rows in the import file.
You will most certainly increase CPU usage. This is not a bad thing, unless you're currently pushing 90%+ CPU usage regularly. If you're only using 50% of your CPU, it's probably a good thing to make use of SMT/SMP to provide better response time, even if it increases CPU utilization to 60%.
Having said that, here's a story of it being a problem... http://archive.midrange.com/midrange-l/200304/msg01338.html
Note that in the above case, the OP was pre-building work tables at sign-on in order to minimize the wait when it was time to use them. That was a great idea 20 years ago on single-threaded systems. Today, the alternative would be to take advantage of SMP/SMT and build only what's needed, when it's needed.
As you note in a comment, this kind of change is difficult to test in non-production environments, since workloads in DEV & TEST are different. So it's important to collect good performance data before & after the change. You might also consider moving in stages: *NONE --> *IO --> *OPTIMIZE and then *MAX if you wish. I'd spend at least a month at each level if you have periodic month-end jobs.

Clarifications on ElectricCommander and tutorials

I was searching for tutorials on Electric Cloud on the net but found nothing. I also could not find good blogs dealing with it. Can somebody point me in the right direction?
We are also planning on using Electric Cloud to execute Perl scripts in parallel. We are not going to build software; we are trying to test our hardware in parallel by executing the same Perl script in parallel using ElectricCommander. But I think ElectricCommander might not be the right tool given its cost. Can you suggest some of the pros and cons of using ElectricCommander for this, and any other features which might be useful for our testing?
Thanks...
RE #1: All of the ElectricCommander documentation is in the Electric Cloud Knowledge Base at https://electriccloud.zendesk.com/entries/229369-documentation.
ElectricCommander can also be a valuable application to drive your tests in parallel. Here are just a few aspects for consideration:
Subprocedures: With EC, you can just take your existing scripts, drop them into a procedure definition and call that procedure multiple times (concurrently) in a single procedure invocation. If you want, you can further decompose your scripts into more granular subprocedures. This will drive reuse, lower cost of administration, and it will enable your procedures to run as fast as possible (see parallelism below).
Parallelism: Enabling a script to run in parallel is literally as simple as checking a box within EC. I'm not just referring to running 2 procedures at the same time without risk of data collision; I'm referring to the ability to run multiple steps within a procedure concurrently. Coupled with the subprocedure capability mentioned above, this enables your procedures to run as fast as possible, as you can nest subprocedures within other subprocedures and let everything run in parallel where the tests allow it.
Root-cause Analysis: Tests can generate an immense amount of data, but often only the failures, warnings, etc. are relevant (tell me what's broken). EC can be configured to look for very specific strings in your test output and will produce diagnostics based on that configuration. So if your test produces a thousand lines of output but only 5 lines reference errors, EC will automatically highlight those 5 lines for you. This makes it much easier for developers to quickly perform root-cause analysis.
Results Tracking: ElectricCommander's properties mechanism allows you to store any piece of information that you determine to be relevant. These properties can be associated with any object in the system whether it be the procedure itself or the job that resulted from the invocation of a procedure. Coupled with EC's reporting capabilities, this means that you can produce valuable metrics indicating your overall project health or throughput without any constraint.
Defect Tracking Integration: With EC, you can automatically file bugs in your defect tracking system when tests fail or you can have EC create a "defect triage report" where developers/QA review the failures and denote which ones should be auto-filed by EC. This eliminates redundant data entry and streamlines overall software development.
In short, EC will behave exactly the way you want it to. It will not force you to change your process to fit the tool. As far as cost goes, Electric Cloud provides a version known as ElectricCommander Workgroup Edition for cost-sensitive customers. It is available for a small annual subscription fee and may be something that you want to follow up on.
I hope this helps. Feel free to contact your account manager or myself directly if you have additional questions (dfarhang#electric-cloud.com).
Maybe you could execute the same Perl script on several machines by using r-commands, or cron, or something similar; see the sketch below for one way to wrap that up.
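For instance, a minimal sketch (Python; the host names and script path are hypothetical, and it assumes key-based SSH access with Perl installed on each box) of running the same script on several machines at once:

    import subprocess
    from concurrent.futures import ThreadPoolExecutor

    hosts = ["testbox1", "testbox2", "testbox3"]   # hypothetical test machines
    script = "/opt/tests/hw_check.pl"              # hypothetical Perl test script

    def run(host):
        # Run the script remotely over SSH and capture its output.
        result = subprocess.run(["ssh", host, "perl", script],
                                capture_output=True, text=True)
        return host, result.returncode, result.stdout

    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        for host, rc, out in pool.map(run, hosts):
            print(host, "OK" if rc == 0 else f"FAILED ({rc})")

This gets you basic parallel execution; what a tool like ElectricCommander adds on top is the scheduling, result tracking, and reporting described in the answer above.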
To further address the parallel aspect of your question:
The command-line interface lets you write scripts to construct procedures, including this kind of subprocedure with parallel steps. So you are not limited, in the number of parallel steps, to what you wrote previously: you can write a procedure which dynamically sizes itself to (for example) the number of steps you would like to run in parallel, or the number of resources you have available to run steps in parallel.