Create a failover PostgreSQL cluster with 2 virtual machines and no load-balancing server - CentOS

What recommendations can you give me on setting up a PostgreSQL database failover cluster? I have only 2 virtual machines.
Right now I am reading this: https://wiki.clusterlabs.org/wiki/PgSQL_Replicated_Cluster
I have some questions about it:
Where in the configuration files is it specified when the second machine should take over as the active one?
How does the first machine know that the second machine is active?
Why doesn't the virtual IP address conflict?
When the main machine comes back up, how does the system know that it needs to replicate from the second server?
Sorry for my bad English

It's been almost 2 months since you asked, but it seems you are in the same boat I was in a few weeks back. I have gone through your link, and it explains that you need to use corosync + pacemaker + pcs. Frankly, I have no experience with any of them, but I used pgpool-II 4.0.4 (latest at the time of writing) with PostgreSQL 9.5.14 and 10.7 and was able to bring up 2 clusters in the last 2 months.
With pgpool you do not need any other tool/library; all configuration goes into one file, pgpool.conf, along with a few password one-liners in pool_passwd and pcp.conf.
The watchdog (the pgpool component that determines the live/dead status of the cluster) ships with pgpool and merely needs to be configured.
You may find more information on pgpool-II and its latest version here and here.
You may also refer to this link (just read it first to get a gist of the whole process); it is super useful and quite detailed on how the whole process goes.
Also, let us know whether you were able to set up the cluster with the technologies mentioned at your link.
Edit: you can find my extracted pgpool.conf configuration on my gist page.
I have kept only the settings that I changed. The rest were left at their defaults (or maybe I forgot to add one or two here).
Most of the comments in the file come straight from the standard documentation and are self-explanatory, but in a few places I have added my own comments. They cover the following (see the excerpt after this list):
VIP configuration.
One place where I am using a different postgres password.
A note about recovery_1st_stage.
A note about the key file referred to by logdir.
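To make the VIP/watchdog side of it concrete, here is a minimal, hedged excerpt of the kind of pgpool.conf settings involved. The host names, IP address and network interface are placeholders, not my real values; check each parameter against the pgpool-II 4.0 documentation for your setup.

# --- watchdog / virtual IP (values are examples) ---
use_watchdog = on
wd_hostname = 'pgvm1.example.local'            # this node
wd_port = 9000
delegate_IP = '192.168.1.100'                  # the virtual IP clients connect to
if_up_cmd = 'ip addr add $_IP_$/24 dev eth0 label eth0:0'
if_down_cmd = 'ip addr del $_IP_$/24 dev eth0'
arping_cmd = 'arping -U $_IP_$ -w 1 -I eth0'   # refresh ARP caches after takeover
heartbeat_destination0 = 'pgvm2.example.local' # the other VM
other_pgpool_hostname0 = 'pgvm2.example.local'
other_pgpool_port0 = 9999
other_wd_port0 = 9000

# --- backends and failover hook ---
backend_hostname0 = 'pgvm1.example.local'
backend_port0 = 5432
backend_data_directory0 = '/var/lib/pgsql/10/data'
backend_hostname1 = 'pgvm2.example.local'
backend_port1 = 5432
backend_data_directory1 = '/var/lib/pgsql/10/data'
failover_command = '/etc/pgpool-II/failover.sh %h %H %R'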
Also, most importantly: sit back and read through the original links to the standard documentation to get a gist of what the whole process is. It will be easier for you to modify it to your needs later.
I read both sets of documentation 3-4 times (slow learner) and then used a mix of both approaches.
There are also 4 files I created:
recovery_1st_stage
pgpool_remote_start.sh
failover.sh
promote_standby.sh
You will find guidance on these in both places: the standard documentation and the other tutorial. They are plain sh files with a bunch of ssh and psql commands; a rough sketch of one follows.
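For illustration only, here is a stripped-down sketch of what a failover.sh along these lines boils down to. The argument order, PostgreSQL binary path and OS user are assumptions tied to the failover_command placeholders shown above, not my exact script.

#!/bin/bash
# failover.sh - invoked by pgpool through failover_command
# Arguments correspond to the %-placeholders configured in pgpool.conf.
FAILED_HOST="$1"        # %h - host that just failed
NEW_MASTER_HOST="$2"    # %H - host that should become the new primary
NEW_MASTER_PGDATA="$3"  # %R - data directory on the new primary
PG_USER=postgres        # assumption: PostgreSQL runs as this OS user

# Promote the standby on the surviving node to primary.
ssh -o StrictHostKeyChecking=no ${PG_USER}@${NEW_MASTER_HOST} \
    "/usr/pgsql-10/bin/pg_ctl -D ${NEW_MASTER_PGDATA} promote"
exit 0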

EC2 Linux MarkLogic9 start service failed

I added an instance running RedHat Linux 64-bit. I installed the JDK successfully, then used SSH to copy the MarkLogic 9 installation package to the instance and ran the install to completion. When I start the MarkLogic service, the following messages appear (P.S.: this is my first time installing MarkLogic):
Instance is not managed
Waiting for device mounted to come online : /dev/xvdf
Volume /dev/sdf has failed to attach - aborting
Warning: ec2-startup did not complete successfully
Check the error logs for details
Starting MarkLogic: [FAILED]
And the following is the log info:
2017-11-27 11:16:39 ERROR [HandleAwsError # awserr.go.48] [instanceID=i-06sdwwa33d24d232df [HealthCheck] error when calling AWS APIs. error details - NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Using the Source of Infinite Wisdom, I googled for "Install MarkLogic ec2 aws".
Not far down I found https://docs.marklogic.com/guide/ec2.pdf
Good document to read.
If you choose to ignore the (literally "STOP" in all caps) "STOP: Before you do Anything!" warning on the first page, you can read further and find that ML needs a data volume, and that using the root volume is A Bad Idea (it is too small, will crash your system when it fills up, and will vanish if your instance terminates). So if you choose not to use the recommended CloudFormation script for your first experience, you will need to manually create and attach a data volume, among other things (a rough sketch of that is below).
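If you do go the manual route, this is roughly the shape of it. This is a hedged sketch: the size, availability zone, device name, volume ID and instance ID are placeholders, and newer instance types may expose the device under a different name (e.g. /dev/nvme1n1).

# Create a data volume in the same availability zone as the instance.
aws ec2 create-volume --size 100 --volume-type gp2 --availability-zone us-east-1a

# Attach it to the instance (use the VolumeId returned by the previous command).
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
    --instance-id i-0123456789abcdef0 --device /dev/sdf

# On the instance: create a filesystem and mount it where MarkLogic keeps its data.
sudo mkfs -t ext4 /dev/xvdf
sudo mkdir -p /var/opt/MarkLogic
sudo mount /dev/xvdf /var/opt/MarkLogic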
The size and compute power of the host systems running ML are orthogonal to the deployment and orchestration methods.
They are entirely different issues. Yes, you should start with the sample CloudFormation scripts... but not because of size and performance; rather because they were built to make a successful first-time experience as painless as possible. You would have had your ML server up and running in less time than it took to post a question to Stack Overflow asking why it wasn't.
Totally unrelated - except for the built-in set of instance types for the AMIs (1).
Which configurations are possible vs. recommended vs. supported is largely dependent on workload and performance expectations.
MarkLogic can run on resource-constrained systems; whether and how well it works requires the same methodology to answer for micro and mega systems alike: workload, data size and format, query and data-processing code, performance requirements, working set, hardware, software, VM, networking, storage... While it is designed to support large enterprise workloads well,
there are also very constrained platforms and workloads in use in production systems. A typical low-end laptop can run ML fine for some use cases, where others may need a cluster of a dozen or a hundred high-end monsters.
(1) 'Supported instance types' with Marketplace AMIs:
yes, these do NOT include entry-level EC2 instance types, last I looked.
The rationale is similar to why the standard scripts make it hard to abuse the root volume as a data volume - not because it cannot be done,
but rather as an attempt to provide the best chance of a successful first-time experience for the targeted market segment... constrained by having only one chance to do it, knowing nothing at all about the intended use: a blind educated guess coupled with a lot of testing and support history about how people get things wrong no matter how much you lead them.
While 'micro' systems can be made to work successfully in some specialized use cases, they usually don't do as well, as easily, or as reliably, or handle as large a variety of whatever-you-throw-at-them, without careful workload-specific tuning and sophisticated application code.
Similarly, there is a reason the docs make it as clear as humanly possible, even annoyingly so, that you should start with the CloudFormation templates -
short of refusing to run without them.
Can ML run on platform X with Y memory, Z hypervisor, on Docker or VMware or VirtualBox or a brand-Acme RAID controller?
Very likely, with some definition of 'run' and configured for those exact constraints.
Very unlikely for arbitrary definitions of 'run' and no thought or effort to match the deployment to the environment.
Will it be easy for someone who's never done it before to set up, and will it run 'my program' at 'my required speeds' out of the box with no problems, no optimizations, no performance analysis, no data refactoring, no custom queries?
For a reasonably large set of initial use cases - at least for a reasonable and quick POC - very likely, if you follow the installation guide, with perhaps a few parameter adjustments.
Is that the best it can do? Absolutely not.
But it's very close, given absolutely no knowledge of the user's actual application, technical experience, workloads, budget, IT staff, dev and QA team, requirements, business policies, future needs, or phase of the moon.
My recommendation: read the EC2 docs.
Do what they say.
Try it out with a realistic set of data and applications for your use case.
Test, measure, experiment, learn.
THEN and ONLY THEN worry about whether it will work on a t2.micro or an m4.64xlarge (or clusters thereof).
That is the beginning, not the end.
The end is never: you can and should treat continual analysis and improvement of IT configurations as part of ongoing operating procedures.
Minimizing cost is a systemic problem with many dimensions.
On AWS it's FREE to change; it's EXPENSIVE not to plan for change.
Change is cheap.
Experimentation is cheap.
Choose instance types, storage, networking, etc. last, not first.
Consider total cost of ownership. Question requirements... do you NEED that dev system running Sunday at 3am? Can QA tolerate occasional failures in exchange for 90% cost savings? Can you avoid over-commitment with auto-scaling?
Do you need five 9's, or are three 9's enough? Can ingest be offloaded to non-production systems with cheaper storage? Can a middle tier be used... or removed, to move work to the most cost-effective components? Is labor or IT more costly?
Instance type is actually one of the least relevant components of total cost of ownership.

Mongoose / MongoDB replica set using secondary for reads

I recently changed my server setup to include a replica set. The secondary DBs are located in multiple regions around the world to decrease latency. The problem is that I think all of the reads are being done from the master and not from the secondary servers. I'm seeing 500ms+ latency in New Relic on servers far away from the master DB, but the staging server, which is in the same region as the master, is at ~20ms. How can I check whether secondary (or nearest) reads are working, or do I have a setting missing/wrong? (I have tried both SECONDARY_PREFERRED and NEAREST.)
Url:
mongodb://1.1.1.1:27017,1.1.1.2:27017,1.1.1.3:27017,1.1.1.4:27017,1.1.1.5:27017/mydatabase
My options look like this:
"replSet": {
"rs_name": "myRepSet"
"readPreference": "ReadPreference.SECONDARY_PREFERRED",
"read_preference": "ReadPreference.SECONDARY_PREFERRED",
"slaveOk": true
}
Mongoose version: 3.8.x
As per the project issues on GitHub, there was an issue where the read preference did not seem to be working after upgrading to the newest versions (mongoose#3.8.1 & mongodb#1.3.23): all reads were being done from the master and not from the secondary servers.
As per the comments, this problem does not occur when you roll back to the older versions (mongoose#3.6.4 & mongodb#1.2.14); reads will start going to the secondaries (at the collection level). This issue was meant to be fixed in version 3.8.7.
Please reference the following issues for the same:
https://github.com/Automattic/mongoose/issues/1833
https://github.com/Automattic/mongoose/issues/1895
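Independent of the driver bug, you can also pin the read preference in the connection string itself rather than in the replSet options. A hedged example using the hosts from the question (option names per the MongoDB connection-string documentation; whether the 1.x node driver of that era honors them is worth verifying):

mongodb://1.1.1.1:27017,1.1.1.2:27017,1.1.1.3:27017,1.1.1.4:27017,1.1.1.5:27017/mydatabase?replicaSet=myRepSet&readPreference=secondaryPreferred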
"How can I check if the secondary read or nearest is working?"
If you have access to your machines, a simple way to check which ones are being queried is with mongostat. Just log into one of your servers and run
mongostat --discover
This will give you basic output on the inserts/queries/updates/deletes being run on each machine in your replica set. If you have a quiet system, it will be easy to see where the queries are being redirected, so you can at least know whether your secondaries are being hit.
If they aren't, you will need to investigate your driver settings.
Instead of secondary, can you check with the nearest option? I guess that should work.
Check this link.
http://docs.mongodb.org/manual/reference/read-preference/#nearest

MongoDB one server problems/pittfalls

I'm working on a project using MongoDB and therefore asked my server manager to install MongoDB.
I recently read in an old Stack Overflow thread that it is not really recommended to run MongoDB on a single server, because of a possibility of data loss.
I'm not really an expert and want to avoid such cases.
Do these problems still exist and should I look for another solution like remote databases or is it safe to install it?
What are pitfalls that I should make sure my managed server provider takes care of?
Warning - the article linked to by chridam is dangerously out-of-date.
Simply put, no, there's not much to worry about with single server deployments of MongoDB anymore. By default, MongoDB will write everything to the journal every 100ms. If there are writes with the j (journal) option, that interval is shortened to a third. I have posted a longer answer with the gritty details (two, actually) to a similar question some time ago.
The point is that a write operation with j : true won't return until the write made it to the journal (i.e., expect these calls to take 16ms+, on average), and that's exactly the behavior one would expect and that's also how most other dbs behave.
You should ensure you're using the journalling write concern (j : true) and journalling isn't disabled. Also, since the defaults depend on the server version and there's lots of new features, bugfixes and performance improvements, make sure you're getting a somewhat recent version of MongoDB (might not be the case if the server runs something like debian stable).
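For reference, this is the server-side piece in the YAML config format used since MongoDB 2.6; a hedged sketch, with the path as an example only (older installs use the legacy ini-style mongod.conf instead):

# /etc/mongod.conf - make sure journaling stays enabled
storage:
  dbPath: /var/lib/mongodb   # example path
  journal:
    enabled: true
# Clients should then write with the write concern { w: 1, j: true }
# when they need the write to be in the journal before the call returns.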

WTMP (RHEL 5/6) log maintenance - need to keep a rolling log rather than rotate

We have a policy requirement to use tools that rely on wtmp, such as the 'last' command or GDM last-login details. We've discovered that these will have gaps depending on when wtmp was last rotated, and we need to work around this.
Because these gaps have been determined to be unacceptable, and keeping wtmp data in a single active logfile forever without splitting off the old data into archives is not really viable, I'm looking for a way to rollover / age-out old wtmp entries while still keeping more recent ones.
From some initial research I've seen this problem addressed in the Unix (AIX, SunOS) world with the use of 'fwtmp' and some pre/post logrotate scripts. Has this been addressed in the Linux world and I've just missed it?
So far as I can tell 'fwtmp' is a Unix built-in that's not made it into RHEL 5 & 6, per searching the RHEL customer portal and some 'yum whatprovides' searches on my test boxes.
Many thanks in advance!
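I have not found a packaged fwtmp on RHEL either, but as an illustration of the same dump/filter/re-encode idea from the AIX/SunOS world, here is a hedged sketch using utmpdump (sysvinit-tools on RHEL 5/6, util-linux on newer releases). The retention count, paths and the availability of utmpdump -r on your exact version are assumptions to verify.

#!/bin/bash
# keep-recent-wtmp.sh - hedged sketch: trim wtmp to its newest records
# instead of rotating the whole file away. Run from cron or a logrotate script.
set -e
DUMP=/var/log/wtmp-$(date +%Y%m%d).txt

# 1. Dump binary wtmp to text (one record per line) and archive the full dump.
utmpdump /var/log/wtmp > "$DUMP"

# 2. Rebuild a trimmed binary wtmp from the most recent records only
#    (utmpdump -r re-encodes its own text output back into wtmp format).
tail -n 50000 "$DUMP" | utmpdump -r > /var/log/wtmp.new

# 3. Swap the trimmed file in place, keeping the original inode/permissions,
#    and compress the archived dump.
cat /var/log/wtmp.new > /var/log/wtmp
rm -f /var/log/wtmp.new
gzip -9 "$DUMP"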

Cassandra - Impact of configuring ColumnFamilies

I'm in the process of researching various NoSQL technologies and am currently looking into Cassandra (so I'm at a beginner level with regard to this!).
My understanding is you have to define ColumnFamilies in a config file - if you want to change a column family or add a new one, you have to restart Cassandra. What I'd like to know is what is the overall impact of this, in particular with regard to "downtime"?
e.g.
- presumably every node you have Cassandra running on needs to be reconfigured and restarted
- suppose you have 10 nodes - how does it work if 5 nodes have been updated/restarted but the other 5 haven't? Is this a problem?
This is one (of many) things that will be addressed in the upcoming release, 0.7.
Please read the wiki page about live schema updates.
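For context, in 0.7 the schema is managed live in the cluster instead of in storage-conf.xml, so a column family can be added from cassandra-cli without restarting any nodes. A hedged example of the 0.7-era syntax (keyspace and column family names are made up; double-check the exact attributes against the 0.7 CLI help):

# connect with the CLI on any live node (Thrift listens on 9160 by default)
cassandra-cli -host localhost -port 9160

# inside the CLI - the change is applied live and propagated to the other nodes
use MyKeyspace;
create column family Users with comparator = UTF8Type;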