Two master instances on the same database - PostgreSQL

I want to use PostgreSQL on Windows Server 2012 R2 for one of our projects, which needs 24/7 uptime.
I would like to ask the community whether I can have two master instances on two different servers, A and B, that both 'work' on the same database located on shared file storage on the LAN. Normally only the master instance on server A will be online; when it goes offline for some reason, a PowerShell script will detect that the PostgreSQL service has stopped and will start the service on server B. The same script will continuously check that only one service is running across servers A and B, to avoid conflicts.
I'd like to ask whether this is possible, or whether there is a better approach for my configuration.
(I can't use replication because when server A shuts down, server B stays in read-only mode, which I don't want.)

If you manage to start two instances of PostgreSQL on the same data directory, serious data corruption will happen.
Normally a postmaster.pid file prevents that, but a PostgreSQL server process on a different machine that accesses the same file system will happily unlink that file after spewing some log messages, thinking it was left behind after a crash.
So you are really walking on thin ice with a solution like that.
Another issue you didn't consider is the script that is supposed to check whether the server is still running. What if that check fails, for example because the network connection between the two servers is down, while the server is actually still up and running happily? Such a "split brain" scenario will cause data corruption with your setup.
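To make that failure mode concrete, here is a minimal sketch in Python of the kind of check-and-failover loop you describe (the host name, port and service name are made up, and this is an illustration of the pitfall, not a recommendation):

```python
import socket
import subprocess
import time

PRIMARY = "server-a.example.local"   # hypothetical host name for server A
PG_PORT = 5432

def primary_reachable(host, port, timeout=5):
    """Try to open a TCP connection to PostgreSQL on server A."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

while True:
    if not primary_reachable(PRIMARY, PG_PORT):
        # DANGER: this branch is also reached when server A is perfectly fine
        # but the network between this machine and A is down (split brain).
        # Starting PostgreSQL on server B against the same data directory
        # while A is still writing to it will corrupt the database.
        subprocess.run(["sc", "start", "postgresql-x64-13"], check=False)  # hypothetical service name
        break
    time.sleep(10)
```

The monitoring script has no way to distinguish "A crashed" from "I can't reach A", which is exactly where the corruption comes from.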
Another word of caution: since you seem to be using Windows (PowerShell?), you probably have a CIFS file system in mind when you talk about shared storage. A Windows "network share" is not a reliable file system; last time I checked, it did not honor _commit.
Creating a reliable failover cluster is harder than you think, and I'd recommend that you check existing solutions before you try to roll your own.

Related

How does NFS process requests for data?

While using someone else's framework, I found that it uses NFS to share a specified folder before performing distributed computing.
For example, say there are two parts, 'part1' and 'part2', in this folder. If machine 1 reads 'part1' and machine 2 reads 'part2', and machine 1 then wants the content of 'part2', should it make a request directly to machine 2, or just read a local copy of 'part2'?
My understanding is that NFS synchronizes the folder across the machines, so that the files are stored on each machine rather than being links to a location on one particular machine. I'm not sure whether this understanding is correct.
NFS makes files available over a network. Using your example, if machine 1 and machine 2 are clients of the NFS server, they won't refer to each other when attempting to retrieve data. As such, when machine 1 wants 'part2', it will make the request to the NFS server rather than to machine 2 (despite the fact machine 2 has read 'part2').
The reasoning for this is that the version of 'part2' on the NFS server may have changed since machine 2 read it, making machine 2's copy of 'part2' out of date. By making all requests to the NFS server, clients can ensure that they are getting the most recent version of a file at any given time.
The behaviour you're describing is more akin to the behaviour of BitTorrent (https://en.wikipedia.org/wiki/BitTorrent). BitTorrent solves the out-of-date file problem by not allowing files to ever change and distributing hashes of the files. Knowing this, your torrent client can request parts of a folder or file from anyone in a 'swarm' and independently verify that the parts you received are correct.
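As a small illustration of that last point, here is a sketch in Python (with made-up data) of the kind of check a torrent client performs: the publisher distributes a hash of each piece, and a downloader can verify a piece received from any peer in the swarm without trusting that peer:

```python
import hashlib

def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()

# The publisher computes and distributes the hash of each piece up front
# (in BitTorrent this lives in the .torrent metadata).
original_piece = b"contents of part2"        # made-up data for illustration
published_hash = sha1_hex(original_piece)

# A downloader that got the piece from *any* peer in the swarm can verify
# it independently against the published hash.
received_piece = b"contents of part2"
print(sha1_hex(received_piece) == published_hash)   # True only if the piece is intact
```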

Implementing a distributed grep

I'm trying to implement a distributed grep. How can I access the log files from different systems? I know I need to use the network, but I don't know whether to use ssh, telnet, or something else. What information do I need to know about the machines I am going to connect to from my machine? I want to be able to connect to different Linux machines, read their log files, and pipe them back to my machine.
Your system contains a number of Linux machines which produce log data (the SERVERs) and one machine which you operate (the CLIENT). Right?
Issue 1) Files to be accessed.
In general, a log file is locked by the software that produces the log data, because that software must be able to write to the log file at any time.
To access the log file from other software, you need to provide an unlocked copy of the log data.
This may require some modification of the software's setup and/or of the software (program) itself.
Issue 2) Programs to serve log files.
To get log data from a SERVER, each SERVER has to run some server program.
For remote shell access, a remote shell daemon is needed: rshd for rsh, or sshd for ssh (ssh is essentially rsh plus secure communication).
For FTP access, ftpd (the file transfer protocol daemon) is needed.
Which software is needed depends on how the CLIENT accesses the SERVERs.
Issue 3) Distributed grep.
You use the words 'distributed grep'. What exactly do you mean by them?
What is distributed in your 'distributed grep'?
Many scenarios come to mind.
a) Log files are distributed across the SERVERs. All log data are collected on the CLIENT, and a grep program runs over the collected log data on the CLIENT.
b) Log files are distributed across the SERVERs. The grep function is also implemented on each SERVER. The CLIENT asks each SERVER for the result of grep applied to its log data, and the results are collected on the CLIENT (see the sketch at the end of this issue).
etc.
What is your plan?
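For what it is worth, here is a minimal sketch of scenario b) in Python, assuming the CLIENT has password-less ssh access to each SERVER; the host names, log path and pattern are placeholders:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

SERVERS = ["server1.example.com", "server2.example.com"]   # placeholder host names
LOG_PATH = "/var/log/app.log"                               # placeholder log file
PATTERN = "ERROR"                                           # placeholder search pattern

def remote_grep(host):
    """Run grep on the SERVER itself and return only the matching lines."""
    result = subprocess.run(
        ["ssh", host, "grep", PATTERN, LOG_PATH],
        capture_output=True, text=True,
    )
    return host, result.stdout

# Query all SERVERs in parallel and collect the results on the CLIENT.
with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
    for host, matches in pool.map(remote_grep, SERVERS):
        print(f"=== {host} ===")
        print(matches, end="")
```

Only the matching lines cross the network, which is the main appeal of scenario b) over scenario a).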
Issue 4) Access to the SERVERs.
Whether secure communication is necessary depends on where the machines are located and on the networks between them.
If all the machines are in one room or house, and the networks between them are not connected to the Internet, secure communication is not necessary.
If the log data are top secret, you may need to encrypt the data before sending them over the network.
How important is your log data?
At a very early stage of development, you should determine the things described above.
This is my advice.

AWS deployment without using SSH

I've read some articles recently on setting up AWS infrastructure without enabling SSH on EC2 instances. My web app requires a binary to run, so how can I deploy my application to an EC2 instance without using SSH?
This was the article in question.
http://wblinks.com/notes/aws-tips-i-wish-id-known-before-i-started/
Although doable, as the article says it requires you to think of servers as ephemeral. A good example of this is web services that scale up and down depending on demand. If something goes wrong with one of the servers, you can just terminate that server and spin up another one.
Generally, you can accomplish this using a pull model. For example, at boot time pull your code from a Git/Mercurial repository and then execute scripts to set up your instance. The script sets up all the monitoring required to determine whether your server and application are up and running properly. You would still need an SSH client on the instance if you want to pull your code over SSH (although you could also do it over HTTPS).
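As a rough illustration only, a boot-time bootstrap in this pull model could look something like the Python sketch below; the repository URL, install directory and setup script are placeholders, and the clone goes over HTTPS so no SSH key is needed on the instance:

```python
import subprocess

REPO_URL = "https://example.com/myorg/myapp.git"   # placeholder repository
CHECKOUT_DIR = "/opt/myapp"                        # placeholder install location

# Pull the application code over HTTPS at boot (no SSH required on the instance).
subprocess.run(["git", "clone", "--depth", "1", REPO_URL, CHECKOUT_DIR], check=True)

# Run the repository's own setup script, which would install the binary,
# configure the service and wire up monitoring.
subprocess.run(["/bin/sh", f"{CHECKOUT_DIR}/scripts/setup.sh"], check=True)   # placeholder script
```

Something like this would typically be launched from the instance's user data so it runs automatically on first boot.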
You can also use configuration management tools that don't use SSH at all, like Puppet or Chef. Essentially your node/server pulls all your application and server configuration from the Puppet master or the Chef server. The Puppet agent or Chef client then performs all the configuration/deployment/monitoring changes needed for your application to run.
If you go with this model, I think one of the most critical components is monitoring. You need to know at all times if there's something wrong with one of your servers, and in the event something goes wrong, discard that server and spin up a new one (even better if this whole process is automated).
Hope this helps.

Need an opinion on a method for pulling data from a file with Perl

I am having a conflict of ideas with a script I am working on. I have to read a bunch of lines from VMware files. As of now I just use SSH to probe every file for each virtual machine while the files stay on the server. The reason I now think this is a problem is that I have 10 virtual machines and about 4 files each that I probe for file paths and such. This opens a new SSH channel every time I refer to the SSH object I have created using Net::OpenSSH, so when all is said and done I have probably opened about 16-20 SSH connections. Would it just be easier in a lot of ways to SCP the files over to the machine that needs to process them and then do most of the work on the local side? The script I am making is a backup script for ESXi, and it will end up storing the files I need to read from anyway.
Any opinion would be most helpful.
If the VMs do the work locally, it's probably better in the long run.
In the short term, roughly the same amount of resources will be used, but if you were to migrate these instances to other hardware, you would of course see gains from distributing the processing.
Also, from a maintenance perspective, it's probably more convenient for each VM to host the process locally, since I'd imagine that if you need to tweak it for a specific box, it makes more sense to keep it there.
Aside from the scalability benefits, there aren't really any other pros or cons.
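If it helps, here is a sketch of the copy-once-then-parse approach you describe, written in Python rather than Perl purely for illustration; the host, file paths and the key being searched for are placeholders:

```python
import subprocess
from pathlib import Path

ESXI_HOST = "root@esxi.example.local"                 # placeholder host
VM_FILES = [
    "/vmfs/volumes/datastore1/vm1/vm1.vmx",           # placeholder paths
    "/vmfs/volumes/datastore1/vm2/vm2.vmx",
]
LOCAL_DIR = Path("backup_staging")
LOCAL_DIR.mkdir(exist_ok=True)

# One scp per file instead of re-probing each file over a fresh SSH channel.
for remote_path in VM_FILES:
    subprocess.run(["scp", f"{ESXI_HOST}:{remote_path}", str(LOCAL_DIR)], check=True)

# All parsing now happens locally, on the copies the backup will keep anyway.
for local_file in sorted(LOCAL_DIR.glob("*.vmx")):
    for line in local_file.read_text().splitlines():
        if "fileName" in line:            # e.g. pull disk file paths out of the .vmx
            print(local_file.name, "->", line.strip())
```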

Deployment process

We have a massive system with around 15 servers hosting .NET WCF services, an MVC application, etc.
When we do a deployment (out of office hours), we have to uninstall and reinstall everything on the live servers.
This takes a lot of time, and if something goes wrong we have to roll back everything.
Can you please suggest an alternative to this?
For example:
Deploy into another environment (whenever you like) and switch the URL to point to the new servers.
[This comes with the cost overhead of maintaining two copies of production (active and passive).]
Any other ideas, please?
Do the services need to be uninstalled for all deployments?
You can have a script that does this against all the servers in parallel:
Stop any windows services
Stop IIS
Make backup of replaced files
XCopy assemblies, resources, website files.
Perhaps run InstallUtil if deploying a service (as needed).
Start IIS and services.
Such a script will not take long to execute. With 15 servers it is well worth the effort of writing it and making the deployment and rollback process completely automated.
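For illustration only, a sketch of such a script in Python is below. It fans the same steps out to every server in parallel; the server names, service name and paths are placeholders, and it assumes PowerShell remoting (Invoke-Command) is already enabled on the servers:

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor

SERVERS = [f"web{i:02d}" for i in range(1, 16)]   # placeholder names for the 15 servers

# Placeholder steps matching the sequence above; adjust service names and paths.
REMOTE_STEPS = [
    "Stop-Service MyWindowsService",
    "iisreset /stop",
    "Copy-Item C:\\inetpub\\app C:\\backups\\app -Recurse -Force",
    "Copy-Item \\\\buildshare\\drop\\* C:\\inetpub\\app -Recurse -Force",
    "Start-Service MyWindowsService",
    "iisreset /start",
]

def deploy(server):
    """Run each deployment step on one server via PowerShell remoting."""
    for step in REMOTE_STEPS:
        subprocess.run(
            ["powershell", "-Command",
             f"Invoke-Command -ComputerName {server} -ScriptBlock {{ {step} }}"],
            check=True,
        )
    return server

# Deploy to all servers in parallel; a rollback script would be the same
# loop restoring the backup copy instead.
with ThreadPoolExecutor(max_workers=len(SERVERS)) as pool:
    for done in pool.map(deploy, SERVERS):
        print(f"{done}: deployed")
```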
It sounds like you need a load balancer to handle the traffic to your production servers. You would deploy all your new code to Server Farm B and test it using a test DNS entry. Once you are satisfied with the changes, you would repoint your load balancer addresses from Server Farm A to Server Farm B, and it would then become live. The only downside to this is database changes.