Database Mirroring - App Can't Connect to Mirror - Named Pipes Provider: Could not open a connection to SQL Server [53] - sql-server-2008-r2

I have an application that can connect to the Principal, but can't connect to the Mirror during a failover.
(Note to moderator: please let me know if this question is more appropriate for serverfault. I posted it here because I found more questions similar to this issue than on serverfault.)
This is the error I receive when my application attempts to connect to the Mirror after a failover:
Named Pipes Provider: Could not open a connection to SQL Server [53].
Cannot open database "MY_DB_NAME" requested by the login. The login failed.
I am familiar with the fact that when initially connected to the Principal, the name of the Mirror server is cached to be used during the failover and that the failover partner I specify in my connection string is only used if the initial connection to the Principal fails.
This clearly describes the problem I'm having:
http://blogs.msdn.com/b/spike/archive/2010/12/15/running-a-database-mirror-setup-with-the-sqlbrowser-service-off-may-produce-unexpected-results.aspx
...but the SQL Browser Service is running and I can't figure out why the name won't resolve when connecting to the mirror.
I'm assuming there is a service that must be running to enable NetBIOS name resolution that is not running, because this is what I see in WireShark consistently without a response from the Mirror:
Source Destination Protocol Length Info
10.200.3.111 10.200.5.255 NBNS 92 Name query NB SQL-02-SVR-<00>
Question 1: What could be causing the problem? ;-)
Question 2: I really don't want to enable NetBIOS (for security reasons) and I'm using IP addresses (no FQDNs) in the mirror configuration and in the connection string. Given the caching behavior of the mirror partner when connecting to the Principal, is there a way to force TCP/IP to be used so the value that is cached is the IP address and not the name? Do I need to run the SQL Server Browser/Computer Browser services?
The configuration:
App Is Delphi XE2 using SDAC 6.5.9 (I don't think this is relevant to the component I'm using because it works in other installations with mirroring and has no issues)
SQL Server 2012 Enterprise installed as a default instance on Principal, Mirror and Witness in a non-domain configuration using certificate authentication.
Windows Server 2008 R2 SP1 64-bit on all machines
Firewalls disabled on Principal, Mirror and Client (where app is running)
TCP/IP and Named Pipes enabled on Principal and Mirror
SQL Server Browser service running on Mirror
Computer Browser service running on Mirror
Mirroring is configured for automatic failover with a witness and works properly (I can fail back and forth between mirror and principal without issue)
SQL Native Client 2012 installed on Client machine
Same app login (with same SID and user rights) exists on both Principal and Mirror
Correct server, failover partner, database name, user name and password verified in my app log
In connection string, principal server is 'tcp:10.200.3.15,1433' and failover partner is 'tcp:10.200.3.16,1433' using the SQL Native client
I can ping both servers from the Client machine
NetBIOS over TCP/IP has been enabled in the adapter under the WINS tab (on the Mirror and Client machines)
I've been able to get the application working with mirroring on several other installations, but this one is baffling me.

I found the problem, which was that the customer had the Principal and Mirror in one VLAN and the Client(s) in another. Although the IP addressing scheme was the same, the policy for communication between the VLANs prevented broadcast messages, which is why the NetBIOS query was failing on the client. A WINS or DNS server will be implemented to resolve this issue.
However, I am still interested in an answer to my Question #2, above.

Related

Using Kerberos for RDP

We are in the process of turning off NTLM in our environment for both inbound and outbound traffic via GPO. In our lab testing we have encountered the following when blocking inbound NTLM on a remote host:
RDP'ing to the remote host with inbound NTLM blocked via cross-forest generated a CredSSP error message.
Setting Encryption Oracle Remediation to either Mitigated or Vulnerable as a workaround did not work.
Turning off NLA on the remote host as a workaround will allow cross-forest RDP
I have tried applying "Allow delegating fresh credentials" via policy on the remote host but it is still getting the CredSSP error
I have also tried setting the policy on the remote host to use SSL for "Require use of specific security layer for remote (RDP) connections", and I still got the same CredSSP error.
What did work is if I try to RDP from the same forest to the remote host, it will allow the connection and I can confirm it is using Kerberos for RDP instead of NTLM.
Another observation is once the same forest RDP worked on the remote host, cross-forest RDP connection on the remote host with the blocked inbound NTLM will now work.
Has anyone encountered something similar like this before?
If so, has anyone found a solution for cross-forest RDP to work on a remote host with blocked inbound NTLM without the need to pre-auth on the remote host in the same forest?
The Encryption Oracle Remediation error is a red herring because it uses the same error code as the NTLM is not available error. Unless you haven't patched in 3 years it'll likely never be the Encryption Oracle Remediation issue. It's really just that it tried to fallback to NTLM and policy said no.
In all likelihood the issue is that the client can't find or communicate with a domain controller to do NLA.
The client must find the user's domain first (domain A). From there it authenticates their password. It then asks to get a ticket to the machine. The machine isn't in the user's domain so it creates a referral ticket to where it thinks the machine is (domain B).
The referral is handed back to the client and the client tries to find a DC to where the referral is supposed to go (domain B). The client sends the referral to domain B and asks for a ticket to the machine. The domain controller either finds the machine and issues a ticket for it, or says it doesn't know and offers a referral to another domain (domain C) and you try again, or it just fails saying no machine can be found.
All of this occurs from the client's perspective, not the target machine's perspective. This happens before the client even pings the target machine (ish). This is why disabling NLA appears to resolve the issue.
So there are a handful of reasons why this happens:
You used an IP address -- this is a straight-to-NTLM scenario. Kerberos doens't do IP addresses by default. You can turn it on, but it won't scale.
Client can't communicate with a DC in user's domain (domain A). Networking issue, client needs line of sight to domain controller, plus DNS.
Client can't communicate a with DC in the target machine's domain (domain B). Still a networking issue, client needs line of sight to domain controller, plus DNS.
You're not providing a proper fully qualified name and the user's DC can't figure out what forest it should refer to. You can enable Forest Search Order and it'll maybe help, or you can type in the fully qualified machine name.
This isn't an exhaustive list but these are the most common causes.
References:
https://syfuhs.net/windows-and-domain-trusts
https://syfuhs.net/how-authentication-works-when-you-use-remote-desktop
I also had a similar issue when using the DOMAIN\username login ; using the UPN (username#domaine.com) worked for me.
My understanding is using the UPN allows the client to know the DNS domain name, which then allows it to discover the DC of the remote domain through DNS resolution.
NB : my setup was from a workgroup server so not exactly the same as yours; YMMV.

How to Confirm PostgreSQL on Ubuntu VM is communicating with External Server for Updates

I have an Ubuntu VM installed on a client's VMware system. Recently, the client's IT informed us that his firewall has been detecting consistent potential port scans to our VM's internal IP address (coming from 87.238.57.227). He asked if this was part of a known package update process on our VM.
He sent us a firewall output where we can see several instances of the port scan, but there are also instances of our Ubuntu VM trying to communicate back to the external server on port 37258 (this is dropped by the firewall).
Based on a google lookup, the hostname of the external IP address is "feris.postgresql.org", with the ASN pointing to a European company called Redpill-Linpro. As far as I can tell, they offer IT consulting services, specializing in open source software (like PostgreSQL, which is installed on our VM). I have never heard of them before though and have no idea why our VM would be communicating with them or vice-versa. I'm also not sure if I'm interpreting the IP lookup information correctly: https://ipinfo.io/87.238.57.227
I'm looking for a way to confirm or disprove that this is just our VM pinging for a standard postgres update. If that's the case I'd like to restrict this behaviour. We would prefer to do these types of updates manually and limit the communication outside of the VM to what is strictly necessary for the functionality of our application.
Update
I sent an email to Redpill's abuse account. They responded quickly saying that the server should not be port scanning anyone and if it appears that way, something is wrong.
The server is part of a cluster of machines that serves apt.postgresql.org among other postgres download sites. I don't think we have anything like ansible or puppet installed that would automatically check for updates but I will look into that to make sure. I'm wondering if Ubuntu reaching out to update the MOTD with the number of available packages would explain why our VM is trying to reach out to the external postgres server?
The abuse rep said in any case there should only be outgoing connections from the VM, not incoming. He asked for some additional info so I will keep communicating with him and try to update this post accordingly
My communication with the client's IT dropped off so I did not get a definitive answer on this, but I'll provide some new details:
I reached out to the abuse email for Redpill-Linpro. He got back to me and confirmed the server corresponding to the detected IP address is part of a cluster that hosts postgres download sites, including apt.postgresql.org. He was surprised to learn we had detected a port scan from their server and seems eager to figure out why that is happening.
He asked if the client IT could pass along some necessary info for them to set up tracking on that server. But the client IT never got back to me. I think he was satisfied that it wasn't malicious and stopped pursuing it.
Here's one of the messages the abuse rep sent me that may be relevant:
That does look a lot like the tcp to the apt download server yes. It's
strange that your firewall reports that many incoming connections, but
they could be fallout from some connection tracking that's not
operating as intended. The timing appears to be matching up more or
less perfectly. And there should definitely not be any ping-back
connections from it.
Since you appear to be using the http version of the server (and not https) bringing the data in cleartext, they should be able to just
dump the TCP connection contents and verify exactly what it does. But
I bet they are going to see a number of http requests initiated by the
apt client that is checking for updates.

Does the fact I'm running a VM alter the whitelisting status of my regular ip address?

Our dev ops team have whitelisted my home ip address so that I can connect to our Postgres database on Azure. I am able to connect to our Azure database due to this.
Today I set up a VM in order to run Docker. I am running a container for RStudio which is an app that, among many other things, allows me to connect to our database using ODBC.
After configuring the odbcinst and odbc.ini files I believe that those are configured correctly because when I try to connect I get the following error:
Error: nanodbc/nanodbc.cpp:983: 00000: FATAL: SSL connection is required. Please specify SSL options and retry.
Thus I think that my odbc set up is correct because this error suggests my connection setting are fine, it's just that Azure will not allow it without SSL.
Searching that error message took me to this SO post with the following accepted answer:
By default, Azure Database for PostgreSQL enforces SSL connections between your server and your client applications to protect against MITM (man in the middle) attacks. This is done to make the connection to your server as secure as possible.
Although not recommended, you have the option to disable requiring SSL for connecting to your server if your client application does not support SSL connectivity. Please check How to Configure SSL Connectivity for your Postgres server in Azure for more details. You can disable requiring SSL connections from either the portal or using CLI. Note that Azure does not recommend disabling requiring SSL connections when connecting to your server.
My question is, if I am already able to connect to our database outside of my VM due to my home IP being whitelisted and just using a Postgres Driver with Dbeaver SQL client, is there anything I can do to connect from within my VM?
I can get my VMs ip address but I suspect (am not sure) if sending hat to our developers to whitelist would work?
Is there a prescribed course of action here?
I added this parameter to my .odbc.ini file and was able to connect:
sslmode=require
From Azure Postgres documentation, this parameter may take on different permutations depending on the context
"for example "ssl=true" or "sslmode=require" or "sslmode=required" and other variations"

What's the correct MSDTC configuation for a clustered SQL server for BizTalk WCF SQL adapter

I have a issue on connecting to a clustered sql server instance using wcf-sql adapter.
The sql cluster infrastructure is :
We have 2 servers, SVR1 and SVR2, each have a named SQL instance INST1 installed and these 2 servers are clustered. In SRV1, a clustered MSDTC installed and assigned a NETBIOS name as DTCCLUSTER1. SRV1/SRV2 and DTCCLUSTER1 have its own IP address.
When I try to connect to this SQL Server using WCF-SQL Adapter, I got a timeout error and finally find out this is caused by a MSDTC connection issue. DTCPing test failed in both SRV1 to BizTalk server and BizTalk to SRV1.
The SRV1 hosting DTCCLUSTER1 have been configured to allow both inbound and outbound connections. For security reason, we cannot allow "No Auth" in MSDTC but choosed "Mutual Auth required" in both SRV1 and BizTalk server side.
On server side, the firewall was configured to allow DCE RPC inbound and outbound. We even disabled the firewall in BizTalk server side. Also no port blocking by network.
We are still doing the troubleshooting now, but my question here is kind of more general: what's the proper configuration of the MSDTC for a clustered SQL Server?
For now, there MIGHT be a workaround by setting the UseAmbientTransaction property to false.
Off course, the MSDTC issue is your main concern :)
Are you sure you checked the Network DTC access checkbox as described here:
http://msdn.microsoft.com/en-us/library/dd897483(v=bts.10).aspx
For more information on troubleshooting these specific issues, please see here: http://msdn.microsoft.com/en-us/library/aa561924(v=bts.10).aspx
This link provides you with some good advice on how to set these properties.
More specifically, if you enable the mutual auth required option, take a look at this paragraph:
If either the Mutual Authentication Required or the Incoming Caller
Authentication Required configuration options are enabled then the
client(s) computer account must be granted the Access this computer
from the network user right. If the computer account for a client
computer is not granted the Access this computer from the network user
right, or is included in the Deny access to this computer from the
network user right, then DTC communication between the client and
server computer will fail.
Typically I always set no auth. It might be worth it to try the setting and see if this makes it work. Also be aware that MSDTC settings need to be the same across your BizTalk and SQL servers, including your MSDTC cluster (IIRC: if you have a windows 2008 R2 msdtc cluster).

NetMsmqBinding WAS service fails to read messages from remote MSMQ queue in a workgroup

We have a service that is hosted in IIS using WAS with the net.msmq binding. The service reads messages from a private transactional MSMQ queue. I need it to work by reading from a queue that is on a different machine to the service. I can get it working if the queue is on the same machine, but not if it is on a different machine.
Environment information
The servers are running Windows Web Server 2008 R2.
The servers are in a workgroup, i.e., they are not part of a domain.
MSMQ has been installed without the directory service integration feature.
I believe that the required Windows features are installed (WCF Non-Http Activation and Http Activation, Message Queuing Server, Multicasting Support, Message Queueing DCOM Proxy, Windows Process Activation Service, .NET Environment, Configuration APIs)
I have made the following registry changes on the machines:
NewRemoteReadServerAllowNoneSecurityClient = 1
NewRemoteReadServerDenyWorkgroupClient = 0
AllowNonauthenticatedRpc = 1
DTC has been enabled, with Network DTC Access, Allow Remote Clients, Allow Inbound, Allow Outbound, No Authentication Required and Enable SNA LU 6.2 Transactions all selected.
Firewall changes have been made.
Service configuration information
We are using netMsmqBinding.
The transport Security Mode of the netMsmqBinding is None.
ExactlyOnce is true
UseActiveDirectory is false
Durable is true
The queue address is net.msmq://the-host-computer-name/private/EmailAsyncService
WCF logging
There is a warning:
Cannot detect if the queue is transactional". The FormatName of the queue in the error is DIRECT=OS:the-host-computer-name\private$\EmailAsyncService
There is then an error:
An error occurred when converting the 'the-host-computer-name\private$\EmailAsyncService' > queue path name to the format name: Unrecognized error -1072824300 (0xc00e0014). All operations on the queued channel failed. Ensure that the queue address is valid. MSMQ must be installed with Active Directory integration enabled and access to it is available.
What I have tried
I can read messages from the remote queue from the machine the service is on if I manually create and use a MessageQueue instance.
I've tried hosting the service as a standalone console application. The error messages are the same.
I have tried disabling the firewalls involved.
I've tried the changes on http://msdn.microsoft.com/en-us/library/ms752246.aspx, which relate to running such services on a computer joined to a workgroup. ("both the activation service and the worker process must be run with a specific user account (must be same for both) and the queue must have ACLs for the specific user account... In workgroup, the service must also run using an unrestricted token.") The user account I'm currently using is Network Service.
Some thoughts
I don't believe that there is a firewall or permissions issue.
Despite the fact that the service configuration has UseActiveDirectory set to false, the queue address of net.msmq://the-host-computer-name/private/EmailAsyncService seems to be getting translated into the-host-computer-name\private$\EmailAsyncService, which AFAIK is a name format that requires lookup via Active Directory.
I'm a little late here, but since you have no other answers, I may still be of help.
You might want to try enabling Directory Service Integration, as I believe you need to muck with certificates to operate in Workgroup Mode.
Also, Juval Lowy's WCF book makes it clear that when you have queued services hosted in WAS you have to name the queue the exact same as the virtual path to your svc file. So if your service is actually hosted at /EmailAsyncService/EmailService.svc then that's precisely what you need to name your queue (without the first slash).