I ran Wireshark on the server machine and captured this strange exchange:
Client (X: src port 65509) connects to my server (Y: dst port 9999).
1) There is a normal TCP handshake:
15:47:41.921228 XXX.XXX.XXX.XXX 65509 YYY.YYY.YYY.YYY 9999 65509 > distinct [SYN] Seq=0 Win=8688 Len=0 MSS=1460 WS=0 SACK_PERM=1 TSV=66344090 TSER=0
15:47:41.921308 YYY.YYY.YYY.YYY 9999 XXX.XXX.XXX.XXX 65509 distinct > 65509 [SYN, ACK] Seq=0 Ack=1 Win=8192 Len=0 MSS=1460 SACK_PERM=1 TSV=69754693 TSER=66344090
15:47:42.176823 XXX.XXX.XXX.XXX 65509 YYY.YYY.YYY.YYY 9999 65509 > distinct [ACK] Seq=1 Ack=1 Win=8688 Len=0 TSV=66344350 TSER=69754693
2) The server sends an encryption key to the client, and the client ACKs it:
15:47:42.180755 YYY.YYY.YYY.YYY 9999 XXX.XXX.XXX.XXX 65509 distinct > 65509 [PSH, ACK] Seq=1 Ack=1 Win=65160 Len=24 TSV=69754719 TSER=66344350
15:47:42.452606 XXX.XXX.XXX.XXX 65509 YYY.YYY.YYY.YYY 9999 65509 > distinct [ACK] Seq=1 Ack=25 Win=8664 Len=0 TSV=66344630 TSER=69754719
3) Suddenly the panel (the client) resets the connection for some reason:
15:47:42.948618 XXX.XXX.XXX.XXX 65509 YYY.YYY.YYY.YYY 9999 65509 > distinct [RST] Seq=28 Win=0 Len=0
4) But here is the part that looks strange to me. The server sends a TCP Dup ACK. What could be the reason for that? I thought this message could only be sent after a retransmission or something similar. I've never seen it sent after a RST.
15:47:42.948654 YYY.YYY.YYY.YYY 9999 XXX.XXX.XXX.XXX 65509 [TCP Dup ACK 5856#1] distinct > 65509 [ACK] Seq=25 Ack=1 Win=65160 Len=0 TSV=69754796 TSER=66344630
5) The client sends a RST again:
15:47:43.227269 XXX.XXX.XXX.XXX 65509 YYY.YYY.YYY.YYY 9999 65509 > distinct [RST] Seq=1 Win=0 Len=0
Thanks for any suggestions.
The Dup ACK from the server in step (4) is caused by Seq=28 in step (3):
65509 > distinct [RST] Seq=28 Win=0 Len=0
The server's own segments carry Ack=1, so it is expecting client Seq 1 but received a RST with Seq 28. This happens when the client's bytes 1~27 are lost in the network. The Dup ACK tells the client to retransmit the data that preceded the RST; however, in step (5) the client responds to the server's Dup ACK by resetting again, this time with Seq=1. So client bytes 1~27 never reached the server and are gone.
You can verify this by capturing packets on both the server and the client.
For details, read up on TCP retransmission.
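The sequence-number reasoning can be checked with a little arithmetic. This is only a minimal sketch that assumes Wireshark's relative sequence numbers exactly as they appear in the capture (server Ack=1, first RST Seq=28); the function name is made up for illustration.

```python
def unseen_client_bytes(server_ack: int, client_rst_seq: int) -> int:
    """Bytes the client believes it sent that the server never acknowledged.

    server_ack: Ack value in the server's last segment (next client byte it expects).
    client_rst_seq: Seq value carried by the client's RST.
    """
    return max(0, client_rst_seq - server_ack)


# Values taken from the capture: the server's ACK carries Ack=1,
# the first RST carries Seq=28, the second RST carries Seq=1.
print(unseen_client_bytes(server_ack=1, client_rst_seq=28))
print(unseen_client_bytes(server_ack=1, client_rst_seq=1))
```

A non-zero result for the first RST means the two sides disagree about how much client data was delivered, which is what a capture on both ends would confirm.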
I'm trying to resolve the following performance problem. There is a database that is synchronously replicated to a remote location via TCP. Currently, everything works great. But it's being migrated to new hardware, and a test load shows that everything slows down by roughly a factor of 2. The current setup sustains transfer rates of 200-300 MB/s, whereas the new one gets 100-150 MB/s at best, which is not good enough for us.
There is nothing obviously wrong on the database side. Database instrumentation says that the source database is busy sending data on the network (in large chunks, tens of MB at a time), and the destination one is busy receiving it. So I'm looking at the TCP packet capture in Wireshark, and I notice a few things that look a bit odd in the new setup; see the sample below.
AFAIK the window scaling factor is 7 for this conversation, so the advertised receive window is multiplied by 128, which means that most of the time it's not the limiting factor.
First, most of the time there is only 1 packet in flight per ACK, which is not the case in the existing setup, where I can see bursts of tens of outgoing packets. Is this the Nagle algorithm in action, or something else? It's supposed to be off (there is a TCP no-delay option at the application level), but it's still a bit suspicious.
Second, I don't understand the timings. It's almost as if something is controlling the rate of outgoing packets and keeping it to roughly 1 packet every 50 µs (sometimes a bit more, sometimes a bit less), rather than each packet leaving within a couple of microseconds of the ACK arriving. Could there be some sort of burst control in place, or am I imagining things?
Third, segment size. Most segments are 8 kB, compared with 64 kB in the existing setup. We experimented with the application settings but can't seem to make a difference: 64 kB segments are there, but they are rare. Is there a way in Linux to strongly encourage larger segments?
36 2022-09-01 15:02:45.267111 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162935757 Ack=3197136358 Win=6166 Len=8156
37 2022-09-01 15:02:45.267115 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162943913 Win=24525 Len=0
38 2022-09-01 15:02:45.267162 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162943913 Ack=3197136358 Win=6166 Len=8156
39 2022-09-01 15:02:45.267166 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162952069 Win=24525 Len=0
40 2022-09-01 15:02:45.267212 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162952069 Ack=3197136358 Win=6166 Len=8156
41 2022-09-01 15:02:45.267215 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162960225 Win=24525 Len=0
42 2022-09-01 15:02:45.267261 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162960225 Ack=3197136358 Win=6166 Len=8156
43 2022-09-01 15:02:45.267265 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162968381 Win=24525 Len=0
44 2022-09-01 15:02:45.267313 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162968381 Ack=3197136358 Win=6166 Len=8156
45 2022-09-01 15:02:45.267318 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162976537 Win=24525 Len=0
46 2022-09-01 15:02:45.267342 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162976537 Ack=3197136358 Win=6166 Len=8156
47 2022-09-01 15:02:45.267346 192.168.240.115 192.168.240.122 TCP 54 1600 → 45508 [ACK] Seq=3197136358 Ack=2162984693 Win=24525 Len=0
48 2022-09-01 15:02:45.267391 192.168.240.122 192.168.240.115 TCP 8210 45508 → 1600 [PSH, ACK] Seq=2162984693 Ack=3197136358 Win=6166 Len=8156
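As a rough sanity check, the sample above can be turned into an implied transfer rate. This is a sketch only: it uses the seven outgoing timestamps and the Len=8156 payload from the capture, assumes the sample is representative, and assumes a window scale of 7 as stated earlier. The helper name is illustrative.

```python
from datetime import datetime

# Timestamps of the outgoing data segments (packets 36, 38, ..., 48 in the sample).
send_times = [
    "15:02:45.267111", "15:02:45.267162", "15:02:45.267212", "15:02:45.267261",
    "15:02:45.267313", "15:02:45.267342", "15:02:45.267391",
]
PAYLOAD_BYTES = 8156   # Len= value of every data segment in the sample
WINDOW_SCALE = 7       # assumed, as stated in the question


def implied_rate_mb_per_s(times, payload):
    """Average payload rate in MB/s (1 MB = 10**6 bytes) across the listed sends."""
    parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in times]
    span = (parsed[-1] - parsed[0]).total_seconds()
    # N timestamps delimit N-1 intervals, each carrying one segment.
    return payload * (len(parsed) - 1) / span / 1e6


rate = implied_rate_mb_per_s(send_times, PAYLOAD_BYTES)
receiver_window = 24525 << WINDOW_SCALE  # Win= advertised by the receiver, scaled

print(f"implied rate: {rate:.0f} MB/s")
print(f"scaled receiver window: {receiver_window} bytes")
```

One 8156-byte segment every ~50 µs lands in the same range as the throughput ceiling observed on the new setup, while the scaled receive window is far larger than a single segment, which is consistent with the window not being the limit.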
Any suggestions are greatly appreciated.
Thanks!
Update: I've shared packet capture files on sender and receiver sides for both current setup and old setup at https://drive.google.com/drive/folders/1ktBDjRHOUCfia1kTfdVIQdS-Q1k4B3qn
Update2: I've written a blog entry about this investigation for those interested: https://savvinov.com/2022/09/20/use-of-packet-capture-and-other-advanced-tools-in-network-issues-troubleshooting/
Best regards,
Nikolai
While I couldn't find answers to all of my questions, I found the ones that mattered most.
It turned out that the TCP stack was sending data in 8 kB segments because the "application" handed it over that way. By "application" I mean the replication software (Oracle Data Guard), which picked up a stream of database changes on the source database and wrote it to the remote standby.
I traced tcp_sendmsg using the BCC trace.py utility and found that its size argument was about 8 kB (8156 bytes, to be specific). Then I traced the network stack at the "application" level, forcing the connection to be re-established during the tracing. It turned out that the parameter controlling the size of the transmission (SDU, or session data unit) was set to 64 kB in the settings, but the new connection was in fact using a smaller value, 8 kB.
Further research showed that there were a number of oddities in the way this parameter is set, and also that the documentation around it was inaccurate.
Once the correct way to set the value was found by trial and error, throughput immediately became much better and all the bottlenecks that bothered us disappeared.
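To get a feel for why the write size matters, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each send call carries at most one SDU and that a chunk is 10 MiB (the question only says "tens of MB"); neither number comes from Data Guard itself.

```python
import math


def writes_per_chunk(chunk_bytes: int, sdu_bytes: int) -> int:
    """Send calls needed for one chunk if each call carries at most one SDU."""
    return math.ceil(chunk_bytes / sdu_bytes)


CHUNK = 10 * 1024 * 1024          # assumed chunk size (10 MiB)
OBSERVED_WRITE = 8156             # size seen in tcp_sendmsg
CONFIGURED_SDU = 64 * 1024        # the 64 kB that was intended

for size in (OBSERVED_WRITE, CONFIGURED_SDU):
    print(size, writes_per_chunk(CHUNK, size))
```

Under these assumptions the smaller write size needs roughly eight times as many calls per chunk, each with its own per-call overhead.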
Best regards,
Nikolai
On a Google Compute node, when I run curl -v https://www1.nseindia.com/, the command hangs immediately after the TLS handshake.
* Expire in 50 ms for 1 (transfer 0x562c94210f50)
* Trying 23.199.139.58...
* TCP_NODELAY set
* Expire in 200 ms for 4 (transfer 0x562c94210f50)
* Connected to www1.nseindia.com (23.199.139.58) port 443 (#0)
* ALPN, offering h2
* ALPN, offering http/1.1
* successfully set certificate verify locations:
* CAfile: none
CApath: /etc/ssl/certs
* TLSv1.3 (OUT), TLS handshake, Client hello (1):
* TLSv1.3 (IN), TLS handshake, Server hello (2):
* TLSv1.3 (IN), TLS handshake, Encrypted Extensions (8):
* TLSv1.3 (IN), TLS handshake, Certificate (11):
* TLSv1.3 (IN), TLS handshake, CERT verify (15):
* TLSv1.3 (IN), TLS handshake, Finished (20):
* TLSv1.3 (OUT), TLS change cipher, Change cipher spec (1):
* TLSv1.3 (OUT), TLS handshake, Finished (20):
* SSL connection using TLSv1.3 / TLS_AES_256_GCM_SHA384
* ALPN, server accepted to use http/1.1
* Server certificate:
* subject: C=IN; ST=Maharashtra; L=Mumbai; O=National Stock Exchange of India Ltd.; CN=www.nseindia.com
* start date: Sep 2 00:00:00 2020 GMT
* expire date: Dec 12 12:00:00 2020 GMT
* subjectAltName: host "www1.nseindia.com" matched cert's "www1.nseindia.com"
* issuer: C=US; O=DigiCert Inc; OU=www.digicert.com; CN=GeoTrust RSA CA 2018
* SSL certificate verify ok.
> GET / HTTP/1.1
> Host: www1.nseindia.com
> User-Agent: curl/7.64.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
The tshark capture clearly shows that the TLS handshake completed and that the curl client sent the HTTP GET request, after which there is no response from the server.
Running as user "root" and group "root". This could be dangerous.
Capturing on 'ens4'
1 0.000000000 10.148.0.2 → 23.199.139.58 TCP 74 51830 → 443 [SYN] Seq=0 Win=65320 Len=0 MSS=1420 SACK_PERM=1 TSval=2896975156 TSecr=0 WS=128
2 0.014462698 23.199.139.58 → 10.148.0.2 TCP 74 443 → 51830 [SYN, ACK] Seq=0 Ack=1 Win=28960 Len=0 MSS=1460 SACK_PERM=1 TSval=979210844 TSecr=2896975156 WS=128
3 0.014506462 10.148.0.2 → 23.199.139.58 TCP 66 51830 → 443 [ACK] Seq=1 Ack=1 Win=65408 Len=0 TSval=2896975170 TSecr=979210844
4 0.015855789 10.148.0.2 → 23.199.139.58 TLSv1 583 Client Hello
5 0.029133295 23.199.139.58 → 10.148.0.2 TCP 66 443 → 51830 [ACK] Seq=1 Ack=518 Win=30080 Len=0 TSval=979210859 TSecr=2896975171
6 0.029490908 23.199.139.58 → 10.148.0.2 TLSv1.3 4162 Server Hello, Change Cipher Spec, Application Data
7 0.029515497 10.148.0.2 → 23.199.139.58 TCP 66 51830 → 443 [ACK] Seq=518 Ack=4097 Win=62848 Len=0 TSval=2896975185 TSecr=979210859
8 0.031222072 23.199.139.58 → 10.148.0.2 TCP 1474 443 → 51830 [ACK] Seq=4097 Ack=518 Win=30080 Len=1408 TSval=979210861 TSecr=2896975171 [TCP segment of a reassembled PDU]
9 0.031238234 10.148.0.2 → 23.199.139.58 TCP 66 51830 → 443 [ACK] Seq=518 Ack=5505 Win=64128 Len=0 TSval=2896975187 TSecr=979210861
10 0.042769026 23.199.139.58 → 10.148.0.2 TLSv1.3 858 Application Data, Application Data, Application Data
11 0.042797990 10.148.0.2 → 23.199.139.58 TCP 66 51830 → 443 [ACK] Seq=518 Ack=6297 Win=64128 Len=0 TSval=2896975198 TSecr=979210872
12 0.043898975 10.148.0.2 → 23.199.139.58 TLSv1.3 146 Change Cipher Spec, Application Data
13 0.044187939 10.148.0.2 → 23.199.139.58 TLSv1.3 169 Application Data
14 0.057149099 23.199.139.58 → 10.148.0.2 TLSv1.3 353 Application Data
15 0.057313736 23.199.139.58 → 10.148.0.2 TLSv1.3 353 Application Data
16 0.057462731 10.148.0.2 → 23.199.139.58 TCP 66 51830 → 443 [ACK] Seq=701 Ack=6871 Win=64128 Len=0 TSval=2896975213 TSecr=979210887
17 0.274408940 10.148.0.2 → 23.199.139.58 TCP 169 [TCP Retransmission] 51830 → 443 [PSH, ACK] Seq=598 Ack=6871 Win=64128 Len=103 TSval=2896975430 TSecr=979210887
18 0.494346208 10.148.0.2 → 23.199.139.58 TCP 169 [TCP Retransmission] 51830 → 443 [PSH, ACK] Seq=598 Ack=6871 Win=64128 Len=103 TSval=2896975650 TSecr=979210887
19 0.938332610 10.148.0.2 → 23.199.139.58 TCP 169 [TCP Retransmission] 51830 → 443 [PSH, ACK] Seq=598 Ack=6871 Win=64128 Len=103 TSval=2896976094 TSecr=979210887
20 1.834328400 10.148.0.2 → 23.199.139.58 TCP 169 [TCP Retransmission] 51830 → 443 [PSH, ACK] Seq=598 Ack=6871 Win=64128 Len=103 TSval=2896976990 TSecr=9792108
After I kill the curl client, I also see a [FIN, ACK], but while the client is stuck there is no inbound response from the server.
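The retransmissions in the capture all carry Seq=598 Len=103, which matches the 169-byte Application Data frame in packet 13. A small sketch of the spacing between those transmissions, using only the relative timestamps printed by tshark (the helper name is illustrative):

```python
# Relative timestamps in seconds: packet 13 is the original 103-byte record,
# packets 17-20 are its retransmissions.
send_times = [0.044187939, 0.274408940, 0.494346208, 0.938332610, 1.834328400]


def gaps(times):
    """Delay between each transmission and the next one, rounded to milliseconds."""
    return [round(later - earlier, 3) for earlier, later in zip(times, times[1:])]


print(gaps(send_times))
```

The later gaps roughly double each time, which is the pattern one would expect from a sender backing off its retransmission timer because no ACK for that data ever arrives.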
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
We can rule out the log lines above, which come from curl, because I tried a GET on another site that supports only HTTP/1.1 with TLS 1.2; it worked and logged the same lines. The issue happens only for this one site, https://www1.nseindia.com/; every other site I have tried works fine.
My questions are:
Why does the communication channel stop working after the TLS handshake? If it were a firewall issue, I would have expected the TLS channel never to be established.
What is the root cause of the curl command to this site not working?
How can I get the curl command to this site to work?
Note:
This is working fine on every other non-google compute node which I have tested on.
I really hope this is just a firewall issue. Do ask for more details if that can help anyone answer this.
I ran some curl tests from different environments, but the result was always the same: curl hangs without a response.
I tried from GCP (Windows and Linux), an Ubuntu PC, and a Windows PC, and also asked a friend far from my location to test; the result was the same.
So, regarding your first question, it makes me think that something on the URL's host "blocks" or "stops" the communication after the TLS handshake completes.
I'm not sure whether it could be the certificate or the server itself. If possible, could you please share the curl output from a run where the request completed? It would also help if you shared the location that test was run from.
As for your second and third questions, simply running curl should work; let me see whether I can think of another test or something else that could help us find the answers.
@blueboy1115
Thanks a lot for checking this.
Based on your testing, I guess you're right: the server is probably blocking the communication when the client IP is not located in India. My Google Cloud instances are not in India, because Google Cloud had no resources available to allocate a new instance there.
The site is accessible from any browser and from Python scripts that I run on my laptop in India. But the same Python scripts hang when I run them on Google VM instances hosted in America, Singapore, and Sydney.
In the google instance:
When I change the URL from https://www1.nseindia.com/ to https://www.nseindia.com/, then the request becomes HTTP2, and fails gracefully.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x55e45a305f50)
> GET / HTTP/2
> Host: www.nseindia.com
> User-Agent: curl/7.64.0
> Accept: */*
>
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* TLSv1.3 (IN), TLS handshake, Newsession Ticket (4):
* old SSL session ID is stale, removing
* Connection state changed (MAX_CONCURRENT_STREAMS == 100)!
< HTTP/2 403
< server: AkamaiGHost
< mime-version: 1.0
< content-type: text/html
< content-length: 265
< expires: Thu, 01 Oct 2020 05:10:13 GMT
< date: Thu, 01 Oct 2020 05:10:13 GMT
<
<HTML><HEAD>
<TITLE>Access Denied</TITLE>
</HEAD><BODY>
<H1>Access Denied</H1>
You don't have permission to access "http://www.nseindia.com/" on this server.<P>
Reference #18.5a0a0f17.1601529013.19fc9a7
</BODY>
</HTML>
On my Windows laptop in India:
Interestingly, on my Windows laptop on an Indian ISP, where the site works in the browser and is accessible from Python scripts using the urllib library, the curl command fails after a timeout for both www1.nseindia.com and www.nseindia.com. Both attempts use HTTP/1.1 because my Windows curl doesn't support HTTP/2.
I have concluded that the server supports only HTTP/2 and, after the TLS handshake, rejects IP addresses that are not located in India. This is why it works on my local laptop, both in the browser and via Python with urllib.
If anyone has any other conclusions please do share.
From the results and logs posted, it looks like the hosting server accepts or rejects requests based on region, irrespective of the client platform (GCP, etc.). This is generally done to prevent data or web scraping, and it may be one of the security mechanisms implemented on the remote server side (nseindia). That would be why curl -v https://www1.nseindia.com appears to stall at the stale-session message after the TLS handshake is done.
In my case I had to disable ipv6 on the proxying server, e.g. on Ubuntu:
sysctl -w net.ipv6.conf.all.disable_ipv6=1
sysctl -w net.ipv6.conf.default.disable_ipv6=1
Is it possible to capture incoming REST API requests to a Tomcat server in order to validate whether external clients are using proper credentials? 401 responses are produced, but we need to prove that the problem lies in the requests rather than in the REST API.
I successfully installed wireshark and based on suggestions used tshark to try and capture incoming packets.
tshark -D
1. usbmon1 (USB bus number 1)
2. eth2
3. any (Pseudo-device that captures on all interfaces)
4. lo
I would assume HTTP requests would be 'tcp', correct? Then why does it not show up here? I tried the following command, found online:
tshark 'tcp port 80 and (((ip[2:2] - ((ip[0]&0xf)<<2)) - ((tcp[12]&0xf0)>>2)) != 0)' -R 'http.request.method == "GET" || http.request.method == "HEAD"'
But this results in an error:
That string isn't a valid capture filter (USB link-layer type filtering not implemented).
See the User's Guide for a description of the capture filter syntax.
0 packets captured
I know the specific URL of the incoming requests I am expecting and thought I could filter on it: https://xxxxxxxxxxxxxxxx/termAPI/list
Really appreciate some help here.
EDIT:
Tried and tested the following:
tshark -i 2 -f 'port 80'
then ran a sample API Request and got the following captured:
Capturing on eth2
0.000000000 192.1xx.xxx -> 192.168.cc.xxxx TCP 66 49330 > http [SYN] Seq=0 Win=8192 Len=0 MSS=1400 WS=4 SACK_PERM=1
0.000146849 192.168.cc.xxxx -> 192.1xx.xxx TCP 66 http > 49330 [SYN, ACK] Seq=0 Ack=1 Win=14100 Len=0 MSS=1410 SACK_PERM=1 WS=128
0.005808528 192.1xx.xxx -> 192.168.cc.xxxx TCP 54 49330 > http [ACK] Seq=1 Ack=1 Win=65800 Len=0
0.031745954 192.1xx.xxx -> 192.168.cc.xxxx HTTP 220 GET /termsapi/google/search/main/rules/active HTTP/1.1
0.031845414 192.168.cc.xxxx -> 192.1xx.xxx TCP 54 http > 49330 [ACK] Seq=1 Ack=167 Win=15232 Len=0
0.063554179 192.168.cc.xxxx -> 192.1xx.xxx TCP 2854 [TCP segment of a reassembled PDU]
0.063568626 192.168.cc.xxxx -> 192.1xx.xxx TCP 2854 [TCP segment of a reassembled PDU]
0.063572832 192.168.cc.xxxx -> 192.1xx.xxx HTTP 695 HTTP/1.1 200 OK (application/json)
0.064066260 192.168.cc.xxxx -> 192.1xx.xxx TCP 54 http > 49330 [FIN, ACK] Seq=6242 Ack=167 Win=15232 Len=0
0.075055934 192.1xx.xxx -> 192.168.cc.xxxx TCP 54 49330 > http [ACK] Seq=167 Ack=2801 Win=65800 Len=0
0.075067927 192.1xx.xxx -> 192.168.cc.xxxx TCP 54 49330 > http [ACK] Seq=167 Ack=6243 Win=65800 Len=0
0.075095146 192.1xx.xxx -> 192.168.cc.xxxx TCP 54 49330 > http [FIN, ACK] Seq=167 Ack=6243 Win=65800 Len=0
0.075098758 192.168.cc.xxxx -> 192.1xx.xxx TCP 54 http > 49330 [ACK] Seq=6243 Ack=168 Win=15232 Len=0
but I cannot see credentials
According to the man page, tshark -D will:
Print a list of the interfaces on which TShark can capture, and exit.
You then have to choose one. In your case, you probably want to listen on eth2.
You can capture all traffic on eth2 with:
tshark -ieth2
If you want to capture only GET requests, you can use a capture filter expression from the documentation:
tshark -ieth2 "port 80 and tcp[((tcp[12:1] & 0xf0) >> 2):4] = 0x47455420"
You will then see all GET requests coming to your server.
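To see what that filter expression is doing, the same logic can be written out in Python. This is only an illustrative sketch: the segment below is hand-built (a 20-byte TCP header with no options plus a made-up request line), not taken from the capture, and the function names are hypothetical.

```python
import struct

GET_MAGIC = 0x47455420  # the constant compared against in the capture filter


def tcp_header_len(tcp_segment: bytes) -> int:
    """Header length in bytes: the high nibble of byte 12 is the data offset
    in 32-bit words, so shifting right by 2 instead of 4 multiplies it by 4."""
    return (tcp_segment[12] & 0xF0) >> 2


def is_get_request(tcp_segment: bytes) -> bool:
    """True when the first four payload bytes are 'GET '."""
    offset = tcp_header_len(tcp_segment)
    first_four = tcp_segment[offset:offset + 4]
    return len(first_four) == 4 and struct.unpack("!I", first_four)[0] == GET_MAGIC


# Hypothetical segment: 20-byte header (data offset = 5 words) followed by a request line.
header = bytearray(20)
header[12] = 5 << 4
segment = bytes(header) + b"GET /termAPI/list HTTP/1.1\r\n"

print(GET_MAGIC.to_bytes(4, "big"))
print(tcp_header_len(segment))
print(is_get_request(segment))
```

In other words, the filter skips the TCP header and matches segments whose payload begins with the ASCII bytes for "GET " (including the trailing space).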
== Edit
If you want to see all details (credentials,...) of your packets, you can ask tshark to output packets in Packet Details Markup Language by adding the -T pdml option.
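Once the request headers are visible, credentials sent with HTTP Basic authentication can be decoded from the Authorization header. The question does not say which scheme the clients use, so this is a sketch under the assumption that it is Basic auth over unencrypted HTTP; the credentials below are invented for the example.

```python
import base64


def decode_basic_auth(header_value: str):
    """Split the value of an 'Authorization: Basic ...' header into (user, password)."""
    scheme, _, token = header_value.partition(" ")
    if scheme.lower() != "basic":
        raise ValueError("not a Basic authorization header")
    user, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return user, password


# Hypothetical credentials, encoded the way a client would send them.
sample = "Basic " + base64.b64encode(b"alice:s3cret").decode("ascii")
print(decode_basic_auth(sample))
```

If the clients connect over HTTPS instead, the header is encrypted on the wire and will not be readable in a plain capture.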
I am trying to configure ColdFusion to send emails using 1&1's servers (smtp.1and1.com) and even though I have set the username and password it keeps failing.
This is what I've done so far:
Set outgoing server to smtp.1and1.com
Set username and password
Set port to 587
Selected the Use TLS checkbox
Selected the Verify Settings box
When I click Save I get the message "Connection Verification Failed!"
In the ColdFusion log files, in mail.log, I see this error:
"Error","scheduler-1","03/22/16","19:39:21",,"Can't send command to SMTP host"
I ran WireShark and captured some packets and it seems it does connect to the server, some communication goes back and forth, and then it aborts.
Below is a sample of the capture:
No Time Protocol Length Info
1 0.000000 TCP 66 49858 → 587 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
2 0.000567 TCP 66 587 → 49858 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1 WS=512
3 0.000611 TCP 54 49858 → 587 [ACK] Seq=1 Ack=1 Win=131328 Len=0
4 0.007028 SMTP 112 S: 220 perfora.net (mreueus002) Nemesis ESMTP Service ready
5 0.015100 SMTP 70 C: EHLO vm229CAC8
6 0.015556 TCP 60 587 → 49858 [ACK] Seq=59 Ack=17 Win=29696 Len=0
7 0.015697 SMTP 159 S: 250 perfora.net Hello vm229CAC8 [**.**.**.**] | 250 SIZE 69920427 | 250 AUTH LOGIN PLAIN | 250 STARTTLS
8 0.019485 SMTP 64 C: STARTTLS
9 0.021416 SMTP 62 S: 220 OK
10 0.058490 TLSv1 132 Client Hello
11 0.059244 TLSv1 1514 Server Hello
12 0.059246 TCP 1514 [TCP segment of a reassembled PDU]
13 0.059283 TCP 54 49858 → 587 [ACK] Seq=105 Ack=3092 Win=131328 Len=0
14 0.059308 TLSv1 710 Certificate
15 0.070314 TLSv1 61 Alert (Level: Fatal, Description: Certificate Unknown)
16 0.070368 TCP 54 49858 → 587 [FIN, ACK] Seq=112 Ack=3748 Win=130560 Len=0
17 0.070858 TLSv1 61 Alert (Level: Fatal, Description: Internal Error)
18 0.070905 TCP 54 49858 → 587 [RST, ACK] Seq=113 Ack=3755 Win=0 Len=0
19 0.071198 TCP 60 587 → 49858 [FIN, ACK] Seq=3755 Ack=113 Win=29696 Len=0
All of which makes me think that there is something with the certificate (since it aborts before it even bothers with the username and password).
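For reference, an unencrypted TLS alert body is just two bytes, a level and a description code. The sketch below maps the two alerts seen in the capture; the table is only a small subset of the codes defined in the TLS specification, and the function name is illustrative.

```python
ALERT_LEVELS = {1: "warning", 2: "fatal"}

# Subset of alert description codes from the TLS specification.
ALERT_DESCRIPTIONS = {
    0: "close_notify",
    42: "bad_certificate",
    46: "certificate_unknown",
    48: "unknown_ca",
    80: "internal_error",
}


def decode_alert(body: bytes):
    """Decode a two-byte TLS alert body into (level, description) names."""
    if len(body) != 2:
        raise ValueError("an unencrypted TLS alert body is exactly two bytes")
    level, description = body
    return (
        ALERT_LEVELS.get(level, f"unknown({level})"),
        ALERT_DESCRIPTIONS.get(description, f"unknown({description})"),
    )


print(decode_alert(bytes([2, 46])))  # packet 15 in the capture
print(decode_alert(bytes([2, 80])))  # packet 17 in the capture
```

A fatal certificate_unknown alert right after the Certificate message points at certificate validation on the side that sent the alert, before any SMTP credentials are exchanged.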
I've saved the 3 certificates from packet 14 and looked at them and they all seem fine - validity is OK, Thawte is the root CA - checked and confirmed the included one is OK, etc.
What am I missing? And are there any other log files that might shed some more light on this issue?
Thanks
I found it. It was the certificate.
ColdFusion runs on top of Java. Java has its own set of trusted root certificates. This server's root certificate wasn't there (hence why it wasn't trusted).
The solution essentially boiled down to:
Save the root certificate in a file
Import it into the trusted root certificates of ColdFusion's Java runtime
Restart ColdFusion so that it picks up the changes
The first step was easy - I expanded the 14th packet within WireShark, there were 3 certificates in it, saved them as 1.cer 2.cer and 3.cer files (it was 3.cer which had just the root one). I guess I could've visited any of 1&1's web pages via https and grabbed it, but wasn't sure if they'll use the same root CA. So extracting it from the actual packet seemed like the safer option.
ColdFusion was installed in C:\ColdFusion\, and to find out which Java runtime it starts I looked at C:\ColdFusion\bin\cfstart.bin, which referred to ..\runtime\bin\jrun -start coldfusion.
Its Java runtime had the certificates stored in C:\ColdFusion\runtime\jre\lib\security\cacerts
What remained was how to import it in that keystore - I used portecle as suggested here.
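Before importing a certificate that was pulled out of a packet capture, it can be worth comparing its fingerprint with one obtained from the CA through another channel. A minimal sketch, assuming the saved .cer file holds the raw DER bytes of the certificate; the function names and the placeholder bytes are illustrative only.

```python
import hashlib


def sha256_fingerprint(der_bytes: bytes) -> str:
    """Colon-separated, upper-case SHA-256 fingerprint of DER-encoded bytes."""
    digest = hashlib.sha256(der_bytes).hexdigest().upper()
    return ":".join(digest[i:i + 2] for i in range(0, len(digest), 2))


def fingerprint_of_file(path: str) -> str:
    """Fingerprint of a certificate file such as the 3.cer saved from Wireshark."""
    with open(path, "rb") as handle:
        return sha256_fingerprint(handle.read())


# Placeholder input; a real run would call fingerprint_of_file("3.cer").
print(sha256_fingerprint(b"placeholder bytes, not a real certificate"))
```

If the file was saved in PEM (base64 text) form instead, it would need to be converted to DER first for the fingerprint to be comparable.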
After restarting ColdFusion and asking it politely to verify the settings it confirmed them and I saw the below log in WireShark:
No. Time Protocol Length Info
104 3.895581 TCP 66 55157 → 587 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=256 SACK_PERM=1
105 3.896180 TCP 66 587 → 55157 [SYN, ACK] Seq=0 Ack=1 Win=29200 Len=0 MSS=1460 SACK_PERM=1 WS=512
106 3.896229 TCP 54 55157 → 587 [ACK] Seq=1 Ack=1 Win=131328 Len=0
107 3.902608 SMTP 112 S: 220 perfora.net (mreueus003) Nemesis ESMTP Service ready
108 3.903791 SMTP 70 C: EHLO vm229CAC8
109 3.904271 TCP 60 587 → 55157 [ACK] Seq=59 Ack=17 Win=29696 Len=0
110 3.904390 SMTP 159 S: 250 perfora.net Hello vm229CAC8 [**.**.**.**] | 250 SIZE 69920427 | 250 AUTH LOGIN PLAIN | 250 STARTTLS
111 3.904532 SMTP 64 C: STARTTLS
112 3.906347 SMTP 62 S: 220 OK
118 4.112009 TCP 62 [TCP Retransmission] 587 → 55157 [PSH, ACK] Seq=164 Ack=27 Win=29696 Len=8
119 4.112057 TCP 66 55157 → 587 [ACK] Seq=27 Ack=172 Win=131072 Len=0 SLE=164 SRE=172
120 4.115457 TLSv1 132 Client Hello
121 4.116154 TLSv1 1514 Server Hello
122 4.116157 TCP 1514 [TCP segment of a reassembled PDU]
123 4.116158 TLSv1 710 Certificate
124 4.116201 TCP 54 55157 → 587 [ACK] Seq=105 Ack=3748 Win=131328 Len=0
125 4.156467 TLSv1 321 Client Key Exchange
127 4.196201 TCP 60 587 → 55157 [ACK] Seq=3748 Ack=372 Win=30720 Len=0
128 4.196237 TLSv1 97 Change Cipher Spec, Encrypted Handshake Message
129 4.196799 TCP 60 587 → 55157 [ACK] Seq=3748 Ack=415 Win=30720 Len=0
130 4.197005 TLSv1 97 Change Cipher Spec, Encrypted Handshake Message
131 4.197742 TLSv1 91 Application Data
132 4.198262 TLSv1 166 Application Data
133 4.198550 TLSv1 87 Application Data
134 4.199201 TLSv1 93 Application Data
135 4.199677 TLSv1 117 Application Data
136 4.200122 TLSv1 93 Application Data
137 4.200345 TLSv1 101 Application Data
138 4.240137 TCP 60 587 → 55157 [ACK] Seq=3981 Ack=595 Win=30720 Len=0
143 4.448738 TLSv1 105 Application Data
154 4.652126 TCP 105 [TCP Retransmission] 587 → 55157 [PSH, ACK] Seq=3981 Ack=595 Win=30720 Len=51
155 4.652153 TCP 66 55157 → 587 [ACK] Seq=595 Ack=4032 Win=131072 Len=0 SLE=3981 SRE=4032
and also tried sending a few test emails and everything worked as expected.
Thanks for everyone's help and suggestions! :)
P.S. I also found a backup option. It turns out 1&1 supports TLS but does not require it. Plain old SMTP with no TLS worked just fine on port 587.
I discovered this accidentally; it is probably a bug in ColdFusion (version 9 in my case). In ColdFusion's Server Settings > Mail > Undelivered Mail I told it to resend a failed email. And it did, but without attempting the TLS part.
My colleagues are attempting to connect BizTalk 2006 R2 via the DB2/MVS adapter to a database hosted on a z/OS mainframe. When testing the connection settings, they get the following error:
Could not connect to data source 'New Data Source':
The network connection was terminated because the host failed to send any data.
SQLSTATE: 08S01, SQLCODE: -605
When the same settings are put in a regular connection string and opened from .NET code, the connection works fine. I am new to BizTalk and DB2. Can anybody suggest what to look out for when this error surfaces?
24 Aug 08:
Well, if normal .NET code with a regular DB2 connection string is used, the connection can be made and queries submitted. What this DB2 adapter is reporting is that it cannot even complete a proper connection handshake, let alone submit queries. I am unsure what mechanisms are actually involved in making a DB2 connection happen.
25 Aug 08:
According to this MSDN forums posting, it seems to be a login issue.
I have seen that and that is not the case here. If we put the user name as the Package Collection it still hits the same problem.
26 Aug 08:
Because of the scarcity of information about connecting to mainframe DB2 databases from Microsoft products, I undertook the task of inspecting raw network packets to get a clue about what is going on between the .NET DB2 provider's connection (which works) and the BizTalk 2006 DB2 adapter (which bombs). I observed that DB2 traffic uses the DRDA protocol, and ultimately concluded that the BizTalk adapter fails because of what's recorded in the server's SECCHKRM reply packet:
DRDA (Security Check)
DDM (SECCHKRM)
Length: 55
Magic: 0xd0
Format: 0x02
0... = Reserved: Not set
.0.. = Chained: Not set
..0. = Continue: Not set
...0 = Same correlation: Not set
DSS type: RPYDSS (2)
CorrelId: 0
Length2: 49
Code point: SECCHKRM (0x1219)
Parameter (Severity Code)
Length: 6
Code point: SVRCOD (0x1149)
Data (ASCII):
Data (EBCDIC):
Parameter (Security Check Code)
Length: 5
Code point: SECCHKCD (0x11a4)
Data (ASCII):
Data (EBCDIC):
Parameter (Server Diagnostic Information)
Length: 34
Code point: SRVDGN (0x1153)
Data (ASCII): \304\331\304\301#\301\331z#\301\344\343\310\305\325\343\311\303\301\343\311\326\325#\206\201\211\223\205\204
Data (EBCDIC): DRDA AR: AUTHENTICATION failed
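The "Data (ASCII)" line looks like garbage because the server sends the diagnostic text in EBCDIC. A small sketch of the decoding, with two assumptions stated up front: the octal escapes are transcribed by hand into hex, and the "#" characters shown in the ASCII view are taken to stand for 0x40 (the EBCDIC space). The codec choice cp037 is also an assumption; other EBCDIC code pages agree on these characters.

```python
# SRVDGN payload from the dissection, written as hex.
# Assumption: each "#" in the ASCII view represents byte 0x40 (EBCDIC space).
SRVDGN_HEX = (
    "C4D9C4C1" "40"                        # \304\331\304\301 + separator
    "C1D97A" "40"                          # \301\331 'z' (0x7A) + separator
    "C1E4E3C8C5D5E3C9C3C1E3C9D6D5" "40"    # fourteen upper-case letters + separator
    "868189938584"                         # \206\201\211\223\205\204
)


def decode_ebcdic(hex_string: str, codec: str = "cp037") -> str:
    """Decode a hex string of EBCDIC bytes into text."""
    return bytes.fromhex(hex_string).decode(codec)


print(decode_ebcdic(SRVDGN_HEX))
```

The decoded text matches the "Data (EBCDIC)" line that Wireshark prints for the same parameter.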
Why the same credentials fail here while succeeding with the .NET provider is beyond me. Right now, what I can observe is a marked difference between the two methods in the sequence of packets transferred.
.NET DB2 provider
No. Time Source Destination Protocol Info
1 0.000000 [client IP] [DB2 server IP] TCP kpop > 50000 [SYN] Seq=0 Win=65535 Len=0 MSS=1460 WS=1
2 0.000399 [DB2 server IP] [client IP] TCP 50000 > kpop [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0
3 0.000414 [client IP] [DB2 server IP] TCP kpop > 50000 [ACK] Seq=1 Ack=1 Win=65536 [TCP CHECKSUM INCORRECT] Len=0
4 0.000532 [client IP] [DB2 server IP] DRDA EXCSAT | ACCSEC
5 0.038162 [DB2 server IP] [client IP] DRDA EXCSATRD | ACCSECRD
6 0.041829 [client IP] [DB2 server IP] DRDA ACCSEC | SECCHK | ACCRDB
7 0.083626 [DB2 server IP] [client IP] TCP 50000 > kpop [ACK] Seq=108 Ack=542 Win=65535 Len=0
8 0.190534 [DB2 server IP] [client IP] DRDA ACCSECRD | SECCHKRM | ACCRDBRM | SQLCARD
9 0.199776 [client IP] [DB2 server IP] DRDA PRPSQLSTT | SQLATTR | SQLSTT | OPNQRY
10 0.293307 [DB2 server IP] [client IP] TCP [TCP segment of a reassembled PDU]
11 0.293359 [DB2 server IP] [client IP] TCP [TCP segment of a reassembled PDU]
12 0.293377 [client IP] [DB2 server IP] TCP kpop > 50000 [ACK] Seq=870 Ack=1444 Win=64092 [TCP CHECKSUM INCORRECT] Len=0
13 0.293404 [DB2 server IP] [client IP] TCP [TCP segment of a reassembled PDU]
14 0.293452 [DB2 server IP] [client IP] TCP [TCP segment of a reassembled PDU]
15 0.293461 [client IP] [DB2 server IP] TCP kpop > 50000 [ACK] Seq=870 Ack=2516 Win=65536 [TCP CHECKSUM INCORRECT] Len=0
16 0.293855 [DB2 server IP] [client IP] TCP [TCP segment of a reassembled PDU]
17 0.293908 [DB2 server IP] [client IP] DRDA SQLDARD
18 0.293918 [client IP] [DB2 server IP] TCP kpop > 50000 [ACK] Seq=870 Ack=3588 Win=64464 [TCP CHECKSUM INCORRECT] Len=0
19 0.293957 [DB2 server IP] [client IP] DRDA QRYDSC
20 0.294008 [DB2 server IP] [client IP] DRDA QRYDTA
21 0.294017 [client IP] [DB2 server IP] TCP kpop > 50000 [ACK] Seq=870 Ack=4660 Win=65536 [TCP CHECKSUM INCORRECT] Len=0
22 0.294023 [DB2 server IP] [client IP] DRDA SQLCARD
23 0.295346 [client IP] [DB2 server IP] DRDA RDBCMM
24 0.297868 [DB2 server IP] [client IP] DRDA ENDUOWRM | SQLCARD
25 0.421392 [client IP] [DB2 server IP] DRDA PRPSQLSTT | SQLATTR | SQLSTT | OPNQRY
26 0.456504 [DB2 server IP] [client IP] DRDA SQLDARD | OPNQRYRM | TYPDEFNAM | QRYDSC | QRYDTA | ENDQRYRM | TYPDEFNAM | SQLCARD
27 0.456756 [client IP] [DB2 server IP] DRDA RDBCMM
28 0.488311 [DB2 server IP] [client IP] DRDA ENDUOWRM | SQLCARD
29 0.498806 [client IP] [DB2 server IP] DRDA PRPSQLSTT | SQLATTR | SQLSTT | OPNQRY
30 0.630477 [DB2 server IP] [client IP] TCP 50000 > kpop [ACK] Seq=5157 Ack=1579 Win=65171 Len=0
31 0.788165 [DB2 server IP] [client IP] DRDA SQLDARD | OPNQRYRM | TYPDEFNAM | QRYDSC | QRYDTA
32 0.788203 [DB2 server IP] [client IP] DRDA ENDQRYRM
33 0.788225 [client IP] [DB2 server IP] TCP kpop > 50000 [ACK] Seq=1579 Ack=5815 Win=64380 [TCP CHECKSUM INCORRECT] Len=0
34 0.788648 [client IP] [DB2 server IP] DRDA RDBCMM
35 0.795951 [DB2 server IP] [client IP] DRDA ENDUOWRM | SQLCARD
36 0.807365 [client IP] [DB2 server IP] DRDA PRPSQLSTT | SQLATTR | SQLSTT | OPNQRY
37 0.838046 [DB2 server IP] [client IP] DRDA SQLDARD | OPNQRYRM | TYPDEFNAM | QRYDSC | QRYDTA | ENDQRYRM | TYPDEFNAM | SQLCARD
38 0.838328 [client IP] [DB2 server IP] DRDA RDBCMM
39 0.841866 [DB2 server IP] [client IP] DRDA ENDUOWRM | SQLCARD
40 0.973506 [client IP] [DB2 server IP] TCP kpop > 50000 [ACK] Seq=1906 Ack=6304 Win=65482 [TCP CHECKSUM INCORRECT] Len=0
BizTalk DB2 adapter
No. Time Source Destination Protocol Info
1 0.000000 [client IP] [DB2 server IP] TCP 28165 > 50000 [SYN] Seq=0 Win=8192 Len=0 MSS=1460 WS=8
2 0.002587 [DB2 server IP] [client IP] TCP 50000 > 28165 [SYN, ACK] Seq=0 Ack=1 Win=16384 Len=0 MSS=1460 WS=0
3 0.010146 [client IP] [DB2 server IP] TCP 28165 > 50000 [ACK] Seq=1 Ack=1 Win=65536 Len=0
4 0.019698 [client IP] [DB2 server IP] DRDA EXCSAT
5 0.020849 [DB2 server IP] [client IP] DRDA EXCSATRD
6 0.034699 [client IP] [DB2 server IP] DRDA ACCSEC
7 0.036584 [DB2 server IP] [client IP] DRDA ACCSECRD
8 0.042031 [client IP] [DB2 server IP] DRDA SECCHK
9 0.046350 [DB2 server IP] [client IP] DRDA SECCHKRM
10 0.046642 [DB2 server IP] [client IP] TCP 50000 > 28165 [FIN, ACK] Seq=160 Ack=200 Win=65336 Len=0
11 0.053787 [client IP] [DB2 server IP] TCP 28165 > 50000 [ACK] Seq=200 Ack=161 Win=65536 Len=0
12 0.056891 [client IP] [DB2 server IP] DRDA ACCRDB
13 0.058084 [DB2 server IP] [client IP] TCP 50000 > 28165 [RST, ACK] Seq=161 Ack=295 Win=0 Len=0
It is interesting to see the .NET provider issue several DRDA protocol packets within a single TCP segment. The BizTalk adapter, on the other hand, places only one protocol packet per TCP segment. I do not know why this is so. However, at the moment I think that is a red herring and that the true difference causing the authentication failure is in the DRDA data exchange. I do not know the DRDA protocol, so I will have to study it before I can make more sense of it.
18 Sep 08:
At this stage the problem is still not solved, as attempts to get cooperation from the DB2 DBA team and help from Microsoft have met with many obstacles.
What I do want to report is that I have observed perhaps one crucial difference between all the successful connections and the failed attempt:
The BizTalk DB2 adapter uses the Microsoft ODBC Driver for DB2 underneath. The other software tests that succeed make use of IBM DB2 ODBC DRIVER or IBM DB2 ODBC DRIVER – IBMCL1. The IBM driver's parameter configuration is different from the Microsoft driver's. But we do not see any obviously critical difference that might lead to a failed authentication for the Microsoft driver.
Why, it certainly took Microsoft long enough to explicitly confirm this:
proxy connections via DB2Connect is not supported by BizTalk DB2 Adapter
Since our customer's policy is to only access DB2 databases via DB2Connect, the adapter is out of the question.
MORE BACKGROUND INFO
The reason the DB2 Adapter only works for a direct connection to a z/OS mainframe host is legal restrictions. Technically it is possible to make a connection work through DB2Connect, but IBM has made it a proprietary node and prevented other parties from legally establishing the correct DRDA sequence to connect to it.
I've never used this adapter myself, so I'm guessing, but maybe it has to do with the account that BizTalk is using to connect, or your ports are not configured correctly.
According to this MSDN forums posting, it seems to be a login issue.