We've been seeing a lot of outgoing mail failures lately (within the past week or so) and I'm trying to get to the bottom of it. This is on the latest AMS 5.0.2; the server has been restarted, has had no major config changes recently, and is otherwise working OK.
We have everything set for TLS 1.2, which since AMS 5.x has been OK for outgoing mail (there was a bug somewhere in 4.x that allowed us to update that). However, I'm seeing a handful of recent failures that lead me to believe that other mail servers have begun dropping support for something. On outgoing mail in these cases we get undeliverable warnings and then failures with:
Connection to the destination SMTP was closed.
As of this writing, the MX servers of the domains this is an issue for seem to be limited to mimecast.com / ppe-hosted.com / mailspamprotection.com and a few others. Overall this is something like 10-15% of our outgoing mail. What happens is that the normal EHLO from AMS occurs, the server responds that STARTTLS is OK, the 2nd EHLO from AMS is sent, and then it just seems to time out after 5 minutes. Below is a sample from CheckTLS when this occurs (our domain is anonymized):
***********************************
Tue, 30 Aug 2022 10:49:26 -> Success: Action=[Process Mail], Details=[3 KB: Start transfer.]
Tue, 30 Aug 2022 10:49:26 -> Success: Action=[Detect DNS's], Details=[Found 2 entries.]
Tue, 30 Aug 2022 10:49:26 -> Success: Action=[MX Lookup], Details=[DNS=Using automatically detected DNS's, Domain=TestSender.CheckTLS.com: Found 1 records]
Tue, 30 Aug 2022 10:49:26 -> Success: Action=[SMTP Transfer], Details=[Domain=TestSender.CheckTLS.com, Host=ts11-do.CheckTLS.com:25, IP=165.227.190.238: Connection accepted.]
Tue, 30 Aug 2022 10:49:26 -> ***DEBUG*** -> Success: Action=[Recv Response], Details=[IP=165.227.190.238: 220 ts11-do.checktls.com ESMTP TestSender Tue, 30 Aug 2022 10:49:26 -0400]
Tue, 30 Aug 2022 10:49:26 -> ***DEBUG*** -> Success: Action=[Send Command], Details=[IP=165.227.190.238: EHLO mail.domain.com]
Tue, 30 Aug 2022 10:49:26 -> ***DEBUG*** -> Success: Action=[Recv Response], Details=[IP=165.227.190.238: 250-ts11-do.checktls.com Hello domain.com [1.2.3.4], pleased to meet you]
Tue, 30 Aug 2022 10:49:26 -> ***DEBUG*** -> Success: Action=[Recv Response], Details=[IP=165.227.190.238: 250-ENHANCEDSTATUSCODES]
Tue, 30 Aug 2022 10:49:26 -> ***DEBUG*** -> Success: Action=[Recv Response], Details=[IP=165.227.190.238: 250-8BITMIME]
Tue, 30 Aug 2022 10:49:26 -> ***DEBUG*** -> Success: Action=[Recv Response], Details=[IP=165.227.190.238: 250-STARTTLS]
Tue, 30 Aug 2022 10:49:26 -> ***DEBUG*** -> Success: Action=[Recv Response], Details=[IP=165.227.190.238: 250 HELP]
Tue, 30 Aug 2022 10:49:26 -> ***DEBUG*** -> Success: Action=[Send Command], Details=[IP=165.227.190.238: STARTTLS]
Tue, 30 Aug 2022 10:49:26 -> ***DEBUG*** -> Success: Action=[Recv Response], Details=[IP=165.227.190.238: 220 Ready to start TLS]
Tue, 30 Aug 2022 10:49:26 -> Success: Action=[SMTP Transfer], Details=[Domain=TestSender.CheckTLS.com, Host=ts11-do.CheckTLS.com:25, IP=165.227.190.238: Starting TLS.]
Tue, 30 Aug 2022 10:49:27 -> Success: Action=[SMTP Transfer], Details=[Domain=TestSender.CheckTLS.com, Host=ts11-do.CheckTLS.com:25, IP=165.227.190.238: TLS started.]
Tue, 30 Aug 2022 10:49:27 -> ***DEBUG*** -> Success: Action=[Send Command], Details=[IP=165.227.190.238: EHLO mail.domain.com]
.
.
.
Tue, 30 Aug 2022 10:54:27 -> Failed: Action=[SMTP Transfer], Details=[Domain=TestSender.CheckTLS.com, Host=ts11-do.CheckTLS.com:25, IP=165.227.190.238: Connection closed unexpectedly or forced shutdown.]
***********************************
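For anyone wanting to check their own logs for the same pattern, here is a rough Python sketch that scans a debug log in the format above and reports the last command left unanswered after "TLS started." along with how long it hung before the failure. The regex and the `stall_after_tls` name are my own assumptions based only on the sample lines shown here, not anything documented by AMS.

```python
import re
from datetime import datetime

TS_FORMAT = "%a, %d %b %Y %H:%M:%S"

# Assumed line layout, inferred from the sample log above.
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}, \d{1,2} \w{3} \d{4} \d{2}:\d{2}:\d{2}) -> "
    r"(?:\*\*\*DEBUG\*\*\* -> )?"
    r"(?P<status>Success|Failed): Action=\[(?P<action>[^\]]+)\], "
    r"Details=\[(?P<details>.*)\]$"
)


def stall_after_tls(log_text):
    """Return (command, seconds) for a command that was sent after TLS
    started and never answered before a Failed line, else None."""
    tls_started = False
    pending = None  # (timestamp, command) of the last unanswered command
    for raw in log_text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue  # separators and elided "." lines are skipped
        ts = datetime.strptime(match["ts"], TS_FORMAT)
        action, details = match["action"], match["details"]
        if action == "SMTP Transfer" and details.endswith("TLS started."):
            tls_started = True
        elif action == "Send Command":
            pending = (ts, details.split(": ", 1)[-1])
        elif action == "Recv Response":
            pending = None  # the server answered, so nothing is pending
        elif match["status"] == "Failed" and tls_started and pending:
            return pending[1], (ts - pending[0]).total_seconds()
    return None
```

Run against the sample above, this should point at the second `EHLO mail.domain.com` and a gap of 300 seconds, matching the 5-minute timeout.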
On successful outgoing mails, the receiving server responds to the 2nd EHLO as expected and the message goes through fine. What I don't know is why this handful of outgoing mails is failing (I can send from, say, Gmail to any of the affected domains fine). There are no specific refusal error codes for this particular problem; it's just that the handshake seems to fail. If I had to guess, I'd say the likeliest culprit is the aging OpenSSL 1.0.2L that AMS is still using: it would be limited to ciphers that get deprecated over time in the newer versions receiving servers use. Even though we're using TLS 1.2, which ciphers are available depends entirely on the OpenSSL version. Going back in the logs further than just the past week or so, I do see more failures like this one, but I can't find any notice from the MX providers we're seeing the failures on along the lines of 'as of X date we no longer support TLS 1.2 connections using Y or Z ciphers / protocols'.
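To illustrate the point that the cipher list is tied to the OpenSSL build rather than to the TLS version setting, here is a small Python sketch that lists the non-TLS-1.3 cipher suites offered by whatever OpenSSL the local Python is linked against. It says nothing about what AMS itself offers; `local_tls12_ciphers` is just a name I made up for the example.

```python
import ssl


def local_tls12_ciphers():
    """List cipher suite names the local OpenSSL build would offer
    for a client capped at TLS 1.2 (TLS 1.3 suites are filtered out)."""
    context = ssl.SSLContext(ssl.PROTOCOL_TLS_CLIENT)
    context.maximum_version = ssl.TLSVersion.TLSv1_2
    return sorted(
        cipher["name"]
        for cipher in context.get_ciphers()
        if cipher["protocol"] != "TLSv1.3"
    )


if __name__ == "__main__":
    print(ssl.OPENSSL_VERSION)
    for name in local_tls12_ciphers():
        print(" ", name)
```

Running this on machines with different OpenSSL versions gives different lists, which is the kind of mismatch I suspect between AMS and the receiving servers.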
So ultimately I guess I'm asking for two things: (1) we need support for OpenSSL 1.1.1 anyway, as that's the current standard pretty much across the board (currently 1.1.1q), and (2) could the debug log for outgoing mail be modified to report the cipher used in the TLS handshake? That may be helpful in pinning these sorts of issues down. I don't KNOW that this is the issue, though, because we can receive emails from all of these domains fine; that tells me they have some receiving SMTP requirement that we're either not meeting or not equipped to meet...
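In the meantime, something like the following Python sketch can show which cipher a given MX host negotiates over STARTTLS from an outside client. It uses only the standard `smtplib` and `ssl` modules. The `probe_starttls` and `describe_cipher` names, the default HELO name, and the optional `ciphers` string are assumptions for illustration. Note that it negotiates with Python's OpenSSL, not the 1.0.2L inside AMS, so a success here only shows what a newer client gets; passing an OpenSSL cipher string narrows the offer if you want to approximate an older client.

```python
import smtplib
import ssl


def describe_cipher(cipher):
    """Format the (name, protocol, bits) tuple from SSLSocket.cipher()."""
    if cipher is None:
        return "no cipher negotiated"
    name, protocol, bits = cipher
    return f"{name} ({protocol}, {bits}-bit)"


def probe_starttls(host, port=25, helo_name="mail.domain.com",
                   ciphers=None, timeout=30):
    """Connect, issue STARTTLS, and return (cipher description, code of
    the post-TLS EHLO reply). Diagnostic only: certificate checks are
    disabled, as is common for opportunistic SMTP TLS."""
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    context.maximum_version = ssl.TLSVersion.TLSv1_2  # mirror a TLS 1.2 setup
    if ciphers:
        context.set_ciphers(ciphers)  # OpenSSL cipher-list string
    with smtplib.SMTP(host, port, local_hostname=helo_name,
                      timeout=timeout) as smtp:
        smtp.ehlo()
        smtp.starttls(context=context)
        negotiated = smtp.sock.cipher()
        code, _ = smtp.ehlo()  # the 2nd EHLO, the step that hangs for us
        return describe_cipher(negotiated), code
```

For example, `probe_starttls("ts11-do.CheckTLS.com")` would exercise the same host as the log above; if the 2nd EHLO hangs there too, the call raises a timeout instead of returning.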
This is a major issue for us that's probably not going to get better (it doesn't look like, say, a configuration error within mimecast.com or other host domains); we can use alternate personal emails for now, but this becomes a security issue the longer it lasts...
There seem to have been continued updates to OpenSSL 1.0.2 after 1/1/2020, but they're only available to 'Premium level support' customers and cost $50k+ (https://www.openssl.org/support/contracts.html#premium). They seem to be up to 1.0.2zf, but I'm guessing we'd get 1.1.1 support before code-crafters pays THAT kind of $$?