The mysterious case of broken SSH client (“connection reset by peer”)

Update: from the information I've gathered, this is most probably caused by some Cisco IDS/DPI feature running on the Ethernet equipment. A workaround is described below; I still don't know what the real solution is (Cisco equipment config? Cisco firmware update?).

---

Starting with OpenSSH 5.7p1, the ssh client in certain environments fails to connect to certain servers (usually ones running older versions). I reproduced it on one particular network while connecting with a new ssh client (5.8p1, Ubuntu 11.04) to an old server (the default SSH server on RedHat 5.4).

Motivation

This issue has been around for quite a while, but it is very tricky to reproduce or understand. What bothered me most is that many people reported it on different forums, each posting only a few (different) pieces of the puzzle. So my motivation here is to summarize the relevant information from multiple places. I'll do my best to update this post when I hear something new.

Complete Fact list

The problem is present in 5.7p1 and 5.8p1.
The exact same version (e.g. 5.7p1) works in some environments and fails in others. (My definition of "environment": a particular client machine and a particular server machine on a particular network.)
In the "bad" environments, the problem is 100% reproducible: ssh dies right after connecting, with the "connection reset by peer" message. Running ssh -vvv doesn't shed much light on the problem (see here).
Workarounds: in the "bad" environments, the following two workarounds are known to always avoid the problem:
1. Shortening the cipher list ('ssh -c aes256-ctr').
2. Shortening the HostKeyAlgorithms list.
In the "bad" environments, manually lengthening the cipher list using '-c aes256-ctr,,,,,,' with enough commas triggers the problem. It's easy to find a deterministic (per-environment) threshold for the length of the cipher list: above the threshold the ssh client breaks, below it the client works perfectly.
In the "bad" environments, downgrading to an older release (5.5p1) resolves the problem.
I (among many others) reported the problem to the openssh-unix-dev mailing list, but the OpenSSH developers couldn't reproduce it on their side and therefore haven't yet been able to investigate it properly.
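
The per-environment threshold described above can be located with a binary search over the number of padding commas. The following is only a minimal sketch in Python: the helper names are made up for illustration, it assumes the failure is monotonic in the list length (as the observations above suggest), and it assumes the local ssh accepts a comma-padded '-c' value the way 5.7p1/5.8p1 did. Other releases may reject the padded list outright, which this sketch does not handle.

```python
import subprocess


def padded_cipher_list(base: str, extra_commas: int) -> str:
    """Build a cipher list padded with trailing commas, e.g. 'aes256-ctr,,,'."""
    return base + "," * extra_commas


def ssh_survives(host: str, cipher_list: str) -> bool:
    """Return False if ssh shows the 'Connection reset by peer' symptom.

    Assumption: any other outcome (including an authentication failure)
    means the handshake got past the problematic packet. A hang is
    counted as a failure.
    """
    try:
        result = subprocess.run(
            ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=15",
             "-c", cipher_list, host, "true"],
            capture_output=True,
            text=True,
            timeout=60,
        )
    except subprocess.TimeoutExpired:
        return False
    return "Connection reset by peer" not in result.stderr


def find_threshold(works, lo: int = 0, hi: int = 4096) -> int:
    """Return the largest n in [lo, hi] for which works(n) is True.

    Assumes works(n) is True up to some threshold and False above it.
    """
    if not works(lo):
        raise ValueError("the connection already fails at the lower bound")
    if works(hi):
        return hi  # no failure found within the search range
    # Invariant: works(lo) is True and works(hi) is False.
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if works(mid):
            lo = mid
        else:
            hi = mid
    return lo
```

On a "bad" environment this could be called as find_threshold(lambda n: ssh_survives("old-server.example", padded_cipher_list("aes256-ctr", n))), where old-server.example is a placeholder host name. The search needs only about a dozen connection attempts to cover the default range.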

Guesses

It occurs on networks with Cisco equipment; possibly some "smart" Deep Packet Inspection filter interferes with specific packets.
It has to do with the size of one of the handshake packets. Setting a short cipher list or HostKeyAlgorithms list simply brings the packet size below some threshold.
The buggy behavior does have something to do with a change in OpenSSH, probably introduced in 5.7p1. It's probably a legitimate change that just happens to trigger the problem.
I haven't ruled out the possibility that a third-party library (e.g. OpenSSL) is involved.
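
To make the packet-size guess concrete: the algorithm lists travel in the SSH_MSG_KEXINIT message, whose layout is defined in RFC 4253 (a message byte, a 16-byte cookie, ten name-lists, a boolean and a reserved uint32). The Python sketch below only does the arithmetic for that layout. The function names are made up for illustration, and the lists used in any call are whatever the caller supplies; nothing here is taken from the real OpenSSH defaults.

```python
def name_list_size(names) -> int:
    """Wire size of an SSH name-list: uint32 length + comma-separated names."""
    return 4 + len(",".join(names).encode("ascii"))


def kexinit_payload_size(name_lists) -> int:
    """Payload size of SSH_MSG_KEXINIT for the ten name-lists of RFC 4253.

    Order per the RFC: kex, host key, ciphers (c2s, s2c), MACs (c2s, s2c),
    compression (c2s, s2c), languages (c2s, s2c).
    """
    if len(name_lists) != 10:
        raise ValueError("KEXINIT carries exactly ten name-lists")
    message_byte = 1
    cookie = 16
    first_kex_packet_follows = 1
    reserved = 4
    lists = sum(name_list_size(names) for names in name_lists)
    return message_byte + cookie + lists + first_kex_packet_follows + reserved


def binary_packet_size(payload_len: int, block: int = 8) -> int:
    """Size of the unencrypted binary packet that wraps the payload.

    packet_length (4) + padding_length (1) + payload + padding, where the
    padding is at least 4 bytes and the total is a multiple of `block`.
    No MAC is present before the first key exchange completes.
    """
    unpadded = 4 + 1 + payload_len
    padding = block - unpadded % block
    if padding < 4:
        padding += block
    return unpadded + padding
```

Every extra algorithm name adds its length plus one comma to the payload, in each direction where it is listed, so a release that ships more names by default sends a larger KEXINIT packet, and '-c aes256-ctr' shrinks it again.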

Resources

Ubuntu bug
Debian Bug
Serverfault [1]
Threads on openssh-unix-dev: [1] [2]
A report that correlates the problem with Cisco equipment: surprising progress!

17 thoughts on “The mysterious case of broken SSH client (“connection reset by peer”)”

Brian May 6, 2011 at 22:05

Thanks man... been pulling my hair out looking for a solution, this helps immensely!

Deno May 13, 2011 at 14:32

Works for me. I pasted the first line of ciphers from the ssh_config man page, and the workaround still worked (around).

slogin -c aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,

annoying bug, thx for the workaround.

James Benkart August 23, 2011 at 17:41

I get the problem, and I'm running Ubuntu 10.04 as a host with VMware Server 2. The box I am trying to log into is a VM: Ubuntu 11.04 Server 64-bit, a brand-new install with bridged networking through a cheap Netgear router.

Andrew Schulman October 22, 2011 at 18:06

It's not a Cisco problem, at least in our LAN. We have this bug and there's no Cisco gear here.

David Tomaschik November 11, 2011 at 18:37

OpenSSH 5.7 was the first version to support ECDSA (Elliptic Curve Cryptography). (http://openssh.org/txt/release-5.7) The addition of the new cipher suites probably lengthened the cipher list enough to trigger the IPS rule.

Apparently some *other* SSH servers (not OpenSSH) had a buffer overflow vulnerability if the cipher list was sufficiently long, so the IPS is trying to protect against these.

Oren Held (post author) November 12, 2011 at 13:49

David,

I indeed suspected the long list triggered some IPS rule, but your explanation adds more sense and makes it more concrete. Thanks.

So now the question is whether the IPS developers are aware of it and are preparing a fix.

Tom Wilson January 15, 2012 at 21:27

Even after trying multiple workarounds I am still not able to connect to the server. I'm really not sure where the fault lies because they reworked the network at school over break, and I also rolled out a new development box at my house at the same time.

When connecting:
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.8p1 Debian-7ubuntu1
debug1: Local version string SSH-2.0-OpenSSH_5.8p1 Debian-7ubuntu1

debug1: SSH2_MSG_SERVICE_REQUEST sent
Read from socket failed: Connection reset by peer

I have this in my ssh_config:
HostKeyAlgorithms ssh-rsa,ssh-dss
MACs hmac-md5,hmac-sha1,hmac-ripemd160

Trying to connect using: ssh -c aes256-ctr -v dev.tridiumtech.com (or aes128-ctr; same issue)

Server sshd_config:
Ciphers aes128-ctr,aes192-ctr,aes256-ctr

Although my connections fail from Ubuntu, they work from putty on my WinXP VM without any issue. For that reason, I doubt that it is any intermediate equipment causing an issue. It seems to be all in the client/server software interaction.

Tom Wilson January 16, 2012 at 04:44

Actually, my problem may be unrelated. I just rolled multiple versions of OpenSSH (4.9p1, 5.6p1, 5.9p1) against multiple versions of OpenSSL (0.9.7m and 0.9.8s) and all result in the same error.

putty works fine from Windows and Linux, but unfortunately without ssh in working order, GVFS and sshfs cannot be used which I rely on heavily. :-\

Dragan Sahpaski August 13, 2012 at 15:04

Thanks so much

Artyom Krilov November 22, 2012 at 11:24

Thanks for your explanation, now it's more clear.

I can just confirm that I had the same problem in my "environment", and specifying ciphers resolved the issue.

Client:
- OpenSSH_5.9p1 Debian-5ubuntu1, OpenSSL 1.0.1 14 Mar 2012
Servers:
- OpenSSH_4.7p1 Debian-8ubuntu1.2, OpenSSL 0.9.8g 19 Oct 2007
- OpenSSH_5.3p1 Debian-3ubuntu7, OpenSSL 0.9.8k 25 Mar 2009

Pingback: AIX 7.1 ssh connection problem

Steve Brown April 8, 2013 at 14:18

Hi there,

Thanks for the writeup. I am not sure if I am facing the same problem or not. The symptoms are the same, and limiting the ciphers also fixed the problem. However, I suspected an MTU problem. With SSH, the packets cannot be fragmented. If your MTU is too high, and there is something blocking the ICMP packet requesting a lower MTU, the connection will hang because the reply is too big. This can be tested using
ping -M do -s 1500
If a reply comes through, this is probably not the issue. If a reply does not come through, repeatedly lower the -s value above until it does. If this number is lower than what your MTU is set to, this is probably the culprit.
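
A note on the sizes in this test: the value given to ping -s is the ICMP payload only, and an IPv4 echo request adds a 20-byte IP header (assuming no IP options) plus an 8-byte ICMP header on top of it. The Python sketch below does just that arithmetic; the function names are made up for illustration.

```python
IPV4_HEADER = 20  # bytes, assuming no IP options
ICMP_HEADER = 8   # bytes, ICMP echo request header


def ping_payload_for_mtu(mtu: int) -> int:
    """Largest 'ping -s' value whose packet still fits in the given MTU."""
    return mtu - IPV4_HEADER - ICMP_HEADER


def packet_size_for_payload(payload: int) -> int:
    """Total IPv4 packet size produced by 'ping -s <payload>'."""
    return payload + IPV4_HEADER + ICMP_HEADER
```

Under these assumptions, -s 1500 produces a 1528-byte packet, which cannot pass unfragmented over a link with the common 1500-byte MTU; the largest payload that fits such a link is 1472.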

William October 17, 2013 at 17:32

Same behavior here. The interesting thing is that none of the *-cbc ciphers will work. All other ciphers succeeded no matter how long the list was.

Pingback: vpnc ssh problem | The (Code) Den

Jason Wenger March 9, 2015 at 18:40

+1 to Steve Brown on MTU.

Relevant link

Same issue, connection reset while expecting SSH2_MSG_KEX_DH_GEX_GROUP.
Changing ciphers, etc, did not change anything for me.
Changing interface MTU to 576 hacked around the problem.

Dave T May 25, 2016 at 02:10

With the same symptom (ssh -v hangs, then fails, after displaying 'expecting SSH2_MSG_KEX_DH_GEX_GROUP') we found that
ping -c 2 -s 1419
caused something in the network path to temporarily accept larger packets, so that ssh then worked for a while.
PING some.remote.host (NNN.NNN.NNN.NNN) 1419(1447) bytes of data.
From NNN.NNN.III.JJJ icmp_seq=1 Frag needed and DF set (mtu = 1438)
1427 bytes from NNN.NNN.III.JJJ: icmp_seq=2 ttl=57 time=241 ms

This is on a complex world-wide corporate network. I don't know how generally relevant it might be.

Michael June 21, 2018 at 13:57

+1 to Steve Brown on MTU.
Oren, thanks for this post and thanks to all the contributors.

We've experienced similar problems and solved it by changing the negotiating policy regarding packet sizing.

Symptom:
We use a VPN to an external service provider. Inside that VPN we build an SSH tunnel, which we use to tunnel HTTP traffic. The tunnel is built by explicitly choosing a specific cipher. If we don't restrict the cipher list, the tunnel is not built correctly (in fact producing these "hangs"). We discovered this thanks to the comments on this post.
So we always had a working tunnel, but unfortunately our network department changed "something" that broke this stability.
Up to a (rather small) size of data chunks everything was fine. Above that size we experienced the documented "hangs".
Solution:
We drilled down through the network layers and identified the MSS (maximum segment size), which is closely tied to the MTU (maximum transmission unit), as part of the problem. When a communication channel (like our SSH tunnel) within our VPN is built, the VPN partners negotiate an MSS.
Disabling this negotiation process by setting a fixed size ("I can only handle 1k") solved the problem.

Web 0.2

Linux, FOSS, Web and more: a buzzword-free blog
