Update: from the info I've gathered, this is most probably caused by some Cisco IDS/DPI feature running on the network equipment. A workaround is described below; I still don't know what the real solution is (Cisco equipment config? Cisco firmware update?).
Starting with OpenSSH 5.7p1, the ssh client in specific environments fails to connect to specific (usually old) servers. I reproduced it on a particular network while trying to connect with a new ssh client (5.8p1, Ubuntu 11.04) to an old server (the default SSH server on RedHat 5.4).
This issue has been around for quite a while, but it is very tricky to reproduce or understand. What bothered me most is that many people reported it to different forums, each posting only a few (different) pieces of the puzzle. So my motivation here is to summarize the relevant info from multiple places. I'll do my best to update this post when I hear something new.
Complete Fact list
- Problem is present on 5.7p1, 5.8p1.
- The exact same version (e.g. 5.7p1) works in some environments and fails in others. (My definition of "environment": a particular client machine and a particular server machine on a particular network.)
- In the "bad" environments, the problem is 100% reproducible. SSH dies right after connecting, with the "connection reset by peer" message. Running ssh -vvv doesn't shed much light on the problem (see here).
- Workarounds: in the "bad" environments, the following two workarounds are known to always avoid the problem:
  - Shortening the cipher list ('ssh -c aes256-ctr').
  - Shortening the HostKeyAlgorithms list.
- In the "bad" environments, lengthening the cipher list manually using '-c aes256-ctr,,,,,,' with enough commas triggers the problem. It's easy to find a deterministic (per-environment) threshold for the length of the cipher list: above the threshold the ssh client breaks, below it everything works perfectly.
- On "bad" environments, downgrading to an older release (5.5p1) resolves the problem.
- I (among many others) reported the problem to the openssh-unix-dev mailing list, but the OpenSSH developers couldn't reproduce it on their side, and therefore couldn't yet investigate it properly.
- It occurs on networks with Cisco equipment; possibly some "smart" Deep Packet Inspection filter ruins specific packets.
- It has to do with the size of one of the "handshaking" packets. Setting a short cipher list or HostKeyAlgorithms list simply shrinks that packet below some threshold.
- The buggy behavior HAS something to do with some change in OpenSSH, probably starting in 5.7p1. It's probably a legitimate change that just happens to trigger the problem.
- I haven't ruled out the possibility that another third-party library is involved (e.g. openssl).
- Ubuntu bug
- Debian Bug
- Serverfault 
- Threads on openssh-unix-dev:  
- A report that correlates the problem to Cisco, a surprising progress!
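The per-environment threshold from the fact list can be searched for mechanically. Below is a minimal sketch, assuming the client still accepts empty entries in the '-c' list (as the clients discussed here did; other versions may reject them). The function names and the host name in the usage note are made up for illustration; BatchMode and ConnectTimeout are standard ssh options, used so a probe never waits on a password prompt.

```shell
#!/bin/sh
# Build "aes256-ctr" followed by N commas, as in the '-c aes256-ctr,,,,' trick.
padded_ciphers() {
    n=$1
    list="aes256-ctr"
    while [ "$n" -gt 0 ]; do
        list="$list,"
        n=$((n - 1))
    done
    printf '%s\n' "$list"
}

# Succeeds (exit 0) if an ssh attempt to HOST with PAD padding commas
# ends with the "Connection reset by peer" message.
is_reset() {
    r_host=$1
    r_pad=$2
    ssh -o BatchMode=yes -o ConnectTimeout=10 \
        -c "$(padded_ciphers "$r_pad")" "$r_host" true 2>&1 |
        grep -q 'Connection reset by peer'
}

# Print the smallest padding count in 0..MAX that triggers the reset.
# Returns non-zero (and prints nothing) if no count in that range does.
find_threshold() {
    t_host=$1
    t_max=$2
    t_pad=0
    while [ "$t_pad" -le "$t_max" ]; do
        if is_reset "$t_host" "$t_pad"; then
            printf '%s\n' "$t_pad"
            return 0
        fi
        t_pad=$((t_pad + 1))
    done
    return 1
}
```

Something like 'find_threshold oldserver.example.com 200' would then report the first padding length that breaks the connection. A linear scan is used on purpose: it is slow but makes no assumption beyond the threshold being deterministic.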
Thanks man... been pulling my hair out looking for a solution, this helps immensely!
works for me. I pasted the first line of ciphers from ssh_config man page, and the workaround still worked (arround)
slogin -c aes128-ctr,aes192-ctr,aes256-ctr,arcfour256,arcfour128,
annoying bug, thx for the workaround.
I get the problem too. I'm running Ubuntu 10.04 as a host with VMware Server 2, and the box I am trying to log into is a VM: Ubuntu 11.04 server 64-bit, brand new install, with bridged networking through a cheap Netgear router.
It's not a Cisco problem, at least in our LAN. We have this bug and there's no Cisco gear here.
OpenSSH 5.7 was the first version to support ECDSA (Elliptic Curve Cryptography). (http://openssh.org/txt/release-5.7) The new key exchange and host key algorithm names probably lengthened the algorithm lists in the handshake enough to trigger the IPS rule.
Apparently some *other* SSH servers (not OpenSSH) had a buffer overflow vulnerability if the cipher list was sufficiently long, so the IPS is trying to protect against these.
I indeed suspected the long list triggered some IPS rule, but your explanation adds more sense and makes it more concrete. Thanks.
So now the question is whether IPS developers are aware of it and prepare a fix.
Even after trying multiple workarounds I am still not able to connect to the server. I'm really not sure where the fault lies because they reworked the network at school over break, and I also rolled out a new development box at my house at the same time.
debug1: Remote protocol version 2.0, remote software version OpenSSH_5.8p1 Debian-7ubuntu1
debug1: Local version string SSH-2.0-OpenSSH_5.8p1 Debian-7ubuntu1
debug1: SSH2_MSG_SERVICE_REQUEST sent
Read from socket failed: Connection reset by peer
I have this in my ssh_config:
Trying to connect using: ssh -c aes256-ctr -v dev.tridiumtech.com (or aes128-ctr; same issue)
Although my connections fail from Ubuntu, they work from putty on my WinXP VM without any issue. For that reason, I doubt that it is any intermediate equipment causing an issue. It seems to be all in the client/server software interaction.
Actually, my problem may be unrelated. I just rolled multiple versions of OpenSSH (4.9p1, 5.6p1, 5.9p1) against multiple versions of OpenSSL (0.9.7m and 0.9.8s) and all result in the same error.
putty works fine from Windows and Linux, but unfortunately without ssh in working order, GVFS and sshfs cannot be used which I rely on heavily. :-\
Thanks so much
Thanks for your explanation, now it's more clear.
I can just confirm that I had the same problem in my "environment", and specifying ciphers has resolved this issue.
- OpenSSH_5.9p1 Debian-5ubuntu1, OpenSSL 1.0.1 14 Mar 2012
- OpenSSH_4.7p1 Debian-8ubuntu1.2, OpenSSL 0.9.8g 19 Oct 2007
- OpenSSH_5.3p1 Debian-3ubuntu7, OpenSSL 0.9.8k 25 Mar 2009
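To avoid typing the cipher option on every connection, the workaround can be pinned per host in the client configuration. This is only a sketch: the helper name and the host pattern in the usage note are invented, and 'ssh-rsa' is an assumption about what the old server offers (recent OpenSSH releases may have it disabled by default). 'Host', 'Ciphers' and 'HostKeyAlgorithms' are regular ssh_config keywords.

```shell
#!/bin/sh
# Append a Host stanza that pins a short cipher list and a short
# host key algorithm list for the matching hosts.
# Arguments: path to an ssh client config file, and a Host pattern.
write_workaround() {
    w_config=$1
    w_pattern=$2
    cat >> "$w_config" <<EOF
Host $w_pattern
    Ciphers aes256-ctr
    HostKeyAlgorithms ssh-rsa
EOF
}
```

For example, 'write_workaround ~/.ssh/config oldserver.example.com' limits the shortened lists to that one server, so other connections keep the client defaults.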
Thanks for the writeup. I am not sure if I am facing the same problem or not. The symptoms are the same, and limiting the ciphers also fixed the problem. However, I suspected an MTU problem. With SSH, the packets cannot be fragmented. If your mtu is too high, and there is something blocking the ICMP packet requesting a lower MTU, the connection will hang because the reply is too big. This can be tested using
ping -M do -s 1500
If a reply comes through, this is probably not the issue. If a reply does not come through, repeatedly lower the -s value above until it does. If this number is lower than what your MTU is set to, this is probably the culprit.
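The "repeatedly lower the -s value" step can be automated with a binary search. A rough sketch, assuming Linux iputils ping (where '-M do' sets the don't-fragment bit and '-W' is the reply timeout in seconds) and assuming every size below the path limit passes while every size above it fails. Note that '-s' counts only the ICMP payload: on IPv4 the packet is 28 bytes larger (20 bytes of IP header plus 8 of ICMP header), so on an interface with MTU 1500 the largest payload worth trying is 1472. The function names are made up.

```shell
#!/bin/sh
# ICMP payload size that exactly fills an IPv4 packet of the given MTU.
payload_for_mtu() {
    printf '%s\n' "$(( $1 - 28 ))"
}

# Print the largest payload in LOW..HIGH that gets a reply from HOST
# with the don't-fragment bit set. Prints nothing if none does.
largest_unfragmented_payload() {
    p_host=$1
    p_low=$2
    p_high=$3
    p_best=""
    while [ "$p_low" -le "$p_high" ]; do
        p_mid=$(( (p_low + p_high) / 2 ))
        if ping -M do -c 1 -W 2 -s "$p_mid" "$p_host" >/dev/null 2>&1; then
            p_best=$p_mid              # this size passed, try larger
            p_low=$((p_mid + 1))
        else
            p_high=$((p_mid - 1))      # this size failed, try smaller
        fi
    done
    [ -n "$p_best" ] && printf '%s\n' "$p_best"
}
```

Calling 'largest_unfragmented_payload some.remote.host 500 "$(payload_for_mtu 1500)"' and adding 28 to the result gives an estimate of the path MTU. A single lost reply is treated as "too big", so a flaky link can make the estimate too low.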
Same behavior here. The interesting thing is that none of the *-cbc ciphers will work. All other ciphers succeeded no matter how long the list was.
+1 to Steve Brown on MTU.
Same issue, connection reset while expecting SSH2_MSG_KEX_DH_GEX_GROUP.
Changing ciphers, etc, did not change anything for me.
Changing the interface MTU to 576 hacked around the problem.
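For reference, on Linux the interface MTU can be changed with iproute2's 'ip link set'. The wrapper below is just an illustration: the function name, the DRY_RUN switch and the interface name in the usage note are assumptions, and the real command needs root. The change does not survive a reboot unless it is also put into the distribution's network configuration.

```shell
#!/bin/sh
# Set the MTU of a network interface using iproute2.
# With DRY_RUN=1 the command is printed instead of executed.
set_mtu() {
    m_iface=$1
    m_mtu=$2
    if [ "${DRY_RUN:-0}" = "1" ]; then
        printf 'ip link set dev %s mtu %s\n' "$m_iface" "$m_mtu"
    else
        ip link set dev "$m_iface" mtu "$m_mtu"
    fi
}
```

For example, 'DRY_RUN=1 set_mtu eth0 576' shows what would be run for a hypothetical interface eth0.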
With the same symptom (ssh -v hangs, then fails, after displaying 'expecting SSH2_MSG_KEX_DH_GEX_GROUP') we found that
ping -c 2 -s 1419
caused something in the network path to temporarily accept larger packets, so that ssh then worked for a while.
PING some.remote.host (NNN.NNN.NNN.NNN) 1419(1447) bytes of data.
From NNN.NNN.III.JJJ icmp_seq=1 Frag needed and DF set (mtu = 1438)
1427 bytes from NNN.NNN.III.JJJ: icmp_seq=2 ttl=57 time=241 ms
This is on a complex world-wide corporate network. I don't know how generally relevant it might be.
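The numbers in that ping output can be checked by hand: the "1419(1447)" means 1419 bytes of ICMP payload inside a 1447-byte IPv4 packet (20 bytes of IP header plus 8 of ICMP header), which exceeds the reported mtu of 1438. The small sketch below just redoes that arithmetic; the function names are invented.

```shell
#!/bin/sh
# Total IPv4 packet size for an ICMP echo with the given -s payload.
packet_size() {
    printf '%s\n' "$(( $1 + 28 ))"
}

# Succeeds if a ping with this payload fits within the given path MTU.
fits_mtu() {
    [ "$(packet_size "$1")" -le "$2" ]
}
```

With the values above, 'packet_size 1419' gives 1447, so 'fits_mtu 1419 1438' fails, while a payload of 1410 would fit exactly.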
Oren, thanks for this post and thanks to all the contributors.
We've experienced similar problems and solved it by changing the negotiating policy regarding packet sizing.
We use a VPN to an external service provider. Inside that VPN we build up a SSH tunnel. We're using it to tunnel HTTP traffic. Building up that tunnel is done by explicitly choosing a specific cypher. If we don't restrict the cypher list the tunnel is not built up correctly (in fact producing these "hangs"). We discovered this thanks to the comments on this post.
So we always had a correct tunnel, but unfortunately our network department changed "something" that broke this stability.
Up to a (rather small) size of data chunks everything was fine. Beyond that size we experienced the documented "hangs".
We drilled down through the network layers and identified the MSS (maximum segment size), which is closely related to the MTU (maximum transmission unit), as part of the problem. When a communication channel (like our SSH tunnel) within our VPN is built up, the VPN partners negotiate an MSS.
Disabling this negotiation process by setting a fixed size ("I can only handle 1k") solved the problem.
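To make the MSS/MTU relation concrete: for IPv4 without options, the MSS is the MTU minus 40 bytes (20 bytes of IP header plus 20 of TCP header). The comment above does not say which device or tool was used to fix the size, so the iptables rule below is purely an assumption about how a fixed MSS could be forced on a Linux gateway; it is printed, not executed, and the function names are made up.

```shell
#!/bin/sh
# TCP MSS that fits a given IPv4 MTU (no IP or TCP options assumed).
mss_for_mtu() {
    printf '%s\n' "$(( $1 - 40 ))"
}

# Print an iptables rule that rewrites the MSS on forwarded TCP SYN
# packets to a fixed value. Shown for illustration only.
clamp_rule() {
    printf 'iptables -t mangle -A FORWARD -p tcp --tcp-flags SYN,RST SYN -j TCPMSS --set-mss %s\n' "$1"
}
```

'mss_for_mtu 1500' gives the usual Ethernet value of 1460, and 'clamp_rule 1024' prints a rule roughly matching the "I can only handle 1k" setting.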