Abstract: The network performance of SCP and the underlying SSH protocol is limited by statically defined internal flow control buffers. These buffers often act as a brake on SCP's network throughput, especially on long, wide paths. Modifying the SSH code so that the flow control buffers can be sized at run time eliminates this bottleneck.
High-bandwidth, high-latency links are becoming more prevalent in corporate and academic institutions. Applications that use windowing must therefore ensure that the window size is at least equal to the Bandwidth Delay Product, or BDP, if they are to obtain maximum utilization of the link. The BDP is the product of the bandwidth of the narrowest portion of the network path and the round-trip delay time, and it represents the total data-carrying capacity of the path. For TCP it is already possible to tune the TCP window size manually or to use an autotuning mechanism, such as the Web100 Linux kernel patch, to ensure maximum throughput. However, when an application above the TCP layer implements its own windowing, throughput is limited by the lesser of the TCP window and the application window. In OpenSSH the limitation comes from the static window sizes that appear in channels.h as defined values.
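As a rough illustration of the arithmetic above, the following C sketch computes a BDP and the throughput ceiling that a fixed window imposes. The function names are hypothetical and are not part of OpenSSH or the patch; integer microseconds are used for the round-trip time simply to avoid floating-point rounding.

```c
#include <assert.h>
#include <stdint.h>

/* Bandwidth-delay product in bytes: bottleneck rate (bits per second)
 * times round-trip time (microseconds), converted to bytes. */
uint64_t bdp_bytes(uint64_t bottleneck_bps, uint64_t rtt_us)
{
    return bottleneck_bps * rtt_us / 8u / 1000000u;
}

/* Upper bound on throughput (bytes per second) when at most
 * window_bytes can be in flight during one round trip. */
uint64_t window_limited_rate(uint64_t window_bytes, uint64_t rtt_us)
{
    assert(rtt_us > 0); /* a zero RTT would divide by zero */
    return window_bytes * 1000000u / rtt_us;
}
```

For a 1 Gbps path with a 0.04 second round trip, `bdp_bytes(1000000000, 40000)` gives 5,000,000 bytes. As a purely hypothetical comparison, a 65,536-byte application window over a 37 ms round trip would be capped at about 1.77 MB/s by `window_limited_rate(65536, 37000)`, regardless of the link speed.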
Simply raising the static size would waste memory whenever it exceeds the underlying protocol's window size. Asking the user to specify the size would require users to be knowledgeable in network performance tuning. The ideal solution is to make the window large enough that it no longer limits throughput, but not much larger than is needed to obtain the desired performance.
Only two changes were needed to adjust the SSH window based on the TCP window. The first was to enable the buffer code to allocate larger sizes. This was done by replacing the constant that set the maximum size allowed by the buffer code with a variable, and adding a function that raises the variable from its default value. The second change was to obtain the TCP window size from getsockopt and adjust the SSH window size to match, but only if the new size is larger than the old one. The value returned by getsockopt is also doubled, because OpenSSH only sends a WINDOW_ADJUST message when the window is half full. This reduces the number of WINDOW_ADJUST messages sent, at the cost of doubling the buffer size.
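The second change can be sketched roughly as follows. This is a minimal illustration, not the patch's actual code: the function names are hypothetical, and the use of the `SO_RCVBUF` socket option is an assumption, since the text only says that the TCP window size is read with getsockopt.

```c
#include <sys/types.h>
#include <sys/socket.h>

/* Grow-only rule described above: use twice the TCP window, but never
 * shrink below the window already in use. (Hypothetical helper.) */
unsigned int grow_only_window(unsigned int current_window, int tcp_window)
{
    unsigned int candidate;

    if (tcp_window <= 0)
        return current_window;
    /* Doubled because WINDOW_ADJUST is only sent at the half-full mark. */
    candidate = 2u * (unsigned int)tcp_window;
    return candidate > current_window ? candidate : current_window;
}

/* Query the socket and apply the rule. SO_RCVBUF is an assumption here;
 * on any getsockopt failure the current window is kept unchanged. */
unsigned int tuned_window(int sock, unsigned int current_window)
{
    int tcp_window = 0;
    socklen_t len = sizeof(tcp_window);

    if (getsockopt(sock, SOL_SOCKET, SO_RCVBUF, &tcp_window, &len) == -1)
        return current_window;
    return grow_only_window(current_window, tcp_window);
}
```

Keeping the grow-only rule in its own function makes the policy easy to check in isolation: a 5,000,000-byte TCP window yields a 10,000,000-byte SSH window, while a TCP window smaller than half the current SSH window leaves it untouched.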
The following hosts were used in the performance tests. kirana was running
a 2.6 Linux kernel with the Web100 patch. tg-login was running a 2.6 kernel
without autotuning, but with a TCP window size of 10,000,000 bytes. The BDP
of a 1 Gbps link with a 0.04 second round-trip delay is 40,000,000 bits, or
5,000,000 bytes. The 300MB file was copied from /dev/shm on one machine to
/dev/null on the other.
Hosts:
kirana.psc.edu
Dual PIII 1.0 GHz (Coppermine)
1 GB RAM
Gigabit Ethernet
| 1 | bar-kirana-ge-0-2-0-0.psc.net | (192.88.115.169) | 0.292 ms | 9.452 ms | 0.204 ms |
| 2 | beast-bar-g4-0-1.psc.net | (192.88.115.18) | 0.129 ms | 0.099 ms | 0.094 ms |
| 3 | abilene-psc.abilene.ucaid.edu | (192.88.115.124) | 9.801 ms | 9.792 ms | 9.805 ms |
| 4 | nycmng-washng.abilene.ucaid.edu | (198.32.8.84) | 14.042 ms | 14.036 ms | 14.138 ms |
| 5 | chinng-nycmng.abilene.ucaid.edu | (198.32.8.82) | 34.341 ms | 41.711 ms | 34.326 ms |
| 6 | mren-chin-ge.abilene.ucaid.edu | (198.32.11.98) | 34.421 ms | 34.466 ms | 34.417 ms |
| 7 | sbr0-lsd6509.gw.ncsa.edu | (198.17.196.1) | 36.957 ms | 36.949 ms | 36.920 ms |
| 8 | acb-2-vlan101.gw.ncsa.edu | (141.142.0.6) | 37.010 ms | 36.957 ms | 36.943 ms |
| 9 | core-10-acb-2.gw.ncsa.edu | (141.142.0.133) | 37.091 ms | 36.965 ms | 36.958 ms |
| 10 | hg-core-core-10.gw.ncsa.edu | (141.142.0.138) | 38.300 ms | 38.866 ms | 38.312 ms |
| 11 | hg-1-hg-core.ncsa.teragrid.org | (141.142.47.34) | 38.739 ms | 39.187 ms | 38.340 ms |
| 12 | tg-login1.ncsa.teragrid.org | (141.142.48.5) | 36.996 ms | 36.959 ms | 36.950 ms |

Throughput by cipher, without the patch:
| 3des-cbc | 1.3MB/s |
| arcfour | 1.9MB/s |
| aes192-cbc | 1.8MB/s |
| aes256-cbc | 1.8MB/s |
| aes128-ctr | 1.9MB/s |
| aes192-ctr | 1.8MB/s |
| aes256-ctr | 1.8MB/s |
| blowfish-cbc | 1.9MB/s |
| cast128-cbc | 1.7MB/s |
| rijndael-cbc@lysator.liu.se | 1.8MB/s |

Throughput by cipher, with the dynamic window patch:
| 3des-cbc | 2.8MB/s |
| arcfour | 24.4MB/s |
| aes192-cbc | 13.3MB/s |
| aes256-cbc | 11.7MB/s |
| aes128-ctr | 12.7MB/s |
| aes192-ctr | 11.7MB/s |
| aes256-ctr | 11.3MB/s |
| blowfish-cbc | 16.3MB/s |
| cast128-cbc | 7.9MB/s |
| rijndael-cbc@lysator.liu.se | 12.2MB/s |
The tests showed that throughput increased dramatically, and that the limitation was no longer the TCP or SSH window size but the ability of the host to encrypt fast enough to fill the Gigabit Ethernet link. This is clearly demonstrated by the large performance difference between 3des-cbc, the slowest cipher, and arcfour, the fastest.
There are no implications that we know of.
July 13, 2004: sshd input buffer fix
Our code uncovered a bug in the way the input buffer in sshd grew. It was possible to make the input buffer grow beyond its maximum bound, leading to a fatal exception for that sshd process. We've addressed this by explicitly checking the size of the buffer before allowing it to grow.
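The kind of check described in this entry might look roughly like the sketch below. It is illustrative only: `buffer_can_grow` and its parameters are hypothetical and are not taken from the sshd source or from the patch.

```c
#include <stddef.h>

/* Returns 1 if `extra` more bytes fit within `max_size`, otherwise 0.
 * The comparison is written as a subtraction so that `used + extra`
 * is never computed and cannot overflow. (Hypothetical helper.) */
int buffer_can_grow(size_t used, size_t extra, size_t max_size)
{
    if (used > max_size)
        return 0;
    return extra <= max_size - used;
}
```

A caller would test this before appending and stop reading input when it returns 0, rather than letting the buffer code hit its hard limit and abort the process.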
July 12, 2004: Window Size fix
1) Based on conversations on the OpenSSH developers mailing list we've imposed a
maximum size on the network buffer of 2^30-1 bytes. Because of how the ssh code
uses buffers, this is an effective limit of 2^29-1 bytes, or 512 Megabytes. This
should be sufficient for all but the longest and fattest network paths. It might
be possible to increase this by 1 bit in the future.
2) The CVS and Portable patches have been rolled into one patch. It might throw
a warning against the Portable version but it shouldn't cause any problems.
July 7, 2004: Initial release
First release of openssh-3.8.1p1-dynwindow patch v. 0.1