Procedure for Creating a Personal Data Protection System (SZPDn). Requirements for Network Devices

Title	Requirements for network devices
Anchor	Procedure for creating a personal data protection system (SZPDn)
Date	11.10.2021
Size	1.38 Mb.
File format
File name	rfc1122_HOST1_tr.doc
Type	Protocol #245378

§4.2.2.2. Use of the PUSH flag: RFC-793

When an application issues a series of SEND calls without setting the PUSH flag, the TCP may aggregate the data internally without sending it. Similarly, when a series of segments is received without the PSH bit set, the TCP may queue the data internally without passing it to the receiving application.

The PSH bit is not a record marker and is independent of segment boundaries. The transmitter should collapse successive PSH bits when it packetizes data, so that it can send the largest possible segment.

A TCP may implement PUSH flags on SEND calls. If PUSH flags are not implemented, the sending TCP (1) must not buffer data indefinitely, and (2) must set the PSH bit in the last buffered segment (that is, when there is no more queued data to be sent).

RFC-793 states that a received PSH flag must be passed to the application layer. This standard withdraws that requirement: passing a received PSH flag to the application layer is now optional.

An application program is logically required to set the PUSH flag in a SEND call whenever it needs to force delivery of the data to avoid a communication deadlock. Nevertheless, a TCP should send a maximum-sized segment whenever possible, to improve throughput.

When the PUSH flag is not implemented on SEND calls, that is, when the transport interface (the interface between the transport and application layers of the Internet architecture) uses a pure streaming model, responsibility for aggregating tiny data fragments into reasonably sized segments is partially borne by the application layer.

Generally, an interactive application protocol must set the PUSH flag at least in the last SEND call of each command or response sequence. A bulk transfer protocol such as FTP should set the PUSH flag on the last segment of a file, or when necessary to prevent buffer deadlock.

At the receiver, the PSH bit forces buffered data to be delivered to the application (even if less than a full buffer has been received). Conversely, the absence of a PSH bit can be used to avoid unnecessary wakeup calls to the application process, which can be an important performance optimization for large timesharing hosts. Passing the PSH bit to the receiving application allows an analogous optimization within the application.
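The sender-side rule for a TCP without PUSH flags on SEND calls can be illustrated with a minimal sketch. The function name packetize and the (payload, psh) tuple layout are hypothetical and chosen only for illustration; the sketch assumes the whole send buffer is packetized at once.

```python
def packetize(data: bytes, mss: int) -> list[tuple[bytes, bool]]:
    """Split buffered data into segments of at most `mss` bytes.

    Returns (payload, psh) pairs. Successive PSH bits are collapsed:
    only the last buffered segment carries PSH, i.e. the one after
    which no more queued data remains.
    """
    if mss <= 0:
        raise ValueError("mss must be positive")
    segments = []
    for offset in range(0, len(data), mss):
        chunk = data[offset:offset + mss]
        is_last = offset + mss >= len(data)
        segments.append((chunk, is_last))
    return segments
```

With 10 bytes of queued data and a 4-byte segment limit, this produces three segments, and only the final 2-byte segment has PSH set.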

§4.2.2.3. Window size: RFC-793

The window size must be treated as an unsigned number; otherwise large window sizes will appear as negative windows and TCP will not work. It is recommended that implementations reserve 32-bit fields for the send and receive window sizes in the connection record and do all window computations with 32 bits.

It is known that the window field in the TCP header is too small for high-speed, long-delay paths. To solve this problem and extend the window size, new (experimental) TCP options have been defined (RFC-1323 and RFC-2018). Developers of TCP software are encouraged to build these options into their products.
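The signed-versus-unsigned pitfall can be shown with a short sketch. It assumes a raw TCP header without options, in which the 16-bit window field occupies bytes 14 and 15; both function names are hypothetical.

```python
import struct

def parse_window(header: bytes) -> int:
    """Read the 16-bit window field as an unsigned, big-endian number."""
    (window,) = struct.unpack_from("!H", header, 14)
    return window

def parse_window_signed_bug(header: bytes) -> int:
    """Deliberately wrong: a signed read turns large windows negative."""
    (window,) = struct.unpack_from("!h", header, 14)
    return window
```

A window of 0xC000 is 49152 octets when read as unsigned, but the signed read yields -16384, which is exactly the failure described above.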

§4.2.2.4. Urgent pointer: RFC-793

A TCP must support a sequence of urgent data of any length.

A TCP must inform the application layer asynchronously whenever it receives an Urgent pointer and there was previously no pending urgent data, or whenever the Urgent pointer advances in the data stream. The application must be able to learn how much urgent data remains to be read from the connection, or at least to determine whether or not more urgent data remains to be read.

Although the Urgent mechanism may be used by any application, it is normally used to send "interrupt"-type commands to a Telnet program (RFC-1123).

The asynchronous or "out-of-band" notification allows the application that reads data from the TCP connection to go into "urgent mode". This allows control commands to be sent to an application whose normal input buffers are full of unprocessed data.

The generic ERROR-REPORT() upcall is one possible mechanism for informing the application of the arrival of urgent data.

§4.2.2.5. TCP options: RFC-793

A TCP must be able to receive a TCP option in any segment. In addition, a TCP must ignore, without raising an error, any TCP option it does not implement, assuming that the option has a length field (all TCP options defined in the future will have length fields). A TCP must be prepared to handle an illegal option length (for example, zero) without crashing; a suggested procedure is to reset the connection and log the reason.
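A minimal option-walking sketch follows. It assumes the RFC-793 option layout (kind 0 ends the list, kind 1 is a one-byte no-operation, every other kind is followed by a length byte that counts the kind and length bytes themselves). The names parse_options, KNOWN_OPTIONS and IllegalOptionLength are hypothetical; the caller is expected to catch the exception, reset the connection and log the reason.

```python
class IllegalOptionLength(ValueError):
    """Raised for a malformed option; the caller resets the connection."""

KNOWN_OPTIONS = {2: "MSS"}  # only the MSS option is understood in this sketch

def parse_options(raw: bytes) -> dict[str, bytes]:
    found = {}
    i = 0
    while i < len(raw):
        kind = raw[i]
        if kind == 0:          # End of Option List
            break
        if kind == 1:          # No-Operation, single byte
            i += 1
            continue
        if i + 1 >= len(raw):
            raise IllegalOptionLength("option truncated before its length byte")
        length = raw[i + 1]
        if length < 2 or i + length > len(raw):
            raise IllegalOptionLength(f"option kind {kind}: bad length {length}")
        if kind in KNOWN_OPTIONS:
            found[KNOWN_OPTIONS[kind]] = raw[i + 2:i + length]
        # Unimplemented kinds are skipped silently using their length field.
        i += length
    return found
```

An unknown option (kind 254 here) is skipped, while a zero length is reported instead of looping forever or crashing.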

§4.2.2.6. Maximum Segment Size option (MSS): RFC-793

A TCP must implement both sending and receiving of the Maximum Segment Size option (MSS; RFC-879).

A TCP should send an MSS option in every SYN segment when its receive MSS differs from the default value of 536. Moreover, a TCP may always send this option.

If an MSS option is not received at connection setup, the TCP must assume the default send MSS of 536 (RFC-879).

The maximum size of a segment that the TCP really sends, the "effective send MSS", must be the smaller of the send MSS (which reflects the available reassembly buffer size at the remote host) and the largest size permitted by the IP layer:

Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize ,

where:

"SendMSS" is the MSS value received from the remote host, or the default 536 if no MSS option was received;
"MMS_S" is the maximum size of a transport-layer message that the TCP may send;
"TCPhdrsize" is the size of the TCP header. It is normally 20 bytes, but may be larger if TCP options are to be sent;
"IPoptionsize" is the size of any IP options that the TCP will pass to the network layer together with the current segment (that is, the transport-layer message).

The MSS value to be sent in an MSS option must be less than or equal to:

MMS_R - 20 ,

where "MMS_R" is the maximum size of a transport-layer message that can be received (and reassembled). The TCP obtains the values of "MMS_R" and "MMS_S" from the IP layer (network layer).
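The two formulas above can be checked with a small worked sketch. The function and parameter names are hypothetical, and the sample value 1480 for MMS_S and MMS_R is only an assumed example, not something given in the text.

```python
DEFAULT_SEND_MSS = 536  # used when no MSS option was received

def effective_send_mss(mms_s, send_mss=None, tcp_hdr_size=20, ip_option_size=0):
    """Eff.snd.MSS = min(SendMSS+20, MMS_S) - TCPhdrsize - IPoptionsize."""
    if send_mss is None:
        send_mss = DEFAULT_SEND_MSS
    return min(send_mss + 20, mms_s) - tcp_hdr_size - ip_option_size

def max_advertised_mss(mms_r):
    """Upper bound for the MSS value sent in an MSS option: MMS_R - 20."""
    return mms_r - 20
```

For example, with an assumed MMS_S of 1480 and a received MSS of 1460, the effective send MSS is 1460 with a bare 20-byte header and 1448 when 12 bytes of TCP options are sent.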

The choice of TCP segment size has a strong effect on throughput. Larger segments increase throughput by amortizing header size and per-packet processing overhead over more data bytes. However, if an IP packet is so large that it has to be fragmented, efficiency drops sharply when any fragment is lost.

Some TCP implementations send an MSS option only if the destination host is on a non-connected network. In general, however, the TCP may not have the information needed to make this decision, so it is preferable to leave to the IP layer the task of determining a suitable MTU for the particular path. It is therefore recommended that the TCP always send the MSS option (unless the value is 536) and that the IP layer determine the value of "MMS_R". An IP-layer mechanism for measuring the MTU could then be introduced by modifying the IP layer, without changing the TCP.

§4.2.2.7. TCP checksum: RFC-793

Unlike the UDP checksum, the TCP checksum is never optional. The sender must generate it and the receiver must check it.
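A minimal sketch of generating and checking the checksum follows. It assumes IPv4, the RFC-793 pseudo-header (source address, destination address, a zero byte, protocol number 6 and the TCP length) and a checksum field at byte offset 16 of the TCP header; the function names are hypothetical.

```python
import struct

def ones_complement_sum(data: bytes) -> int:
    """16-bit one's-complement sum; odd-length data is padded with a zero byte."""
    if len(data) % 2:
        data += b"\x00"
    total = 0
    for (word,) in struct.iter_unpack("!H", data):
        total += word
    while total >> 16:                      # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return total

def tcp_checksum(src_ip: bytes, dst_ip: bytes, segment: bytes) -> int:
    """Checksum over pseudo-header plus segment (checksum field zeroed by the sender)."""
    pseudo = src_ip + dst_ip + struct.pack("!BBH", 0, 6, len(segment))
    return ~ones_complement_sum(pseudo + segment) & 0xFFFF

def verify(src_ip: bytes, dst_ip: bytes, segment: bytes) -> bool:
    """Receiver side: recomputing over a segment with a valid checksum gives 0."""
    return tcp_checksum(src_ip, dst_ip, segment) == 0
```

The sender writes the result of tcp_checksum into the header; the receiver runs verify over the segment as received and discards it when the check fails.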

§4.2.2.8. TCP connection state diagram: RFC-793

Fig. 9 shows the TCP connection state diagram (RFC-793). This diagram has several problems, which are listed after the figure.

States labelled in the figure: CLOSED, LISTEN, SYN-SENT, SYN-RCVD, ESTAB (data transfer).

                              +---------+ ---------\      active OPEN
                              |  CLOSED |            \    -----------
                              +---------+<---------\   \   create TCB
                                |     ^              \   \  snd SYN
                   passive OPEN |     |   CLOSE        \   \
                   ------------ |     | ----------       \   \
                    create TCB  |     | delete TCB         \   \
                                V     |                      \   \
                              +---------+            CLOSE    |    \
                              |  LISTEN |          ---------- |     |
                              +---------+          delete TCB |     |
                   rcv SYN      |     |     SEND              |     |
                  -----------   |     |    -------            |     V
 +---------+      snd SYN,ACK  /       \   snd SYN          +---------+
 |         |<-----------------           ------------------>|         |
 |   SYN   |                    rcv SYN                     |   SYN   |
 |   RCVD  |<-----------------------------------------------|   SENT  |
 |         |                    snd ACK                     |         |
 |         |------------------           -------------------|         |
 +---------+   rcv ACK of SYN  \       /  rcv SYN,ACK       +---------+
   |           --------------   |     |   -----------
   |                  x         |     |     snd ACK
   |                            V     V
   |  CLOSE                   +---------+
   | -------                  |  ESTAB  |
   | snd FIN                  +---------+
   |                   CLOSE    |     |    rcv FIN
   V                  -------   |     |    -------
 +---------+          snd FIN  /       \   snd ACK          +---------+
 |  FIN    |<-----------------           ------------------>|  CLOSE  |
 | WAIT-1  |------------------                              |   WAIT  |
 +---------+          rcv FIN  \                            +---------+
   | rcv ACK of FIN   -------   |                            CLOSE  |
   | --------------   snd ACK   |                           ------- |
   V        x                   V                           snd FIN V
 +---------+                  +---------+                   +---------+
 |FINWAIT-2|                  | CLOSING |                   | LAST-ACK|
 +---------+                  +---------+                   +---------+
   |                rcv ACK of FIN |                 rcv ACK of FIN |
   |  rcv FIN       -------------- |    Timeout=2MSL -------------- |
   |  -------              x       V    ------------        x       V
    \ snd ACK                 +---------+delete TCB         +---------+
     ------------------------>|TIME WAIT|------------------>| CLOSED  |
                              +---------+                   +---------+

Fig. 9. TCP connection state diagram

(a) The arrow from SYN-SENT state to SYN-RCVD state should be labelled "snd SYN,ACK";
(b) There could be an arrow from SYN-RCVD state to LISTEN state, conditioned on receiving a reset (RST) after a passive OPEN;
(c) It is possible to go directly from FIN-WAIT-1 state to TIME-WAIT state.
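One way to make the diagram and its three corrections concrete is a transition table. This is only a sketch: the event strings are taken from the arrow labels where the figure provides them, while the labels used for corrections (b) and (c) are hypothetical names, and the trigger assumed for (c) is a received FIN that also acknowledges the local FIN.

```python
TRANSITIONS = {
    ("CLOSED", "passive OPEN"): "LISTEN",
    ("CLOSED", "active OPEN"): "SYN-SENT",
    ("LISTEN", "rcv SYN"): "SYN-RCVD",
    ("LISTEN", "SEND"): "SYN-SENT",
    ("LISTEN", "CLOSE"): "CLOSED",
    ("SYN-SENT", "rcv SYN"): "SYN-RCVD",        # action per (a): snd SYN,ACK
    ("SYN-SENT", "rcv SYN,ACK"): "ESTAB",
    ("SYN-SENT", "CLOSE"): "CLOSED",
    ("SYN-RCVD", "rcv ACK of SYN"): "ESTAB",
    ("SYN-RCVD", "CLOSE"): "FIN-WAIT-1",
    ("SYN-RCVD", "rcv RST after passive OPEN"): "LISTEN",   # correction (b)
    ("ESTAB", "CLOSE"): "FIN-WAIT-1",
    ("ESTAB", "rcv FIN"): "CLOSE-WAIT",
    ("FIN-WAIT-1", "rcv ACK of FIN"): "FIN-WAIT-2",
    ("FIN-WAIT-1", "rcv FIN"): "CLOSING",
    ("FIN-WAIT-1", "rcv FIN with ACK of FIN"): "TIME-WAIT",  # correction (c)
    ("FIN-WAIT-2", "rcv FIN"): "TIME-WAIT",
    ("CLOSE-WAIT", "CLOSE"): "LAST-ACK",
    ("CLOSING", "rcv ACK of FIN"): "TIME-WAIT",
    ("LAST-ACK", "rcv ACK of FIN"): "CLOSED",
    ("TIME-WAIT", "Timeout=2MSL"): "CLOSED",
}

def walk(state: str, events: list[str]) -> str:
    """Apply events in order; reject any transition not in the table."""
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition from {state} on {event!r}")
        state = TRANSITIONS[key]
    return state
```

Keeping the corrections as ordinary table rows makes it easy to see that they add paths and do not change any existing arrow's destination.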

§4.2.2.9. Initial sequence number selection: RFC-793

A TCP must use the specified clock-driven mechanism for selecting initial sequence numbers.

§4.2.2.10. Simultaneous open attempts: RFC-793

A TCP must support simultaneous open attempts.

It sometimes surprises implementors that if two applications attempt to connect to each other simultaneously, only one connection is created instead of two. This was an intentional design decision (do not try to "fix" it).

§4.2.2.11. Recovery from an old duplicate SYN: RFC-793

Note that a TCP implementation must keep track of whether a connection has reached SYN_RCVD state as the result of a passive OPEN or an active OPEN.

§4.2.2.12. RST segment: RFC-793

A TCP should allow a received RST segment to include data.

It has been suggested that an RST segment could contain ASCII text that explains the cause of the reset. However, no encoding rules for such data have been established so far.

§4.2.2.13. Closing a connection: RFC-793

A TCP connection may terminate in two ways:

(1) the normal TCP close sequence, using a FIN handshake;
(2) an "abort", in which one or more RST segments are sent and the connection state is immediately discarded.

If a TCP connection is closed by the remote site, the local application must be informed whether it was closed normally or was aborted.

The normal TCP close sequence delivers buffered data reliably in both directions. Since the two directions of a TCP connection are closed independently, a connection can be "half closed", that is, closed in only one direction, and a host is permitted to continue sending data in the open direction of a half-closed connection.

A host may implement a "half-duplex" TCP close sequence, so that an application that has called CLOSE cannot continue to read data from the connection. If such a host issues a CLOSE call while received data is still pending in the TCP, or if new data is received after CLOSE is called, its TCP should send an RST to show that data was lost.

When a connection is closed actively, it must linger in TIME-WAIT state for a time of 2×MSL (Maximum Segment Lifetime). However, the TCP may accept a new SYN from the remote TCP to reopen the connection directly from TIME-WAIT state, provided that it:

(1) assigns its initial sequence number for the new connection to be larger than the largest sequence number it used on the previous connection incarnation, and
(2) returns to TIME-WAIT state if the SYN turns out to be an old duplicate.
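The "larger than the largest sequence number used before" condition can be sketched as follows. Two assumptions are made that the text does not spell out: sequence numbers are compared modulo 2^32, as is usual for TCP sequence space, and the fallback of using the previous maximum plus one is purely illustrative. Both function names are hypothetical.

```python
SEQ_MOD = 2 ** 32

def seq_gt(a: int, b: int) -> bool:
    """True if a is 'after' b in 32-bit wrap-around sequence space."""
    return 0 < ((a - b) % SEQ_MOD) < 2 ** 31

def isn_for_reopen(clock_isn: int, prev_max_seq: int) -> int:
    """Pick an ISN for a connection reopened from TIME-WAIT.

    Uses the clock-driven ISN when it is already beyond the previous
    incarnation's sequence numbers, otherwise steps just past them.
    """
    if seq_gt(clock_isn, prev_max_seq):
        return clock_isn
    return (prev_max_seq + 1) % SEQ_MOD
```

The modular comparison matters near the wrap-around point: a clock value of 5 is still "larger" than a previous maximum of 2^32 - 10.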

TCP's full-duplex, data-preserving close is a feature that is not included in the analogous ISO transport protocol TP4.

Some systems have not implemented half-closed connections, presumably because such connections do not fit into the I/O model of their particular operating system. On these systems, once an application has called CLOSE, it can no longer read input data arriving on the connection from the remote application. This is referred to as a "half-duplex" TCP close sequence.

The graceful close algorithm of TCP requires that the connection state remain defined on at least one end of the connection for some time. This timeout lasts 2×MSL, that is, 4 minutes. During this period the pair of identifiers that defines the connection (remote socket and local socket, each an IP address plus TCP port) is busy and cannot be reused. To shorten the time that a given pair is tied up, some TCPs allow a new SYN to be accepted in TIME-WAIT state.

§4.2.2.14. Data communication: RFC-793

Since RFC-793 was written, there has been extensive work on TCP algorithms to achieve efficient data communication. Later sections describe the required and recommended TCP algorithms that determine when to send data, when to send an acknowledgment, and when to update the size of the sliding window (or simply the window).

One important performance issue is the "Silly Window Syndrome" (SWS, RFC-813): a stable pattern of small incremental window movements that results in extremely poor TCP performance. Algorithms to avoid SWS on both the receiving and the sending side are described later.

In brief, SWS is caused by the receiver advancing the right window edge whenever it has any new buffer space available for incoming data, and by the sender using any incremental window, no matter how small, to send more data (RFC-813). The result can be a stable pattern of sending segments with a very small payload, even though both sender and receiver have a large total buffer space for the connection. SWS can only occur during the transmission of a large amount of data; if the connection goes quiescent, the problem disappears. SWS is caused by a typical straightforward implementation of window management, but the sender and receiver algorithms described later avoid it.

Another important TCP performance issue is that some applications, especially remote login to character-at-a-time hosts, tend to send streams of segments containing a single data octet. To avoid deadlocks, every SEND call to the TCP from such applications must be "pushed", either explicitly by the application or implicitly by the TCP. The result may be a stream of segments that contain one data octet each, which makes very inefficient use of the Internet and contributes to congestion. The Nagle algorithm (John Nagle, RFC-896) is a simple and effective solution to this problem. It has the effect of clumping characters over Telnet connections. This may initially surprise users accustomed to single-character echo, but user acceptance has not been a problem.

Note that the Nagle algorithm and the SWS avoidance algorithm play complementary roles in improving performance. The Nagle algorithm discourages sending tiny segments when the data to be sent grows in small increments, while the SWS avoidance algorithm discourages small segments that result from the right window edge advancing in small increments.

A careless implementation can send two or more acknowledgment segments for each data segment received. For example, suppose the receiver acknowledges every data segment immediately. When the application subsequently consumes the data and increases the available receive buffer space again, the receiver may send a second acknowledgment segment to update the window at the sender. The extreme case occurs with single-character segments on TCP connections that use the Telnet protocol for remote login. Some implementations have been observed in which each incoming one-octet segment generates three return segments: (1) the acknowledgment, (2) a one-byte increase in the window, and (3) the echoed character.

§4.2.2.15. Retransmission timeout: RFC-793

The algorithm suggested in RFC-793 for calculating the retransmission timeout is now known to be inadequate.

To deal with congestion and to ensure the stability of TCP retransmission, V. Jacobson proposed in 1988 a transmission algorithm that combines two algorithms, "slow start" and "congestion avoidance". A TCP must implement this algorithm.

If a retransmitted packet is identical to the original packet (which implies not only that the data boundaries have not changed, but also that the window and acknowledgment fields of the header have not changed), then the same IP Identification field value may be used (see §3.2.1.5).

Some TCP implementors have chosen to "packetize" the data stream, i.e., to pick segment boundaries when segments are originally sent and to queue these segments in a "retransmission queue" until they are acknowledged. Another design (which may be simpler) is to defer packetizing until each time data is transmitted or retransmitted, so there will be no segment retransmission queue.

In an implementation with a segment retransmission queue, TCP performance may be enhanced by repacketizing the segments awaiting acknowledgment when the first retransmission timeout occurs. That is, the outstanding segments that fitted would be combined into one maximum-sized segment, with a new IP Identification value. The TCP would then retain this combined segment in the retransmit queue until it was acknowledged.

However, if the first two segments in the retransmission queue totalled more than one maximum-sized segment, the TCP would retransmit only the first segment using the original IP Identification field.

§4.2.2.16. Managing the Window: RFC-793 Section 3.7, page 41

A TCP receiver SHOULD NOT shrink the window, i.e., move the right window edge to the left. However, a sending TCP MUST be robust against window shrinking, which may cause the "useable window" (see Section 4.2.3.4) to become negative.

If this happens, the sender SHOULD NOT send new data, but SHOULD retransmit normally the old unacknowledged data between SND.UNA and SND.UNA+SND.WND. The sender MAY also retransmit old data beyond SND.UNA+SND.WND, but SHOULD NOT time out the connection if data beyond the right window edge is not acknowledged. If the window shrinks to zero, the TCP MUST probe it in the standard way (see next Section).

DISCUSSION:

Many TCP implementations become confused if the window shrinks from the right after data has been sent into a larger window. Note that TCP has a heuristic to select the latest window update despite possible datagram reordering; as a result, it may ignore a window update with a smaller window than previously offered if neither the sequence number nor the acknowledgment number is increased.
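A short sketch of how a sender might stay robust against a shrunken window. It assumes the usual definition of the useable window, SND.UNA + SND.WND - SND.NXT, and for brevity treats sequence numbers as plain integers without 32-bit wrap-around; the function names are hypothetical.

```python
def usable_window(snd_una: int, snd_wnd: int, snd_nxt: int) -> int:
    """May be negative after the receiver shrinks the window."""
    return snd_una + snd_wnd - snd_nxt

def may_send_new_data(snd_una: int, snd_wnd: int, snd_nxt: int) -> bool:
    """New data is sent only into a positive useable window."""
    return usable_window(snd_una, snd_wnd, snd_nxt) > 0

def normal_retransmit_range(snd_una: int, snd_wnd: int, snd_nxt: int) -> tuple[int, int]:
    """Old unacknowledged data that is still inside the offered window."""
    return snd_una, min(snd_nxt, snd_una + snd_wnd)
```

If 800 octets are in flight from sequence number 1000 and the window shrinks to 500, the useable window is -300: no new data is sent, and only octets 1000 up to 1500 are retransmitted in the normal way.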

§4.2.2.17. Probing Zero Windows: RFC-793 Section 3.7, page 42

Probing of zero (offered) windows MUST be supported.

A TCP MAY keep its offered receive window closed indefinitely. As long as the receiving TCP continues to send acknowledgments in response to the probe segments, the sending TCP MUST allow the connection to stay open.

DISCUSSION:

It is extremely important to remember that ACK (acknowledgment) segments that contain no data are not reliably transmitted by TCP. If zero window probing is not supported, a connection may hang forever when an ACK segment that re-opens the window is lost.

The delay in opening a zero window generally occurs when the receiving application stops taking data from its TCP. For example, consider a printer daemon application, stopped because the printer ran out of paper.

The transmitting host SHOULD send the first zero-window probe when a zero window has existed for the retransmission timeout period (see Section 4.2.2.15), and SHOULD increase exponentially the interval between successive probes.

DISCUSSION:

This procedure minimizes delay if the zero-window condition is due to a lost ACK segment containing a window-opening update. Exponential backoff is recommended, possibly with some maximum interval not specified here. This procedure is similar to that of the retransmission algorithm, and it may be possible to combine the two procedures in the implementation.
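The probe schedule described above might look like the following sketch. The doubling factor and the max_interval cap are assumptions made for illustration (the text recommends exponential backoff but leaves the maximum unspecified), and the function name is hypothetical.

```python
def probe_intervals(rto: float, max_interval: float, count: int) -> list[float]:
    """Delays before successive zero-window probes.

    The first probe is sent after one retransmission timeout; each later
    interval doubles until it reaches the assumed cap `max_interval`.
    """
    intervals = []
    interval = rto
    for _ in range(count):
        intervals.append(interval)
        interval = min(interval * 2, max_interval)
    return intervals
```

With an RTO of 3 seconds and an assumed cap of 60 seconds, the first six probes are spaced 3, 6, 12, 24, 48 and 60 seconds apart.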

§4.2.2.18. Passive OPEN Calls: RFC-793 Section 3.8

Every passive OPEN call either creates a new connection record in LISTEN state, or it returns an error; it MUST NOT affect any previously created connection record.

A TCP that supports multiple concurrent users MUST provide an OPEN call that will functionally allow an application to LISTEN on a port while a connection block with the same local port is in SYN-SENT or SYN-RECEIVED state.

DISCUSSION:

Some applications (e.g., SMTP servers) may need to handle multiple connection attempts at about the same time. The probability of a connection attempt failing is reduced by giving the application some means of listening for a new connection at the same time that an earlier connection attempt is going through the three-way handshake.

IMPLEMENTATION:

Acceptable implementations of concurrent opens may permit multiple passive OPEN calls, or they may allow "cloning" of LISTEN-state connections from a single passive OPEN call.

§4.2.2.19. Time to Live: RFC-793 Section 3.9, page 52

RFC-793 specified that TCP was to request the IP layer to send TCP segments with TTL = 60. This is obsolete; the TTL value used to send TCP segments MUST be configurable. See Section 3.2.1.7 for discussion.

§4.2.2.20. Event Processing: RFC-793 Section 3.9

While it is not strictly required, a TCP SHOULD be capable of queueing out-of-order TCP segments. Change the "may" in the last sentence of the first paragraph on page 70 to "should".

DISCUSSION:

Some small-host implementations have omitted segment queueing because of limited buffer space. This omission may be expected to adversely affect TCP throughput, since loss of a single segment causes all later segments to appear to be "out of sequence".

In general, the processing of received segments MUST be implemented to aggregate ACK segments whenever possible. For example, if the TCP is processing a series of queued segments, it MUST process them all before sending any ACK segments.

Here are some detailed error corrections and notes on the Event Processing section of RFC-793.

(a) CLOSE Call, CLOSE-WAIT state, p. 61: enter LAST-ACK state, not CLOSING.

(b) LISTEN state, check for SYN (pp. 65, 66): With a SYN bit, if the security/compartment or the precedence is wrong for the segment, a reset is sent. The wrong form of reset is shown in the text; it should be:

<SEQ=0><ACK=SEG.SEQ+SEG.LEN><CTL=RST,ACK>

(c) SYN-SENT state, Check for SYN, p. 68: When the connection enters ESTABLISHED state, the following variables must be set:

SND.WND <- SEG.WND
SND.WL1 <- SEG.SEQ
SND.WL2 <- SEG.ACK

(d) Check security and precedence, p. 71: The first heading "ESTABLISHED STATE" should really be a list of all states other than SYN-RECEIVED: ESTABLISHED, FIN-WAIT-1, FIN-WAIT-2, CLOSE-WAIT, CLOSING, LAST-ACK, and TIME-WAIT.

(e) Check SYN bit, p. 71: "In SYN-RECEIVED state and if the connection was initiated with a passive OPEN, then return this connection to the LISTEN state and return. Otherwise...".

(f) Check ACK field, SYN-RECEIVED state, p. 72: When the connection enters ESTABLISHED state, the variables listed in (c) must be set.

(g) Check ACK field, ESTABLISHED state, p. 72: The ACK is a duplicate if SEG.ACK =< SND.UNA (the = was omitted). Similarly, the window should be updated if: SND.UNA =< SEG.ACK =< SND.NXT.

(h) USER TIMEOUT, p. 77: It would be better to notify the application of the timeout rather than letting TCP force the connection closed. However, see also Section 4.2.3.5.
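Corrections (c) and (g) translate directly into a few lines of code. The sketch below assumes plain integer comparison of sequence numbers (no 32-bit wrap-around handling); the SendState class and the function names are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class SendState:
    snd_una: int
    snd_nxt: int
    snd_wnd: int = 0
    snd_wl1: int = 0
    snd_wl2: int = 0

def on_enter_established(s: SendState, seg_wnd: int, seg_seq: int, seg_ack: int) -> None:
    """Corrections (c) and (f): set the window variables from the segment."""
    s.snd_wnd = seg_wnd
    s.snd_wl1 = seg_seq
    s.snd_wl2 = seg_ack

def is_duplicate_ack(s: SendState, seg_ack: int) -> bool:
    """Correction (g): duplicate if SEG.ACK =< SND.UNA (note the equality)."""
    return seg_ack <= s.snd_una

def ack_permits_window_update(s: SendState, seg_ack: int) -> bool:
    """Correction (g): update the window if SND.UNA =< SEG.ACK =< SND.NXT."""
    return s.snd_una <= seg_ack <= s.snd_nxt
```

Note that an ACK equal to SND.UNA is both a duplicate and still eligible to update the window, which is exactly what the restored "=" makes possible.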

§4.2.2.21. Acknowledging Queued Segments: RFC-793 Section 3.9

A TCP MAY send an ACK segment acknowledging RCV.NXT when a valid segment arrives that is in the window but not at the left window edge.

DISCUSSION:

RFC-793 (see page 74) was ambiguous about whether or not an ACK segment should be sent when an out-of-order segment was received, i.e., when SEG.SEQ was unequal to RCV.NXT.

One reason for ACKing out-of-order segments might be to support an experimental algorithm known as "fast retransmit". With this algorithm, the sender uses the "redundant" ACK's to deduce that a segment has been lost before the retransmission timer has expired. It counts the number of times an ACK has been received with the same value of SEG.ACK and with the same right window edge. If more than a threshold number of such ACK's is received, then the segment containing the octets starting at SEG.ACK is assumed to have been lost and is retransmitted, without awaiting a timeout. The threshold is chosen to compensate for the maximum likely segment reordering in the Internet. There is not yet enough experience with the fast retransmit algorithm to determine how useful it is.
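The counting rule for fast retransmit can be sketched as below. The threshold value of 3 is an assumption for illustration only (the text leaves the threshold open), and the class name is hypothetical.

```python
class DupAckCounter:
    """Counts ACKs with the same SEG.ACK and the same right window edge."""

    def __init__(self, threshold: int = 3):
        self.threshold = threshold   # assumed value, not given in the text
        self.last = None
        self.count = 0

    def on_ack(self, seg_ack: int, right_edge: int) -> bool:
        """Return True exactly once, when more than `threshold` identical
        ACKs have been seen, meaning the segment at seg_ack should be resent."""
        key = (seg_ack, right_edge)
        if key == self.last:
            self.count += 1
        else:
            self.last, self.count = key, 1
        return self.count == self.threshold + 1
```

An ACK that changes either SEG.ACK or the right window edge restarts the count, so ordinary window updates are not mistaken for loss signals.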
4.2.3 SPECIFIC ISSUES
4.2.3.1 Retransmission Timeout Calculation
A host TCP MUST implement Karn's algorithm and Jacobson's

algorithm for computing the retransmission timeout ("RTO").
o Jacobson's algorithm for computing the smoothed round-

trip ("RTT") time incorporates a simple measure of the variance [TCP:7].

o Karn's algorithm for selecting RTT measurements ensures

that ambiguous round-trip times will not corrupt the

calculation of the smoothed round-trip time [TCP:6].
This implementation also MUST include "exponential backoff"

for successive RTO values for the same segment. Retransmission

of SYN segments SHOULD use the same algorithm as data segments.
DISCUSSION:

There were two known problems with the RTO calculations

specified in RFC-793. First, the accurate measurement

of RTTs is difficult when there are retransmissions.

Second, the algorithm to compute the smoothed round-

trip time is inadequate [TCP:7], because it incorrectly

assumed that the variance in RTT values would be small

and constant. These problems were solved by Karn's and

Jacobson's algorithms, respectively.
The performance increase resulting from the use of

these improvements varies from noticeable to dramatic.

Jacobson's algorithm for incorporating the measured RTT

variance is especially important on a low-speed link,

where the natural variation of packet sizes causes a

large variation in RTT. One vendor found link utilization

on a 9.6 kbit/s line went from 10% to 90% as a result of

implementing Jacobson's variance algorithm in TCP.
The following values SHOULD be used to initialize the

estimation parameters for a new connection:
(a) RTT = 0 seconds.

(b) RTO = 3 seconds. (The smoothed variance is to be

initialized to the value that will result in this RTO).
The recommended upper and lower bounds on the RTO are known

to be inadequate on large internets. The lower bound SHOULD

be measured in fractions of a second (to accommodate high speed

LANs) and the upper bound should be 2*MSL, i.e., 240 seconds.
DISCUSSION:

Experience has shown that these initialization values

are reasonable, and that in any case the Karn and

Jacobson algorithms make TCP behavior reasonably

insensitive to the initial parameter choices.
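A minimal sketch of how the pieces of this section fit together is given below. The gains (1/8, 1/4), the variance multiplier 4, the handling of the first sample, and the 0.2 second lower bound are assumptions taken from common practice, not from this text; only the 3 second initial RTO, the 240 second upper bound, Karn's rule, and the exponential backoff come from the section above.

```python
class RtoEstimator:
    """Hypothetical RTO estimator combining Jacobson, Karn, and backoff."""
    ALPHA = 1 / 8   # assumed gain for the smoothed RTT
    BETA = 1 / 4    # assumed gain for the smoothed variance
    K = 4           # assumed variance multiplier

    def __init__(self, initial_rto=3.0, min_rto=0.2, max_rto=240.0):
        self.srtt = 0.0                      # (a) RTT = 0 seconds
        self.rttvar = initial_rto / self.K   # (b) chosen so that RTO = 3 s
        self.min_rto = min_rto               # assumed "fraction of a second"
        self.max_rto = max_rto               # 2*MSL
        self.backoff = 1
        self.have_sample = False

    @property
    def rto(self):
        raw = (self.srtt + self.K * self.rttvar) * self.backoff
        return max(self.min_rto, min(self.max_rto, raw))

    def on_ack(self, measured_rtt, was_retransmitted):
        if was_retransmitted:
            return  # Karn: the sample is ambiguous, keep the backed-off RTO
        if not self.have_sample:
            # Assumed treatment of the first valid measurement.
            self.srtt = measured_rtt
            self.rttvar = measured_rtt / 2
            self.have_sample = True
        else:
            err = measured_rtt - self.srtt
            self.srtt += self.ALPHA * err
            self.rttvar += self.BETA * (abs(err) - self.rttvar)
        self.backoff = 1

    def on_timeout(self):
        self.backoff *= 2   # exponential backoff for the same segment
```

A new estimator reports an RTO of 3 seconds; each timeout doubles it until the 240 second bound is reached, and the first unambiguous measurement resets the backoff.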
4.2.3.2 When to Send an ACK Segment
A host that is receiving a stream of TCP data segments can

increase efficiency in both the Internet and the hosts by

sending fewer than one ACK (acknowledgment) segment per data

segment received; this is known as a "delayed ACK" [TCP:5].
A TCP SHOULD implement a delayed ACK, but an ACK should not

be excessively delayed; in particular, the delay MUST be

less than 0.5 seconds, and in a stream of full-sized segments

there SHOULD be an ACK for at least every second segment.
DISCUSSION:

A delayed ACK gives the application an opportunity to

update the window and perhaps to send an immediate

response. In particular, in the case of character-mode

remote login, a delayed ACK can reduce the number of

segments sent by the server by a factor of 3 (ACK, window

update, and echo character all combined in one segment).
In addition, on some large multi-user hosts, a delayed ACK

can substantially reduce protocol processing overhead by

reducing the total number of packets to be processed [TCP:5].

However, excessive delays on ACK's can disturb the round-trip

timing and packet "clocking" algorithms [TCP:7].
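The two limits stated above (a delay below 0.5 seconds, and an ACK for at least every second full-sized segment) might be enforced as in the following sketch. The class name, the method names, and the 0.2 second default delay are assumptions made for illustration.

```python
class DelayedAck:
    """Hypothetical delayed-ACK bookkeeping for one connection."""
    MAX_DELAY = 0.5  # seconds; the delay MUST stay below this

    def __init__(self, delay=0.2):  # 0.2 s is an assumed default
        if not 0.0 <= delay < self.MAX_DELAY:
            raise ValueError("ACK delay must be less than 0.5 seconds")
        self.delay = delay
        self.full_segments = 0   # full-sized segments not yet ACKed
        self.deadline = None     # time by which a pending ACK must go out

    def _ack_sent(self):
        self.full_segments = 0
        self.deadline = None

    def on_segment(self, now, length, mss):
        """Return True if an ACK should be sent immediately."""
        if length >= mss:
            self.full_segments += 1
        if self.full_segments >= 2:
            self._ack_sent()
            return True
        if self.deadline is None:
            self.deadline = now + self.delay
        return False

    def on_timer(self, now):
        """Return True if the pending ACK has waited long enough."""
        if self.deadline is not None and now >= self.deadline:
            self._ack_sent()
            return True
        return False
```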
4.2.3.3 When to Send a Window Update
A TCP MUST include a SWS avoidance algorithm in the receiver [TCP:5].
IMPLEMENTATION:

The receiver's SWS avoidance algorithm determines when

the right window edge may be advanced; this is

customarily known as "updating the window". This

algorithm combines with the delayed ACK algorithm (see

Section 4.2.3.2) to determine when an ACK segment

containing the current window will really be sent to

the receiver. We use the notation of RFC-793; see

Figures 4 and 5 in that document.
The solution to receiver SWS is to avoid advancing the right

window edge RCV.NXT+RCV.WND in small increments, even if

data is received from the network in small segments.
Suppose the total receive buffer space is RCV.BUFF. At

any given moment, RCV.USER octets of this total may be

tied up with data that has been received and acknowledged but

which the user process has not yet consumed. When the

connection is quiescent, RCV.WND = RCV.BUFF and RCV.USER = 0.
Keeping the right window edge fixed as data arrives and

is acknowledged requires that the receiver offer less

than its full buffer space, i.e., the receiver must

specify a RCV.WND that keeps RCV.NXT+RCV.WND constant

as RCV.NXT increases. Thus, the total buffer space

RCV.BUFF is generally divided into three parts:
    |<------- RCV.BUFF ---------------->|
         1             2            3
----|---------|------------------|------|----
           RCV.NXT               ^
                              (Fixed)
1 - RCV.USER = data received but not yet consumed;

2 - RCV.WND = space advertised to sender;

3 - Reduction = space available but not yet advertised.
The suggested SWS avoidance algorithm for the receiver

is to keep RCV.NXT+RCV.WND fixed until the reduction satisfies:
RCV.BUFF - RCV.USER - RCV.WND >=
min( Fr * RCV.BUFF, Eff.snd.MSS )
where Fr is a fraction whose recommended value is 1/2,

and Eff.snd.MSS is the effective send MSS for the

connection (see Section 4.2.2.6). When the inequality

is satisfied, RCV.WND is set to RCV.BUFF-RCV.USER.
Note that the general effect of this algorithm is to

advance RCV.WND in increments of Eff.snd.MSS (for

realistic receive buffers: Eff.snd.MSS < RCV.BUFF/2).

Note also that the receiver must use its own Eff.snd.MSS,

assuming it is the same as the sender's.
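The inequality above translates directly into a small function. The function and parameter names are hypothetical; the logic follows the text: the window is left unchanged until the unadvertised space reaches min(Fr * RCV.BUFF, Eff.snd.MSS), and is then opened to RCV.BUFF - RCV.USER.

```python
from fractions import Fraction


def update_rcv_wnd(rcv_buff, rcv_user, rcv_wnd, eff_snd_mss,
                   fr=Fraction(1, 2)):
    """Return the RCV.WND to advertise after the application reads data."""
    reduction = rcv_buff - rcv_user - rcv_wnd   # space not yet advertised
    if reduction >= min(fr * rcv_buff, eff_snd_mss):
        return rcv_buff - rcv_user              # advance the right edge
    return rcv_wnd                              # keep the right edge fixed
```

For example, with an 8192-octet buffer and an MSS of 1460, freeing 1000 octets leaves the advertised window unchanged, while freeing 1500 octets opens it.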
4.2.3.4 When to Send Data
A TCP MUST include a SWS avoidance algorithm in the sender.
A TCP SHOULD implement the Nagle Algorithm [TCP:9] to

coalesce short segments. However, there MUST be a way for

an application to disable the Nagle algorithm on an individual

connection. In all cases, sending data is also subject to the

limitation imposed by the Slow Start algorithm (Section 4.2.2.15).
DISCUSSION:

The Nagle algorithm is generally as follows:
If there is unacknowledged data (i.e., SND.NXT >

SND.UNA), then the sending TCP buffers all user

data (regardless of the PSH bit), until the

outstanding data has been acknowledged or until

the TCP can send a full-sized segment (Eff.snd.MSS

bytes; see Section 4.2.2.6).
Some applications (e.g., real-time display window updates)

require that the Nagle algorithm be turned off, so small

data segments can be streamed out at the maximum rate.
IMPLEMENTATION:

The sender's SWS avoidance algorithm is more difficult

than the receiver's, because the sender does not know

(directly) the receiver's total buffer space RCV.BUFF.

An approach which has been found to work well is for

the sender to calculate Max(SND.WND), the maximum send

window it has seen so far on the connection, and to use

this value as an estimate of RCV.BUFF. Unfortunately,

this can only be an estimate; the receiver may at any time

reduce the size of RCV.BUFF. To avoid a resulting deadlock,

it is necessary to have a timeout to force transmission

of data, overriding the SWS avoidance algorithm.

In practice, this timeout should seldom occur.
The "useable window" [TCP:5] is:
U = SND.UNA + SND.WND - SND.NXT
i.e., the offered window less the amount of data sent

but not acknowledged. If D is the amount of data

queued in the sending TCP but not yet sent, then the

following set of rules is recommended.
Send data:
(1) if a maximum-sized segment can be sent, i.e., if:
min(D,U) >= Eff.snd.MSS;
(2) or if the data is pushed and all queued data can

be sent now, i.e., if:
[SND.NXT = SND.UNA and] PUSHED and D <= U
(the bracketed condition is imposed by the Nagle algorithm);
(3) or if at least a fraction Fs of the maximum window

can be sent, i.e., if:
[SND.NXT = SND.UNA and]
min(D,U) >= Fs * Max(SND.WND);
(4) or if data is PUSHed and the override timeout occurs.
Here Fs is a fraction whose recommended value is 1/2.

The override timeout should be in the range 0.1 - 1.0

seconds. It may be convenient to combine this timer with

the timer used to probe zero windows (Section 4.2.2.17).
Finally, note that the SWS avoidance algorithm just specified is

to be used instead of the sender-side algorithm contained in [TCP:5].
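Rules (1) through (4) can be written as a single predicate, sketched below. The function name, the parameter names, and the guard for an empty queue are assumptions; the `nagle` flag models the bracketed condition, which is dropped when the application has disabled the Nagle algorithm.

```python
from fractions import Fraction


def should_send(d, snd_una, snd_nxt, snd_wnd, max_snd_wnd, eff_snd_mss,
                pushed, override_expired, nagle=True, fs=Fraction(1, 2)):
    """Decide whether queued data may be sent now (sender SWS avoidance)."""
    if d <= 0:
        return False                    # nothing queued (guard for the sketch)
    u = snd_una + snd_wnd - snd_nxt     # usable window U
    sendable = min(d, u)
    # Bracketed condition: with Nagle on, rules (2) and (3) require that
    # no sent data is still unacknowledged.
    nagle_ok = (snd_nxt == snd_una) or not nagle
    if sendable >= eff_snd_mss:                       # rule (1)
        return True
    if nagle_ok and pushed and d <= u:                # rule (2)
        return True
    if nagle_ok and sendable >= fs * max_snd_wnd:     # rule (3)
        return True
    if pushed and override_expired:                   # rule (4)
        return True
    return False
```

A 10-octet pushed write behind unacknowledged data is held back while Nagle is enabled, and goes out at once when Nagle is disabled or the override timer fires.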
4.2.3.5 TCP Connection Failures
Excessive retransmission of the same segment by TCP indicates some

failure of the remote host or the Internet path. This failure

may be of short or long duration. The following procedure MUST be

used to handle excessive retransmissions of data segments [IP:11]:
(a) There are two thresholds R1 and R2 measuring the amount

of retransmission that has occurred for the same segment. R1 and R2

might be measured in time units or as a count of retransmissions.
(b) When the number of transmissions of the same segment reaches or

exceeds threshold R1, pass negative advice (see Section 3.3.1.4)

to the IP layer, to trigger dead-gateway diagnosis.
(c) When the number of transmissions of the same segment

reaches a threshold R2 greater than R1, close the connection.
(d) An application MUST be able to set the value for R2 for

a particular connection. For example, an interactive

application might set R2 to "infinity," giving the user

control over when to disconnect.
(e) TCP SHOULD inform the application of the delivery

problem (unless such information has been disabled by

the application; see Section 4.2.4.1), when R1 is

reached and before R2. This will allow a remote login

(User Telnet) application program to inform the user, for example.
The value of R1 SHOULD correspond to at least 3 retransmissions, at the

current RTO. The value of R2 SHOULD correspond to at least 100 seconds.
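The threshold procedure might look like the sketch below, which measures R1 as a retransmission count and R2 in seconds (the text allows either unit for both). The function name and the action strings are hypothetical; the defaults are the minimum recommended values from the paragraph above.

```python
import math


def retransmission_actions(retx_count, elapsed_seconds,
                           r1_retx=3, r2_seconds=100.0,
                           report_enabled=True):
    """Return the actions due for a segment that keeps timing out."""
    if elapsed_seconds >= r2_seconds:
        return ["close_connection"]              # step (c)
    actions = []
    if retx_count >= r1_retx:
        actions.append("negative_advice_to_ip")  # step (b)
        if report_enabled:
            actions.append("inform_application") # step (e)
    return actions
```

An interactive application that wants to decide for itself when to give up can pass `r2_seconds=math.inf`, as in step (d).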
An attempt to open a TCP connection could fail with

excessive retransmissions of the SYN segment or by receipt

of a RST segment or an ICMP Port Unreachable. SYN retransmissions

MUST be handled in the general way just described for data

retransmissions, including notification of the application layer.
However, the values of R1 and R2 may be different for SYN

and data segments. In particular, R2 for a SYN segment MUST

be set large enough to provide retransmission of the segment

for at least 3 minutes. The application can close the connection

(i.e., give up on the open attempt) sooner, of course.
DISCUSSION:

Some Internet paths have significant setup times, and

the number of such paths is likely to increase in the future.
4.2.3.6 TCP Keep-Alives
Implementors MAY include "keep-alives" in their TCP

implementations, although this practice is not universally

accepted. If keep-alives are included, the application MUST

be able to turn them on or off for each TCP connection, and

they MUST default to off.
Keep-alive packets MUST only be sent when no data or

acknowledgement packets have been received for the

connection within an interval. This interval MUST be

configurable and MUST default to no less than two hours.
It is extremely important to remember that ACK segments that

contain no data are not reliably transmitted by TCP. Consequently,

if a keep-alive mechanism is implemented it MUST NOT interpret

failure to respond to any specific probe as a dead connection.
An implementation SHOULD send a keep-alive segment with no

data; however, it MAY be configurable to send a keep-alive

segment containing one garbage octet, for compatibility with

erroneous TCP implementations.
DISCUSSION:

A "keep-alive" mechanism periodically probes the other

end of a connection when the connection is otherwise

idle, even when there is no data to be sent. The TCP

specification does not include a keep-alive mechanism

because it could: (1) cause perfectly good connections

to break during transient Internet failures; (2)

consume unnecessary bandwidth ("if no one is using the

connection, who cares if it is still good?"); and (3)

cost money for an Internet path that charges for packets.
Some TCP implementations, however, have included a

keep-alive mechanism. To confirm that an idle

connection is still active, these implementations send

a probe segment designed to elicit a response from the

peer TCP. Such a segment generally contains SEG.SEQ =

SND.NXT-1 and may or may not contain one garbage octet

of data. Note that on a quiet connection SND.NXT =

RCV.NXT, so that this SEG.SEQ will be outside the

window. Therefore, the probe causes the receiver to

return an acknowledgment segment, confirming that the

connection is still live. If the peer has dropped the

connection due to a network partition or a crash, it will

respond with a RST instead of an acknowledgment segment.
Unfortunately, some misbehaved TCP implementations fail

to respond to a segment with SEG.SEQ = SND.NXT-1 unless

the segment contains data. Alternatively, an implementation

could determine whether a peer responded correctly to

keep-alive packets with no garbage data octet.
A TCP keep-alive mechanism should only be invoked in

server applications that might otherwise hang indefinitely

and consume resources unnecessarily if a client crashes

or aborts a connection during a network failure.
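The requirements of this section (off by default, sent only after an idle interval of at least two hours by default, and tolerant of a lost probe or reply) are illustrated by the following sketch. The class, the 75 second probe spacing, and the limit of 5 unanswered probes are assumptions; the text requires only that no single unanswered probe be treated as a dead connection.

```python
class KeepAlive:
    """Hypothetical keep-alive scheduler for one connection."""

    def __init__(self, enabled=False, idle_interval=7200.0,
                 probe_interval=75.0, max_unanswered=5):
        self.enabled = enabled                # MUST default to off
        self.idle_interval = idle_interval    # default: two hours
        self.probe_interval = probe_interval  # assumed probe spacing
        self.max_unanswered = max_unanswered  # assumed; must exceed 1
        self.last_heard = 0.0
        self.unanswered = 0

    def on_receive(self, now):
        """Any data or ACK from the peer shows the connection is alive."""
        self.last_heard = now
        self.unanswered = 0

    def poll(self, now):
        """Return 'idle', 'send_probe', or 'dead'."""
        if not self.enabled:
            return "idle"
        due = (self.last_heard + self.idle_interval
               + self.unanswered * self.probe_interval)
        if now < due:
            return "idle"
        if self.unanswered >= self.max_unanswered:
            return "dead"
        self.unanswered += 1
        return "send_probe"
```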
4.2.3.7 TCP Multihoming
If an application on a multihomed host does not specify the

local IP address when actively opening a TCP connection,

then the TCP MUST ask the IP layer to select a local IP

address before sending the (first) SYN. See the function

GET_SRCADDR() in Section 3.4.
At all other times, a previous segment has either been sent

or received on this connection, and TCP MUST use the same

local address that was used in those previous segments.
4.2.3.8 IP Options
When received options are passed up to TCP from the IP

layer, TCP MUST ignore options that it does not understand.
A TCP MAY support the Time Stamp and Record Route options.
An application MUST be able to specify a source route when

it actively opens a TCP connection, and this MUST take

precedence over a source route received in a datagram.
When a TCP connection is OPENed passively and a packet arrives with

a completed IP Source Route option (containing a return route), TCP

MUST save the return route and use it for all segments sent on this

connection. If a different source route arrives in a later segment,

the later definition SHOULD override the earlier one.
4.2.3.9 ICMP Messages
TCP MUST act on an ICMP error message passed up from the IP

layer, directing it to the connection that created the

error. The necessary demultiplexing information can be

found in the IP header contained within the ICMP message.
o Source Quench

TCP MUST react to a Source Quench by slowing

transmission on the connection. The RECOMMENDED

procedure is for a Source Quench to trigger a "slow

start," as if a retransmission timeout had occurred.
o Destination Unreachable -- codes 0, 1, 5

Since these Unreachable messages indicate soft error

conditions, TCP MUST NOT abort the connection, and it

SHOULD make the information available to the application.
DISCUSSION:

TCP could report the soft error condition directly

to the application layer with an upcall to the

ERROR_REPORT routine, or it could merely note the

message and report it to the application only when

and if the TCP connection times out.
o Destination Unreachable -- codes 2-4

These are hard error conditions, so TCP SHOULD abort the connection.
o Time Exceeded -- codes 0, 1

This should be handled the same way as Destination

Unreachable codes 0, 1, 5 (see above).
o Parameter Problem

This should be handled the same way as Destination

Unreachable codes 0, 1, 5 (see above).
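The dispatch described in this list can be summarized as a lookup, sketched below. The ICMP type numbers (3, 4, 11, 12) are the standard ICMP values rather than something stated in this section, the action strings are hypothetical, and returning "ignore" for combinations the list does not mention is an assumption.

```python
DEST_UNREACHABLE = 3
SOURCE_QUENCH = 4
TIME_EXCEEDED = 11
PARAMETER_PROBLEM = 12


def icmp_action(icmp_type, code):
    """Map an ICMP error passed up by IP to the TCP reaction listed above."""
    if icmp_type == SOURCE_QUENCH:
        return "slow_start"
    if icmp_type == DEST_UNREACHABLE:
        if code in (0, 1, 5):
            return "report_soft_error"   # MUST NOT abort the connection
        if code in (2, 3, 4):
            return "abort_connection"    # hard error
        return "ignore"                  # assumption: not covered by the list
    if icmp_type == TIME_EXCEEDED and code in (0, 1):
        return "report_soft_error"
    if icmp_type == PARAMETER_PROBLEM:
        return "report_soft_error"
    return "ignore"                      # assumption: not covered by the list
```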
4.2.3.10 Remote Address Validation
A TCP implementation MUST reject as an error a local OPEN call for

an invalid remote IP address (e.g., a broadcast or multicast address).
An incoming SYN with an invalid source address must be

ignored either by TCP or by the IP layer (see Section 3.2.1.3).
A TCP implementation MUST silently discard an incoming SYN

segment that is addressed to a broadcast or multicast address.
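A check of this kind could be sketched with Python's standard `ipaddress` module. The function name and the `local_networks` parameter are hypothetical; a directed broadcast address can be recognized only for networks the host knows about, so that part of the check is necessarily incomplete.

```python
import ipaddress


def is_valid_remote(addr, local_networks=()):
    """Return False for remote IPv4 addresses that an OPEN must reject."""
    ip = ipaddress.IPv4Address(addr)
    if ip.is_multicast:
        return False
    if ip == ipaddress.IPv4Address("255.255.255.255"):   # limited broadcast
        return False
    for net in local_networks:
        # Directed broadcast of a known network, e.g. 192.0.2.255 for /24.
        if ip == ipaddress.IPv4Network(net).broadcast_address:
            return False
    return True
```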
4.2.3.11 TCP Traffic Patterns
IMPLEMENTATION:

The TCP protocol specification [TCP:1] gives the implementor

much freedom in designing the algorithms that control the

message flow over the connection -- packetizing, managing

the window, sending acknowledgments, etc. These design decisions

are difficult because a TCP must adapt to a wide range of traffic

patterns. Experience has shown that a TCP implementor needs

to verify the design on two extreme traffic patterns:
o Single-character Segments

Even if the sender is using the Nagle Algorithm,

when a TCP connection carries remote login traffic

across a low-delay LAN the receiver will generally

get a stream of single-character segments. If remote

terminal echo mode is in effect, the receiver's system

will generally echo each character as it is received.
o Bulk Transfer

When TCP is used for bulk transfer, the data

stream should be made up (almost) entirely of

segments of the size of the effective MSS.

Although TCP uses a sequence number space with

byte (octet) granularity, in bulk-transfer mode

its operation should be as if TCP used a sequence

space that counted only segments.
Experience has furthermore shown that a single TCP can

effectively and efficiently handle these two extremes.
The most important tool for verifying a new TCP

implementation is a packet trace program. There is a

large volume of experience showing the importance of

tracing a variety of traffic patterns with other TCP

implementations and studying the results carefully.
4.2.3.12 Efficiency
IMPLEMENTATION:

Extensive experience has led to the following

suggestions for efficient implementation of TCP:
(a) Don't Copy Data

In bulk data transfer, the primary CPU-intensive

tasks are copying data from one place to another

and checksumming the data. It is vital to

minimize the number of copies of TCP data. Since the

ultimate speed limitation may be fetching data across

the memory bus, it may be useful to combine the copy

with checksumming, doing both with a single memory fetch.
(b) Hand-Craft the Checksum Routine

A good TCP checksumming routine is typically two

to five times faster than a simple and direct

implementation of the definition. Great care and

clever coding are often required and advisable to

make the checksumming code "blazing fast". See [TCP:10].
(c) Code for the Common Case

TCP protocol processing can be complicated, but

for most segments there are only a few simple

decisions to be made. Per-segment processing will

be greatly speeded up by coding the main line to

minimize the number of decisions in the most common case.
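For reference, the "simple and direct implementation of the definition" mentioned in (b) might look like the sketch below: a 16-bit one's complement sum over the data, complemented at the end. It is the slow baseline, not the hand-crafted routine the text recommends, and it omits the pseudo-header that a real TCP checksum also covers. The function name is an assumption.

```python
def internet_checksum(data: bytes) -> int:
    """Return the 16-bit one's complement checksum of data."""
    if len(data) % 2:
        data += b"\x00"                 # pad odd-length input with a zero octet
    total = 0
    for i in range(0, len(data), 2):
        total += (data[i] << 8) | data[i + 1]   # add each 16-bit word
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)  # fold carries back in
    return ~total & 0xFFFF
```

Appending the computed checksum to the data and running the function again yields zero, which is how a receiver verifies a segment.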
4.2.4 TCP/APPLICATION LAYER INTERFACE
4.2.4.1 Asynchronous Reports
There MUST be a mechanism for reporting soft TCP error

conditions to the application. Generically, we assume this

takes the form of an application-supplied ERROR_REPORT routine that

may be upcalled [INTRO:7] asynchronously from the transport layer:
ERROR_REPORT(local connection name, reason, subreason)
The precise encoding of the reason and subreason parameters

is not specified here. However, the conditions that are

reported asynchronously to the application MUST include:
* ICMP error message arrived (see 4.2.3.9)

* Excessive retransmissions (see 4.2.3.5)

* Urgent pointer advance (see 4.2.2.4).
However, an application program that does not want to receive such

ERROR_REPORT calls SHOULD be able to effectively disable these calls.
DISCUSSION:

These error reports generally reflect soft errors that

can be ignored without harm by many applications. It

has been suggested that these error report calls should

default to "disabled," but this is not required.
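One possible shape for the upcall is sketched below. The `Connection` class, its attribute names, and the string encoding of reason and subreason are assumptions; the text deliberately leaves the encoding unspecified.

```python
from typing import Callable, Optional

# Signature of the application-supplied routine:
# ERROR_REPORT(local connection name, reason, subreason)
ErrorReport = Callable[[str, str, str], None]


class Connection:
    """Hypothetical connection object holding the application's upcall."""

    def __init__(self, name: str, error_report: Optional[ErrorReport] = None):
        self.name = name
        self.error_report = error_report
        self.reports_enabled = True   # the application may switch this off

    def soft_error(self, reason: str, subreason: str) -> None:
        """Upcall the application unless it has disabled error reports."""
        if self.reports_enabled and self.error_report is not None:
            self.error_report(self.name, reason, subreason)
```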
4.2.4.2 Type-of-Service
The application layer MUST be able to specify the Type-of-

Service (TOS) for segments that are sent on a connection.

It is not required, but the application SHOULD be able to

change the TOS during the connection lifetime. TCP SHOULD

pass the current TOS value without change to the IP layer,

when it sends segments on the connection.
The TOS will be specified independently in each direction on

the connection, so that the receiver application will

specify the TOS used for ACK segments.
TCP MAY pass the most recently received TOS up to the application.
DISCUSSION:

Some applications (e.g., SMTP) change the nature of

their communication during the lifetime of a

connection, and therefore would like to change the TOS specification.
Note also that the OPEN call specified in RFC-793 includes

a parameter ("options") in which the caller can specify IP

options such as source route, record route, or timestamp.
4.2.4.3 Flush Call
Some TCP implementations have included a FLUSH call, which

will empty the TCP send queue of any data for which the user

has issued SEND calls but which is still to the right of the

current send window. That is, it flushes as much queued send data

as possible without losing sequence number synchronization. This is

useful for implementing the "abort output" function of Telnet.
4.2.4.4 Multihoming
The user interface outlined in sections 2.7 and 3.8 of RFC-793 needs to

be extended for multihoming. The OPEN call MUST have an optional parameter:
OPEN( ... [local IP address,] ... )
to allow the specification of the local IP address.
DISCUSSION:

Some TCP-based applications need to specify the local IP address

to be used to open a particular connection; FTP is an example.
IMPLEMENTATION:

A passive OPEN call with a specified "local IP address"

parameter will await an incoming connection request to

that address. If the parameter is unspecified, a

passive OPEN will await an incoming connection request

to any local IP address, and then bind the local IP address

of the connection to the particular address that is used.
For an active OPEN call, a specified "local IP address" parameter

will be used for opening the connection. If the parameter is

unspecified, the networking software will choose an appropriate

local IP address (see Section 3.3.4.2) for the connection.
4.2.5 TCP REQUIREMENT SUMMARY
FEATURE |SECTION |MUST|SHOULD|MAY|SHOULD NOT|MUST NOT|Footnote
-------------------------------------------------|--------|-|-|-|-|-|--

Push flag | | | | | | |

Aggregate or queue un-pushed data |4.2.2.2 | | |x| | |

Sender collapse successive PSH flags |4.2.2.2 | |x| | | |

SEND call can specify PUSH |4.2.2.2 | | |x| | |

If cannot: sender buffer indefinitely |4.2.2.2 | | | | |x|

If cannot: PSH last segment |4.2.2.2 |x| | | | |

Notify receiving ALP of PSH |4.2.2.2 | | |x| | |1

Send max size segment when possible |4.2.2.2 | |x| | | |

| | | | | | |

Window | | | | | | |

Treat as unsigned number |4.2.2.3 |x| | | | |

Handle as 32-bit number |4.2.2.3 | |x| | | |

Shrink window from right |4.2.2.16| | | |x| |

Robust against shrinking window |4.2.2.16|x| | | | |

Receiver's window closed indefinitely |4.2.2.17| | |x| | |

Sender probe zero window |4.2.2.17|x| | | | |

First probe after RTO |4.2.2.17| |x| | | |

Exponential backoff |4.2.2.17| |x| | | |

Allow window stay zero indefinitely |4.2.2.17|x| | | | |

Sender timeout OK conn with zero wind |4.2.2.17| | | | |x|

| | | | | | |

Urgent Data | | | | | | |

Pointer points to last octet |4.2.2.4 |x| | | | |

Arbitrary length urgent data sequence |4.2.2.4 |x| | | | |

Inform ALP asynchronously of urgent data |4.2.2.4 |x| | | | |1

ALP can learn if/how much urgent data Q'd |4.2.2.4 |x| | | | |1

| | | | | | |

TCP Options | | | | | | |

Receive TCP option in any segment |4.2.2.5 |x| | | | |

Ignore unsupported options |4.2.2.5 |x| | | | |

Cope with illegal option length |4.2.2.5 |x| | | | |

Implement sending & receiving MSS option |4.2.2.6 |x| | | | |

Send MSS option unless 536 |4.2.2.6 | |x| | | |

Send MSS option always |4.2.2.6 | | |x| | |

Send-MSS default is 536 |4.2.2.6 |x| | | | |

Calculate effective send seg size |4.2.2.6 |x| | | | |

| | | | | | |

TCP Checksums | | | | | | |

Sender compute checksum |4.2.2.7 |x| | | | |

Receiver check checksum |4.2.2.7 |x| | | | |

| | | | | | |

Use clock-driven ISN selection |4.2.2.9 |x| | | | |

| | | | | | |

Opening Connections | | | | | | |

Support simultaneous open attempts |4.2.2.10|x| | | | |

SYN-RCVD remembers last state |4.2.2.11|x| | | | |

Passive Open call interfere with others |4.2.2.18| | | | |x|

Function: simultan. LISTENs for same port |4.2.2.18|x| | | | |

Ask IP for src address for SYN if necc. |4.2.3.7 |x| | | | |

Otherwise, use local addr of conn. |4.2.3.7 |x| | | | |

OPEN to broadcast/multicast IP Address |4.2.3.10| | | | |x|

Silently discard seg to bcast/mcast addr |4.2.3.10|x| | | | |

| | | | | | |

Closing Connections | | | | | | |

RST can contain data |4.2.2.12| |x| | | |

Inform application of aborted conn |4.2.2.13|x| | | | |

Half-duplex close connections |4.2.2.13| | |x| | |

Send RST to indicate data lost |4.2.2.13| |x| | | |

In TIME-WAIT state for 2xMSL seconds |4.2.2.13|x| | | | |

Accept SYN from TIME-WAIT state |4.2.2.13| | |x| | |

| | | | | | |

Retransmissions | | | | | | |

Jacobson Slow Start algorithm |4.2.2.15|x| | | | |

Jacobson Congestion-Avoidance algorithm |4.2.2.15|x| | | | |

Retransmit with same IP ident |4.2.2.15| | |x| | |

Karn's algorithm |4.2.3.1 |x| | | | |

Jacobson's RTO estimation alg. |4.2.3.1 |x| | | | |

Exponential backoff |4.2.3.1 |x| | | | |

SYN RTO calc same as data |4.2.3.1 | |x| | | |

Recommended initial values and bounds |4.2.3.1 | |x| | | |

| | | | | | |

Generating ACK's: | | | | | | |

Queue out-of-order segments |4.2.2.20| |x| | | |

Process all Q'd before send ACK |4.2.2.20|x| | | | |

Send ACK for out-of-order segment |4.2.2.21| | |x| | |

Delayed ACK's |4.2.3.2 | |x| | | |

Delay < 0.5 seconds |4.2.3.2 |x| | | | |

Every 2nd full-sized segment ACK'd |4.2.3.2 |x| | | | |

Receiver SWS-Avoidance Algorithm |4.2.3.3 |x| | | | |

| | | | | | |

Sending data | | | | | | |

Configurable TTL |4.2.2.19|x| | | | |

Sender SWS-Avoidance Algorithm |4.2.3.4 |x| | | | |

Nagle algorithm |4.2.3.4 | |x| | | |

Application can disable Nagle algorithm |4.2.3.4 |x| | | | |

| | | | | | |

Connection Failures: | | | | | | |

Negative advice to IP on R1 retxs |4.2.3.5 |x| | | | |

Close connection on R2 retxs |4.2.3.5 |x| | | | |

ALP can set R2 |4.2.3.5 |x| | | | |1

Inform ALP of R1<=retxs<R2 |4.2.3.5 | |x| | | |1

Recommended values for R1, R2 |4.2.3.5 | |x| | | |

Same mechanism for SYNs |4.2.3.5 |x| | | | |

R2 at least 3 minutes for SYN |4.2.3.5 |x| | | | |

| | | | | | |

Send Keep-alive Packets: |4.2.3.6 | | |x| | |

- Application can request |4.2.3.6 |x| | | | |

- Default is "off" |4.2.3.6 |x| | | | |

- Only send if idle for interval |4.2.3.6 |x| | | | |

- Interval configurable |4.2.3.6 |x| | | | |

- Default at least 2 hrs. |4.2.3.6 |x| | | | |

- Tolerant of lost ACK's |4.2.3.6 |x| | | | |

| | | | | | |

IP Options | | | | | | |

Ignore options TCP doesn't understand |4.2.3.8 |x| | | | |

Time Stamp support |4.2.3.8 | | |x| | |

Record Route support |4.2.3.8 | | |x| | |

Source Route: | | | | | | |

ALP can specify |4.2.3.8 |x| | | | |1

Overrides src rt in datagram |4.2.3.8 |x| | | | |

Build return route from src rt |4.2.3.8 |x| | | | |

Later src route overrides |4.2.3.8 | |x| | | |

| | | | | | |

Receiving ICMP Messages from IP |4.2.3.9 |x| | | | |

Dest. Unreach (0,1,5) => inform ALP |4.2.3.9 | |x| | | |

Dest. Unreach (0,1,5) => abort conn |4.2.3.9 | | | | |x|

Dest. Unreach (2-4) => abort conn |4.2.3.9 | |x| | | |

Source Quench => slow start |4.2.3.9 | |x| | | |

Time Exceeded => tell ALP, don't abort |4.2.3.9 | |x| | | |

Param Problem => tell ALP, don't abort |4.2.3.9 | |x| | | |

| | | | | | |

Address Validation | | | | | | |

Reject OPEN call to invalid IP address |4.2.3.10|x| | | | |

Reject SYN from invalid IP address |4.2.3.10|x| | | | |

Silently discard SYN to bcast/mcast addr |4.2.3.10|x| | | | |

| | | | | | |

TCP/ALP Interface Services | | | | | | |

Error Report mechanism |4.2.4.1 |x| | | | |

ALP can disable Error Report Routine |4.2.4.1 | |x| | | |

ALP can specify TOS for sending |4.2.4.2 |x| | | | |

Passed unchanged to IP |4.2.4.2 | |x| | | |

ALP can change TOS during connection |4.2.4.2 | |x| | | |

Pass received TOS up to ALP |4.2.4.2 | | |x| | |

FLUSH call |4.2.4.3 | | |x| | |

Optional local IP addr parm. in OPEN |4.2.4.4 |x| | | | |

-------------------------------------------------|--------|-|-|-|-|-|--

FOOTNOTES:
(1) "ALP" means Application-Layer program.
5. REFERENCES

[INTRO:1] "Requirements for Internet Hosts -- Application and Support,"

IETF Host Requirements Working Group, R. Braden, Ed., RFC-1123, October 1989.

[INTRO:2] "Requirements for Internet Gateways," R. Braden and J. Postel, RFC-1009, June 1987.

[INTRO:3] "DDN Protocol Handbook," NIC-50004, NIC-50005, NIC-50006, SRI International, Dec 1985.

[INTRO:4] "Official Internet Protocols," J. Reynolds and J. Postel, RFC-1011, May 1987.

[INTRO:5] "Protocol Document Order Information," O. Jacobsen and J. Postel, RFC-980, March 1986.

[INTRO:6] "Assigned Numbers," J. Reynolds and J. Postel, RFC-1010, May 1987.

[INTRO:7] "Modularity and Efficiency in Protocol Implementations," D. Clark, RFC-817, July 1982.

[INTRO:8] "The Structuring of Systems Using Upcalls," D. Clark, 10th ACM

SOSP, Orcas Island, Washington, December 1985.

[INTRO:9] "A Protocol for Packet Network Intercommunication," V. Cerf

and R. Kahn, IEEE Transactions on Communication, May 1974.

[INTRO:10] "The ARPA Internet Protocol," J. Postel, C. Sunshine, and D.

Cohen, Computer Networks, Vol. 5, No. 4, July 1981.

[INTRO:11] "The DARPA Internet Protocol Suite," B. Leiner, J. Postel, R. Cole and D. Mills,

Proceedings INFOCOM 85, IEEE, Washington DC, IEEE Communications Magazine, March 1985.

[INTRO:12] "Protocol for Providing the Connectionless Mode Network Service," RFC-994, March 1986.

[INTRO:13] "End System to Intermediate System Routing Exchange Protocol," RFC-995, April 1986.

[LINK:1] "Trailer Encapsulations," S. Leffler and M. Karels, RFC-893, April 1984.

[LINK:2] "An Ethernet Address Resolution Protocol," D. Plummer, RFC-826, November 1982.

[LINK:3] "A Standard for the Transmission of IP Datagrams over Ethernet

Networks," C. Hornig, RFC-894, April 1984.

[LINK:4] "A Standard for the Transmission of IP Datagrams over IEEE 802

Networks," J. Postel and J. Reynolds, RFC-1042, February 1988.

[IP:1] "Internet Protocol (IP)," J. Postel, RFC-791, September 1981.

[IP:2] "Internet Control Message Protocol (ICMP)," J. Postel, RFC-792, September 1981.

[IP:3] "Internet Standard Subnetting Procedure," J. Mogul and J. Postel, RFC-950, August 1985.

[IP:4] "Host Extensions for IP Multicasting," S. Deering, RFC-1112, August 1989.

[IP:5] "Military Standard Internet Protocol," MIL-STD-1777, Department of Defense, August 1983.

[IP:6] "Some Problems with the Specification of the Military Standard

Internet Protocol," D. Sidhu, RFC-963, November 1985.

[IP:7] "The TCP Maximum Segment Size and Related Topics," J. Postel, RFC-879, November 1983.

[IP:8] "Internet Protocol Security Options," B. Schofield, RFC-1108, October 1989.

[IP:9] "Fragmentation Considered Harmful," C. Kent and J. Mogul, ACM

SIGCOMM-87, August 1987. Published as ACM Comp Comm Review, Vol. 17, no. 5.

[IP:10] "IP Datagram Reassembly Algorithms," D. Clark, RFC-815, July 1982.

[IP:11] "Fault Isolation and Recovery," D. Clark, RFC-816, July 1982.

[IP:12] "Broadcasting Internet Datagrams in the Presence of Subnets," J.Mogul, RFC-922, Oct 1984.

[IP:13] "Name, Addresses, Ports, and Routes," D. Clark, RFC-814, July 1982.

[IP:14] "Something a Host Could Do with Source Quench: The Source Quench

Introduced Delay (SQUID)," W. Prue and J. Postel, RFC-1016, July 1987.

[UDP:1] "User Datagram Protocol," J. Postel, RFC-768, August 1980.

[TCP:1] "Transmission Control Protocol," J. Postel, RFC-793, September 1981.

[TCP:2] "Transmission Control Protocol," MIL-STD-1778, US Department of Defense, August 1984.

[TCP:3] "Some Problems with the Specification of the Military Standard

Transmission Control Protocol," D. Sidhu and T. Blumer, RFC-964, November 1985.

[TCP:4] "The TCP Maximum Segment Size and Related Topics," J. Postel, RFC-879, November 1983.

[TCP:5] "Window and Acknowledgment Strategy in TCP," D. Clark, RFC-813, July 1982.

[TCP:6] "Round Trip Time Estimation," P. Karn & C. Partridge, ACM SIGCOMM-87, August 1987.

[TCP:7] "Congestion Avoidance and Control," V. Jacobson, ACM SIGCOMM-88, August 1988.

[TCP:8] "Modularity and Efficiency in Protocol Implementation," D. Clark, RFC-817, July 1982.

[TCP:9] "Congestion Control in IP/TCP," J. Nagle, RFC-896, January 1984.

[TCP:10] "Computing the Internet Checksum," R. Braden, RFC-1071, Sep 1988.

[TCP:11] "TCP Extensions for Long-Delay Paths," V. Jacobson & R. Braden, RFC-1072, October 1988.

The term most commonly used in the Internet community for a networked hardware and software

system that has a network-wide IP address. Hosting is the provision (paid or free of charge)

by a company (a network provider) of part of the resources of its own systems to a user,

who places application systems and/or data on them.

From here on, to keep terminology strictly unambiguous within the Internet architecture, the

following terms are used: "message" is a data unit of an application-layer protocol; "block" is a

data unit of a transport-layer protocol; "packet" is a data unit of the network (IP) layer protocol;

"frame" is a data unit of a link-layer protocol. An IP packet that carries a UDP block (datagram)

may be called an "IP datagram".

The symbol "-1" means that all bits of the field are ones.
