Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Welcome To Ask or Share your Answers For Others

Categories

0 votes
690 views
in Technique[技术] by (71.8m points)

linux - Getting disconnection notification using TCP Keep-Alive on write blocked socket

I use TCP Keep-Alive option to detect dead connection. It works well with connection that use reading sockets:

setsockopt(mysock,...) // set various keep alive options

epoll_ctl(ep,mysock,{EPOLLIN|EPOLERR|EPOLLHUP},)
epoll_wait -> (exits after several seconds when remove host disconnects cable)

Epoll wait exits with EPOLLIN|EPOLLHUP on socket without a problem.

However if I try to write a lot to socket till I get EAGAIN and then poll for both reading and writing I don't get a error when I disconnect the cable:

setsockopt(mysock,...) // set various keep alive options

while(send() != EAGAIN)
   ;
epoll_ctl(ep,mysock,{EPOLLIN|EPOLLOUT|EPOLERR|EPOLLHUP},)
epoll_wait -> --- Never exits!!!! even when the cable of the remove host is disconnected!!!
  • How can this be solved?
  • Have anybody seen a similar problem?
  • Any possible direction?

Edit: Additional Information

When I monitor the communication with wireshark, in the first case (of reading) I get once in several seconds request for ack. But in the second case I don't detect ones at all.

See Question&Answers more detail:os

与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
Welcome To Ask or Share your Answers For Others

1 Reply

0 votes
by (71.8m points)

If you pull the network connection before all the data is transmitted, then the connection is not idle and thus in some implementations the keepalive timer does not start. (Keep in mind that keepalive is NOT part of the TCP specification and as a result it is implemented inconsistently if at all.) In general, because of the combination of exponential backoff and large number of retries (tcp_retries2 defaults to 15) it can take up to 30 minutes for transmission retries to time out before the keepalive timer starts.

The workaround, if there is one, depends on the particular TCP implementation you are using. Some newer versions of Linux (kernel version 2.6.37 released 4 January, 2011) implement TCP_USER_TIMEOUT. More info here.

The usual recommendation is to implement communication timeouts at the application level rather than use TCP-based keepalive anyway. See, for example, HTTP Keep-Alive.


与恶龙缠斗过久,自身亦成为恶龙;凝视深渊过久,深渊将回以凝视…
OGeek|极客中国-欢迎来到极客的世界,一个免费开放的程序员编程交流平台!开放,进步,分享!让技术改变生活,让极客改变未来! Welcome to OGeek Q&A Community for programmer and developer-Open, Learning and Share
Click Here to Ask a Question

...