tech-net archive


Re: New class of receive error



    Date:        Wed, 16 May 2018 00:06:53 +0100
    From:        Roy Marples <roy%marples.name@localhost>
    Message-ID:  <c22dfa1a-80ce-bab7-0c1e-204ce1f8dc40%marples.name@localhost>

  | That is not true.
  | It reported the error condition and re-synced state.
  | The fact it had to re-sync state is not optimal in the slightest, so it 
  | logged a diagnostic.

Hmm, the issue was that routes were being "lost" and your comment
about the patch was ...

      It will still report overflow, but the sockets will no longer be closed 
      and dhcpcd will continue as if before.

"The sockets will no longer be closed" - looks like a real issue to me,
and one which would never have been relevant without ENOBUFS
(if you get EBADF or something, closing (re-opening anyway) is the
right thing to do - but none of those should happen.)

  | I've modified at least ifwatchd, wpa_supplicant and maybe racoon and a 
  | handful of packages in pkgsrc to ignore RTM_* messages

I can certainly see that there were buggy routing socket reading apps.

  | This is false.
  |
  | Firstly, you're basing this around the premise that an application's sole
  | purpose is to read from one socket.

No, I wasn't - though the "and exit" might imply that.   All I meant was that
(until now) any (unexpected) recv*() error could be regarded as fatal - no
point simply trying again.   That's generally true of all (unexpected) read
type errors.

  | Secondly, you're assuming that just like with RTM_* above, the 
  | application will abort on stuff it isn't expecting.

I am assuming that since all errors (were) fatal, there was no point
checking.   The only way to get a non-fatal error was to explicitly
ask for it (by setting non-blocking, or catching signals and continuing).
That strategy should be continued.
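
As a rough illustration of that strategy, here is a minimal sketch in C.
The names classify_recv_error and rx_action, and the two flag parameters,
are invented for this example; it is not code from dhcpcd or from any
other program mentioned in this thread.   It assumes that ENOBUFS on a
routing socket means messages were dropped, so the caller should re-read
state while keeping the socket open, and that everything else which was
not explicitly asked for is fatal.

```c
#include <errno.h>

/* Hypothetical outcome of a failed recv() on a routing socket. */
enum rx_action { RX_RETRY, RX_RESYNC, RX_FATAL };

/*
 * Classify an errno value from a failed recv().
 * nonblocking:     the caller set O_NONBLOCK on the socket itself.
 * handles_signals: the caller catches signals and wants to continue.
 * Retriable errors are only treated as such when the caller asked
 * for them; anything unknown is fatal so that the code gets fixed.
 */
static enum rx_action
classify_recv_error(int err, int nonblocking, int handles_signals)
{
	switch (err) {
	case EINTR:
		return handles_signals ? RX_RETRY : RX_FATAL;
	case EAGAIN:
#if EWOULDBLOCK != EAGAIN
	case EWOULDBLOCK:
#endif
		return nonblocking ? RX_RETRY : RX_FATAL;
	case ENOBUFS:
		/* Assumed: messages were lost; re-sync, do not close. */
		return RX_RESYNC;
	default:
		/* EBADF and friends: nothing sensible to retry. */
		return RX_FATAL;
	}
}
```

A caller would switch on the result after recv() returns -1: loop again
on RX_RETRY, re-read the routing state on RX_RESYNC, and log and give up
on RX_FATAL.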

  | Thirdly, your entire premise stems around the fact that no new error 
  | codes will ever be added to any public function which I'm pretty sure 
  | has happened in the past and will happen in the future.

Not any public function - but read() (and its recv*() cousins) have been
around a long time, and have never returned unsolicited retriable errors.

  | Good code should cater for the unexpected rather than aborting.

That is a nice theory - but how?   An unknown error can mean anything,
what is code expected to do?   Most code I have seen like this ends
up (eventually) looping forever reporting errors that never go away.
The best thing to do with unknown errors is to abort, so the code can
be fixed, since when the error has occurred, it can be presumed that
at least someone knows what it means.

Unknown packet types, in scenarios where all kinds of junk are sent
and only a few are wanted, are a different thing entirely.

  | After all, who knows what the future will bring?

No-one, but that's the point: since we don't know, we cannot write
code to correctly process it.   Simply hoping that "it must be insignificant
if I do not know what it means" is not an intelligent way to code.

  | Yes and yes. I just forgot to note the PR in the dhcpcd import so it's 
  | not logged against the ticket.

Great, and hopefully soon for the 2nd part (I did see your later correction).

kre


