NetBSD-Bugs archive

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index][Old Index]

kern/55286: Kernel messages cause loss of serial console input



>Number:         55286
>Category:       kern
>Synopsis:       Kernel messages cause loss of serial console input
>Confidential:   no
>Severity:       serious
>Priority:       high
>Responsible:    kern-bug-people
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Fri May 22 19:50:00 +0000 2020
>Originator:     Andreas Gustafsson
>Release:        NetBSD-current and others
>Organization:
>Environment:
System: NetBSD
Architecture: x86_64
Machine: amd64
>Description:

I have a couple of automated test setups where one NetBSD system
controls another over a serial console.  Both are running on real
hardware and the serial baud rate is set to 9600.

For a long time, these systems have been suffering from random
failures that occur when the kernel of the controlled system happens
to log a console message at an inopportune moment.

Sometimes things fail because a kernel message is printed in the
middle of some string that the controlling system is looking for.
For example, if it is waiting for a "login:" prompt but gets
"log[  51.0923743] ipmi0: ID 32.1 IPMI 2.0 Available +SDRs\r\nin:",
there's not much you can do short of eliminating the message.
That case is not what this PR is about.

Rather, this PR is about failures that happen when a kernel message is
printed even if it does _not_ cause any of the expected output to be
corrupted, and specifically, that console _input_ appears to be lost
if it happens to arrive at a time when the kernel is printing a
message.

For example, in a recent run, the controlling system sent a long
one-line shell command beginning with

  set -ex; sleep 60; mkdir -p /b && mount -t tmpfs tmpfs /b && rm -rf /b/* 

This was echoed by the controlled system as

  set -ex; sleep 60; mkdir -p /[  50.9384968] ipmi0: ID 32.1 IPMI 2.0 Available +SDRs\r\nb && m/b \x08 && rm -rf /b/*

Not only did a kernel message appear in the middle of the echoed
command, but more importantly, the "ount -t tmpfs tmpfs " part of
the command was lost.  When the controlled system then tried to
execute the command, it printed

 sh: m/b: not found

so clearly there was a loss of input, not just a loss of echo.

I can also reproduce the issue manually on a freshly booted system by
logging in as root over the serial console and running the shell command

  # for i in $(seq 200); do luactl >/dev/null 2>&1; done

to trigger a series of console messages like

  [  69.6188016] WARNING: module error: incompatible module class 1 for `lua' (wanted 3)
  [  69.7088437] WARNING: module error: incompatible module class 1 for `lua' (wanted 3)
  [  69.7988866] WARNING: module error: incompatible module class 1 for `lua' (wanted 3)
  (etc)

and then typing the letters of the alphabet on the keyboard while the
messages are being printed, followed by a return (at 9600 bps, the
messages print slowly enough that there is time to do this before they
finish).  When I do this, the initial part of the input is lost, but
the rest is accepted, leading to output that ends like this:

  [  88.0675327] WARNING: module error: incompatible module class 1 for `lua' (wanted 3)
  # uvxyz
  -sh: uvwxyz: not found

Strangely, this only happens once.  If I repeat the test, every letter
I type is accepted, and I get the expected output ending in

  [ 981.5203980] WARNING: module error: incompatible module class 1 for `lua' (wanted 3)
  # abcdefghijklmnopqrstuvwxyz
  -sh: abcdefghijklmnopqrstuvwxyz: not found

After rebooting, I can again reproduce the problem, once.  I have
tried this test using two different pairs of amd64 machines and they
both behave the same way.

In both the automated test failure and the manual tests, exactly 20
characters were lost, and I see that the magic number 20 also occurs
in com.c as the value of MAX_READAHEAD.  This may or may not be a
coincidence.
-- 
Andreas Gustafsson, gson%gson.org@localhost

>Fix:

Unknown


Home | Main Index | Thread Index | Old Index