NetBSD-Users archive


Re: Strange behaviour on PCEngines APU2



Staffan Thomén <duck%shangtai.net@localhost> writes:

> I recently got a PCEngines APU2 (not sure of the exact model) to
> replace my failing Soekris gateway

As Joseph taught Eliza to say, many others have the same sorts of
feelings.

Except that my net5501 was fine, just slow, and I got an apu2d4.

(As an aside, pcengines makes really nice hardware, and when I asked them
questions about input voltage/current because I want to run an apu from
solar/battery, I got actual answers to my technical questions
immediately from someone who really knew what they were doing.)

> and some strange behaviour appeared after I took it in production.

> After the system has been running for a few hours, it seems to stop
> being able to send packets on the internal wired network interface
> (possibly also the external, I can't tell) on a per-process basis, and
> seems to mostly affect IPv4. ICMP and UDP seem more prone to failure
> than TCP (retransmission?).

This seems really unlikely to be a hardware issue...

> For instance, if I ping a host on my network from the gateway, only a
> few icmp requests go out (checked with tcpdump), sometimes one,
> sometimes ten but then it just sits there. The process seems to be
> stuck in select, if top is to be believed.
>
> Attaching a debugger yields;
>
> (gdb) bt
> #0  0x000070e3f803e28a in poll () from /lib/libc.so.12
> #1  0x000000002f003a6f in main ()
>
> Once I quit the debugger, sometimes a few packets get sent (and received) again.
>
> Pressing ctrl-c stops the ping process properly, and it says it sent
> and received 8/8 packets or whatever.

So the issue is ping getting packets into the stack, not the interface,
and none are lost.

What happens if you ping the apu from a host on the lan?

> Disabling pf did nothing.
>
> Packet forwarding seems to work just fine.
>
> I also have a small daemon that I wrote that listens to pflog devices,
> decodes the log and sends the messages to syslog. These also seem
> to stop in the same manner as ping, but in read() in pcap_loop().
>
> Once the system is in this state, it can't reboot itself either,
> presumably waiting for something somewhere.

Do you mean "typing shutdown hangs" or also "typing reboot hangs"?

> The apu2 is flashed with the latest firmware available, and that made
> no difference.
>
> Since this is a new system, I don't know if it's faulty or if netbsd
> is doing the strange stuff.

When you say "disabling pf", do you mean completely removing all pf
config and freshly booting?

> Advice? I will probably try to roll back my sources to this summer
> sometime and see if an older kernel works, the kernel that was
> optimized for my NET6501 appeared to not have the same problem, but I
> am not sure.

I am running netbsd-8 amd64 on mine, updating every month or so.  I have
seen no issues like you describe.  But surely lots of things are
different between our setups.


  (gdb) bt
  #0  0x000070e3f803e28a in poll () from /lib/libc.so.12
  #1  0x000000002f003a6f in main ()

You might also try "ps alxw" and look at WCHAN.
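
If the full listing is too noisy, something like the following narrows
it to one process.  This is only a sketch: wchan_of is a made-up helper
name, and it assumes your ps accepts the pid, stat, wchan and command
keywords with -o (NetBSD's does; others may differ).

```sh
# Show state and wait channel for a single PID.
# wchan_of is a hypothetical helper name, not a standard tool.
wchan_of() {
    # -o picks the columns; WCHAN names what the process is sleeping on.
    ps -o pid,stat,wchan,command -p "$1"
}
```

For example, "wchan_of 1234" for a stuck ping with PID 1234 (the PID
here is just a placeholder).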

The other advice I always give is

  netstat -s > BEFORE
  do stuff
  netstat -s > AFTER

  diff -u BEFORE AFTER

  # understand all counters that changed

The point is to notice things you aren't looking for.
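
The before/after steps above can be wrapped up roughly like this.  It
is a sketch under assumptions, not something from my own setup:
snapshot_diff and STATS_CMD are names I made up for illustration, and
the temporary file templates are arbitrary.

```sh
# Snapshot counters, run a command, snapshot again, and diff the two.
# snapshot_diff and STATS_CMD are hypothetical names for this sketch.
snapshot_diff() {
    # Counter source; defaults to "netstat -s" as in the steps above.
    stats_cmd=${STATS_CMD:-netstat -s}
    before=$(mktemp /tmp/before.XXXXXX) || return 1
    after=$(mktemp /tmp/after.XXXXXX) || { rm -f "$before"; return 1; }

    # $stats_cmd is left unquoted on purpose so it splits into words.
    $stats_cmd > "$before"
    "$@" > /dev/null 2>&1          # the "do stuff" step
    $stats_cmd > "$after"

    # diff exits 1 when the files differ, which is the expected case.
    diff -u "$before" "$after" || true
    rm -f "$before" "$after"
}
```

Usage would be along the lines of "snapshot_diff ping -c 10 192.0.2.1",
where the address is a placeholder for a host on your lan.  Then read
every counter in the output, not just the ones you expected to move.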

