tech-net archive

Re: npf panic - need some clues



> Date: Thu, 12 Oct 2023 00:09:29 +0000 (UTC)
> From: John Klos <john%klos.com@localhost>
> 
> Does anyone have any clue about what's happening here, and what to check / 
> try in the future?
> 
> [ 2828128.148194] panic: Trap: Data Abort (EL1): Translation Fault L0 with 
> write access for 0000000000000000: pc ffffc00000595da4: stp x27, x19, [x0]

This is a null pointer dereference: the faulting address is 0, and the
faulting instruction is a store through x0.

> [ 2828128.331272] fp ffffc00255d37760 stage_mem_gc() at ffffc00000595da4 
> netbsd:stage_mem_gc+0x54

It happened at stage_mem_gc+0x54, which I bet is subr_thmap.c line
933:

    932 	gc = kmem_intr_alloc(sizeof(thmap_gc_t), KM_NOSLEEP);
    933 	gc->addr = addr;
    934 	gc->len = len;

https://nxr.netbsd.org/xref/src/sys/kern/subr_thmap.c?r=1.13#933

This is wrong on its face -- code that uses KM_NOSLEEP must tolerate
allocation failure, because kmem_intr_alloc can then return NULL.

Unfortunately, it can't be changed to KM_SLEEP instead as it is
currently used; either the algorithm must be changed or the caller
must be reorganized.
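
For illustration only, here is a minimal user-space sketch of what
"tolerating allocation failure" would look like in a routine shaped
like the one quoted above.  It is not the NetBSD code: gc_entry_t,
alloc_fn_t, alloc_fail, and stage_mem_gc_checked are hypothetical
names, and the allocator is passed in only so that failure can be
simulated outside the kernel.  Note that returning an error just moves
the problem to the caller, which then has to cope with it.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for thmap_gc_t; not the kernel definition. */
typedef struct gc_entry {
	uintptr_t	 addr;
	size_t		 len;
	struct gc_entry	*next;
} gc_entry_t;

/* Hypothetical allocator hook, so a failing allocator can be plugged in. */
typedef void *(*alloc_fn_t)(size_t);

/* Simulates a non-sleeping allocation that fails under memory pressure. */
void *
alloc_fail(size_t size)
{
	(void)size;
	return NULL;
}

/*
 * Stage a region for later reclamation.  Returns 0 on success, or
 * ENOMEM without touching the list if the allocation fails.
 */
int
stage_mem_gc_checked(gc_entry_t **head, alloc_fn_t alloc,
    uintptr_t addr, size_t len)
{
	gc_entry_t *gc;

	assert(head != NULL && alloc != NULL);

	gc = alloc(sizeof(*gc));
	if (gc == NULL)
		return ENOMEM;	/* the check missing at line 933 */

	gc->addr = addr;
	gc->len = len;
	gc->next = *head;
	*head = gc;
	return 0;
}
```

Called with alloc_fail, the function returns ENOMEM and leaves the list
unchanged instead of storing through a null pointer; called with
malloc, it links a new entry at the head of the list.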

> [ 2828128.339342] fp ffffc00255d377d0 thmap_del() at ffffc000005976b0 
> netbsd:thmap_del+0x530
> [ 2828128.339342] fp ffffc00255d378d0 npf_conndb_remove() at 
> ffffc00000344f34 netbsd:npf_conndb_remove+0x40
> [ 2828128.355048] fp ffffc00255d37900 npf_conn_establish() at 
> ffffc00000342a8c netbsd:npf_conn_establish+0x28c
> [ 2828128.364679] fp ffffc00255d37990 npfk_packet_handler() at 
> ffffc0000033a5c4 netbsd:npfk_packet_handler+0x4d4
> [ 2828128.374970] fp ffffc00255d37aa0 pfil_run_hooks() at ffffc0000066c4e0 
> netbsd:pfil_run_hooks+0x110
> [ 2828128.384681] fp ffffc00255d37b50 ipintr() at ffffc000002cd87c 
> netbsd:ipintr+0x318
> [ 2828128.394683] fp ffffc00255d37d00 softint_dispatch() at 
> ffffc000005589a8 netbsd:softint_dispatch+0xf4

Problems:

- thmap_del can't tolerate allocation failure unless the API is
  changed to report back failure itself, but...
- npf_conndb_remove can't handle failure of thmap_del anyway in this
  error branch, so it really needs to block until enough memory is
  freed that the allocation can succeed, but...
- All this logic runs in soft interrupt context where blocking is
  forbidden.
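
One generic way out of this kind of bind is to allocate the bookkeeping
records ahead of time, in a context that is allowed to sleep, so the
non-blocking path only takes from a reserve.  The sketch below shows
that pattern in user-space C.  It is an assumption-laden illustration,
not what thmap or npf does and not the fix discussed in the reports
below: gc_rec_t, gc_reserve_t, gc_reserve_fill, and gc_reserve_take are
hypothetical names, malloc stands in for a sleeping allocation, and
locking is omitted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical G/C record; not the kernel's thmap_gc_t. */
typedef struct gc_rec {
	uintptr_t	 addr;
	size_t		 len;
	struct gc_rec	*next;
} gc_rec_t;

/* Hypothetical reserve of preallocated records. */
typedef struct gc_reserve {
	gc_rec_t	*free;
	size_t		 nfree;
} gc_reserve_t;

/*
 * Call from a context that may block: top the reserve up to `want'
 * records.  Returns 0 on success, -1 if an allocation fails.
 */
int
gc_reserve_fill(gc_reserve_t *r, size_t want)
{
	assert(r != NULL);

	while (r->nfree < want) {
		gc_rec_t *rec = malloc(sizeof(*rec));

		if (rec == NULL)
			return -1;
		rec->next = r->free;
		r->free = rec;
		r->nfree++;
	}
	return 0;
}

/*
 * Call from a context that must not block: take a record from the
 * reserve without allocating.  Returns NULL only if the reserve is
 * empty.
 */
gc_rec_t *
gc_reserve_take(gc_reserve_t *r, uintptr_t addr, size_t len)
{
	gc_rec_t *rec;

	assert(r != NULL);

	rec = r->free;
	if (rec == NULL)
		return NULL;
	r->free = rec->next;
	r->nfree--;

	rec->addr = addr;
	rec->len = len;
	rec->next = NULL;
	return rec;
}
```

The pattern only helps if the reserve can be sized for the worst case
of the non-blocking path; otherwise gc_reserve_take can still come up
empty and the caller is back to handling failure.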

The issue is reported and analyzed here:

https://github.com/rmind/npf/issues/129
https://gnats.netbsd.org/57208

Unfortunately nobody has gotten a round tuit.

(Nothing Arm-specific about this -- it's an npf/thmap bug.)

