tech-kern archive


Re: ptrace(2) interface for hardware watchpoints (breakpoints)



[see end]

On 13 December 2016 at 12:16, Kamil Rytarowski <n54%gmx.com@localhost> wrote:
On 13.12.2016 04:12, Valery Ushakov wrote:
> On Tue, Dec 13, 2016 at 02:04:36 +0100, Kamil Rytarowski wrote:
>
>> The design is as follows:
>>
>> 1. Accessors through:
>>  - PT_WRITE_WATCHPOINT - write new watchpoint's state (set, unset, ...),
>>  - PT_READ_WATCHPOINT - read watchpoints's state,
>>  - PT_COUNT_WATCHPOINT - receive the number of available watchpoints.
>
> Gdb supports hardware assisted watchpoints.  That implies that other
> OSes have existing designs for them.  Have you studied those existing
> designs?  Why do you think they are not suitable to be copied?
>

They are based on the concept of exporting debug registers to tracee's
context (machine context/userdata/etc). FreeBSD exposes MD-specific
DBREGS to be set/get by a user, similar with Linux and with MacOSX.

GDB supports hardware and software assisted watchpoints. Software ones
are stepping the code and checking each instruction, hardware ones make
use of the registers.

I propose to export an interface that is not limited to one type of
hardware assisted action, while it can be fully used for hardware
watchpoints (if CPU supports it). This interface will abstract
underlying hardware-specific capabilities with MI ptrace(2) calls (but
an MD-specific ptrace_watchpoint structure).

These interfaces are already platform specific and aren't shared between
OSes.


That isn't true (or at least it shouldn't be).

While access to the registers is OS specific, the contents of the registers and their behaviour are not.  Instead, those are specified by the Instruction Set Architecture.
For instance, FreeBSD's gdb/i386fbsd-nat.c uses the generic gdb/x86-nat.c:x86_use_watchpoints() code.
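
To make "specified by the ISA" concrete, here is a minimal sketch of how an x86 DR7 control value is put together for one watch-point slot.  The bit positions follow the x86 debug-register layout (Ln enable bits at 0/2/4/6, then a 2-bit R/W field and a 2-bit LEN field per slot starting at bit 16).  The function and enum names are made up for illustration and are not taken from gdb or any kernel; treating the 8-byte length encoding as available is also an assumption, since not every processor or mode supports it.

```c
#include <assert.h>
#include <stdint.h>

/* x86 DR7 R/W field values; note there is no read-only encoding. */
enum dr7_rw {
    DR7_RW_EXEC = 0,      /* break on instruction execution */
    DR7_RW_WRITE = 1,     /* break on data writes */
    DR7_RW_IO = 2,        /* break on I/O access (needs CR4.DE) */
    DR7_RW_READWRITE = 3  /* break on data reads or writes */
};

/* Map a byte count to the DR7 LEN encoding, or -1 if unsupported.
 * Assumption: 8-byte watches (encoding 2) are available. */
static int dr7_len_bits(unsigned size)
{
    switch (size) {
    case 1: return 0;
    case 2: return 1;
    case 4: return 3;
    case 8: return 2;
    default: return -1;   /* e.g. size == 3 cannot be expressed */
    }
}

/* Hypothetical helper: locally enable slot 0..3 in *dr7.
 * Returns 0 on success, -1 if the slot or size is not representable. */
static int dr7_enable(uint32_t *dr7, unsigned slot, enum dr7_rw rw,
                      unsigned size)
{
    int len = dr7_len_bits(size);
    uint32_t v;

    if (slot > 3 || len < 0)
        return -1;
    v = *dr7;
    v &= ~((uint32_t)0xF << (16 + 4 * slot));  /* clear old R/W and LEN */
    v |= (uint32_t)rw << (16 + 4 * slot);      /* R/Wn field */
    v |= (uint32_t)len << (18 + 4 * slot);     /* LENn field */
    v |= (uint32_t)1 << (2 * slot);            /* Ln: local enable */
    *dr7 = v;
    return 0;
}
```

The same computation is valid on any OS running on that ISA; only the call used to load the value into the tracee differs.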

 
Some time ago I checked and IIRC the only two users of these interfaces
were GDB and LLDB; I inferred from this that there is no danger from
heavy patching 3rd party software.

I'm not sure how to interpret this.  Is the suggestion that, because there are only two consumers, hacking them both will be easy; or something else?  I hope it isn't.  Taking on such maintenance has a horrendous cost.

Anyway, let's look at the problem space.  It might help to understand why kernel developers tend to throw up their hands.

First, let's set the scene:

- if we're lucky we have one hardware watch-point, if we're really lucky there's more than one
- if we're lucky it does something, if we're really lucky it does what the documentation says

which reminds me:

- if we're lucky we've got documentation, if we're really lucky we've got correct and up-to-date errata explaining all the hare-brained interactions these features have with other hardware events

and now let's consider this simple example: try to watch c.a in:

struct { char c; char a[3]; int32_t i; int64_t j; } c;

Under the proposed model (it looks a lot like gdb's remote protocol's Z packet) it's assumed this will allocate one watch-point:

    address=&c.a, size=3

but wait, the hardware watch-point registers have a few, er, standard features:

- I'll be kind, there are two registers
- size must be power-of-two (lucky size==4 isn't fixed)
- address must be size aligned (lucky addr & 3 == 0 isn't fixed)
- there are separate read/write bits (lucky r+w isn't fixed)

so what to do?  With this hardware we can:

- use two watch-point registers (making your count meaningless), so that accesses only apply to the address in question

- use one watch-point register and over-allocate the address/size and then try to figure out what happened
  For writes, a memcmp can help; for reads, you might be lucky and have a further register holding the access address, or unlucky and find yourself disassembling instructions to figure out what the address/size really was
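
The two options above can be sketched roughly as follows.  This is only an illustration under the constraints listed earlier (power-of-two size, size-aligned address); `struct hw_region`, `split_exact` and `cover_one` are hypothetical names, not part of any proposed or existing interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One hardware-style region: power-of-two size, size-aligned address. */
struct hw_region {
    uintptr_t addr;
    size_t size;
};

/* Largest power of two that is <= n (n must be nonzero). */
static size_t pow2_floor(size_t n)
{
    size_t p = 1;
    while (p <= n / 2)
        p *= 2;
    return p;
}

/* Option 1: cover [addr, addr+len) exactly with aligned power-of-two
 * regions of at most max_size bytes.  Returns how many regions are
 * needed and stores the first cap of them in out. */
static size_t split_exact(uintptr_t addr, size_t len, size_t max_size,
                          struct hw_region *out, size_t cap)
{
    size_t n = 0;

    assert(max_size != 0 && (max_size & (max_size - 1)) == 0);
    while (len > 0) {
        size_t size = pow2_floor(len);
        if (size > max_size)
            size = max_size;
        while (addr & (size - 1))   /* shrink until addr is aligned */
            size /= 2;
        if (n < cap) {
            out[n].addr = addr;
            out[n].size = size;
        }
        n++;
        addr += size;
        len -= size;
    }
    return n;
}

/* Option 2: find one aligned power-of-two region that contains the
 * whole range, over-allocating if necessary.  Returns 1 on success,
 * 0 if no region of at most max_size bytes can contain it. */
static int cover_one(uintptr_t addr, size_t len, size_t max_size,
                     struct hw_region *out)
{
    size_t size;

    assert(len > 0);
    for (size = 1; size <= max_size; size *= 2) {
        uintptr_t base = addr & ~(uintptr_t)(size - 1);
        if (addr + len <= base + size) {
            out->addr = base;
            out->size = size;
            return 1;
        }
    }
    return 0;
}
```

With c placed at a 4-byte-aligned address, &c.a sits at offset 1, so split_exact yields two regions (1 byte at offset 1, 2 bytes at offset 2), while cover_one yields a single 4-byte region at offset 0 that also traps accesses to c.c.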

Now, let's consider what happens when the user tries to add:

   &c.j, size=8

depending on where all the balls are (above decision, and the hardware), that may or may not succeed:

  - 32-bit hardware probably limits size<=4, so above would require two registers
  - even if not, &c.a,size=3 may have already used up the two registers

Eww.
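
A rough sketch of that register accounting, assuming a fixed pool of registers and exact (non-over-allocating) coverage.  `struct wp_budget`, `regs_needed` and `wp_reserve` are hypothetical names used only for this illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical pool of hardware watch-point registers. */
struct wp_budget {
    unsigned total;   /* registers the hardware provides */
    unsigned used;    /* registers already claimed */
};

/* Count the aligned power-of-two registers, each covering at most
 * max_size bytes, needed to watch exactly [addr, addr+len). */
static unsigned regs_needed(uintptr_t addr, size_t len, size_t max_size)
{
    unsigned n = 0;

    assert(max_size != 0 && (max_size & (max_size - 1)) == 0);
    while (len > 0) {
        size_t size = max_size;
        /* Shrink until the chunk is aligned at addr and fits in len. */
        while ((addr & (size - 1)) != 0 || size > len)
            size /= 2;
        addr += size;
        len -= size;
        n++;
    }
    return n;
}

/* Try to claim registers for one request.
 * Returns 1 on success, 0 if the pool cannot satisfy it. */
static int wp_reserve(struct wp_budget *b, uintptr_t addr, size_t len,
                      size_t max_size)
{
    unsigned need = regs_needed(addr, len, max_size);

    if (need > b->total - b->used)
        return 0;
    b->used += need;
    return 1;
}
```

With two registers and a 4-byte limit, watching &c.a,size=3 exactly consumes both, so the later &c.j,size=8 request fails.  If c.a is instead over-allocated to one 4-byte register and the hardware allows 8-byte regions, both requests fit.  The caller asked for "two watch-points" in each case, which is why a simple count tells it so little.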

Might a better strategy be to first get the registers exposed and then, if there's still time, start to look at an abstract interface?

Andrew



