Port-xen archive


Re: Starting save/restore for port-xen - initial questions



On Mon, Mar 10, 2008 at 11:38:18PM +0100, Jean-Yves Migeon wrote:
> Hi list,
> 
> As it is my first mail on this list, and, to some extent, the first 
> "big" one, I shall introduce myself: my name is Jean-Yves Migeon, a 24 
> year old French student (currently in Paris), who encountered NetBSD 
> during his studies in his school. I started as a self-taught system 
> administrator for the students network.
> 
> For the sake of curiosity, I started to read books (and a bit of code) 
> dealing with kernel, to gain some understanding about its internals; but 
> consider me a complete kernel noobie.
> 
> The current year in my scholarship has some time reserved for personal 
> work, termed a "project". I asked whether I could use this time to start 
> contributing to NetBSD; it was kindly accepted. I ended up starting to 
> work for the suspend/resume functionality in port-xen, under the 
> supervision of Manuel Bouyer (bouyer@) and Stoned Elipot (seb@), who I 
> both thank for accepting this proposal.

Well, it's great to have someone working on this :)

> 
> Before jumping right into hacking, I have questions regarding port-xen. 
> Mr Bouyer gave me some pointers to understand the internals involved in 
> Xen (the way it works basically, and its API). However, as it is my 
> first time in kernel coding, and as I am a complete kernel rookie, there 
> are many holes to fill before I can start making some diffing :)
> 
> 
> From what I understand so far, the suspend-save/resume functionality 
> from Xen could be (loosely?) compared to the suspend/resume 
> functionality found on laptops (hibernate and the like):
> - a domU is informed (through xenbus) that it should start preparing for 
> suspend

Actually, it's probably going to be called in
shutdown_xenbus.c:xenbus_shutdown_handler() with reqstr == "suspend".
I guess the right thing here would be to call sysmon_pswitch_event()
with a PSWITCH_TYPE_SLEEP button event, and have the various
xen devices register pmf(9) callbacks.
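To make the flow above concrete, here is a minimal user-space sketch, not kernel code. It only mimics the shape of the idea: a handler sees the request string, and if it is "suspend" it walks a table of callbacks that drivers registered earlier. Every name in it (pm_register, pm_suspend_all, handle_shutdown_request, count_suspend) is hypothetical and stands in for the real sysmon/pmf(9) machinery, whose actual signatures are not shown here. The newest-first ordering is an arbitrary choice for the sketch.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_HANDLERS 8

/* Hypothetical callback type: returns true if the device suspended. */
typedef bool (*suspend_cb)(void *cookie);

struct pm_handler {
	const char *name;
	suspend_cb suspend;
	void *cookie;
};

struct pm_handler pm_handlers[MAX_HANDLERS];
size_t pm_nhandlers;

/* Register a suspend callback, as a driver would at attach time. */
bool
pm_register(const char *name, suspend_cb cb, void *cookie)
{
	if (cb == NULL || pm_nhandlers == MAX_HANDLERS)
		return false;
	pm_handlers[pm_nhandlers].name = name;
	pm_handlers[pm_nhandlers].suspend = cb;
	pm_handlers[pm_nhandlers].cookie = cookie;
	pm_nhandlers++;
	return true;
}

/* Suspend every device, newest first; stop at the first refusal. */
bool
pm_suspend_all(void)
{
	for (size_t i = pm_nhandlers; i > 0; i--) {
		struct pm_handler *h = &pm_handlers[i - 1];
		if (!h->suspend(h->cookie))
			return false;
	}
	return true;
}

/* Only "suspend" starts the sleep path; other requests are ignored here. */
bool
handle_shutdown_request(const char *reqstr)
{
	if (strcmp(reqstr, "suspend") != 0)
		return false;
	return pm_suspend_all();
}

/* Sample callback: counts its invocations through an int cookie. */
bool
count_suspend(void *cookie)
{
	(*(int *)cookie)++;
	return true;
}
```

A caller would register count_suspend with a pointer to an int, then pass "suspend" to handle_shutdown_request() and see the counter go up by one; any other string leaves it untouched.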


> - it iterates through all its devices to put them in a suspend state (== 
> putting the frontend drivers in suspend mode, thus flushing the virtual 
> interrupts handlers),

pmf(9) can iterate for us. The console may need special handling, though
(the same way the console device needs special handling on a
real box so that it can be used early for printf() on resume, but
I haven't thought about it in detail yet). More specifically, on suspend
a driver needs to:
- stop processing new requests (e.g. stop looking at ifp->if_snd for xennet),
  and wait for the dom0 to complete the requests that have already been sent
  (pmf should call us from a kernel thread, so we can safely sleep()
   here).
- once there are no pending requests (i.e. the rings are empty), report
  to pmf that the driver is suspended. If I read the Linux sources
  properly, there's nothing more to do with the backend at this point.
  However, it may make things easier to free some resources here
  (e.g. the interrupt handler).
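The two steps above can be sketched as a small user-space simulation. This is only an illustration under stated assumptions: the ring is reduced to two counters, the "backend" is a function that completes one request per call (standing in for sleeping until dom0 responds), and all names (fake_frontend, frontend_suspend, backend_complete_one, ...) are hypothetical, not taken from the NetBSD or Xen sources.

```c
#include <stdbool.h>

/* Simplified ring: requests produced vs. responses consumed. */
struct fake_ring {
	unsigned req_prod;
	unsigned rsp_cons;
};

struct fake_frontend {
	struct fake_ring ring;
	bool accepting;		/* still taking new requests? */
	bool intr_established;	/* interrupt handler still installed? */
};

/* Queue one request, unless the driver has stopped accepting them. */
bool
frontend_submit(struct fake_frontend *fe)
{
	if (!fe->accepting)
		return false;
	fe->ring.req_prod++;
	return true;
}

/* Number of requests sent but not yet completed. */
unsigned
ring_pending(const struct fake_frontend *fe)
{
	return fe->ring.req_prod - fe->ring.rsp_cons;
}

/* Stand-in for the backend in dom0 completing one request. */
void
backend_complete_one(struct fake_frontend *fe)
{
	if (ring_pending(fe) != 0)
		fe->ring.rsp_cons++;
}

/*
 * Suspend: stop accepting, drain the ring, then release the handler.
 * Returns false if the ring did not drain within max_polls rounds.
 */
bool
frontend_suspend(struct fake_frontend *fe, unsigned max_polls)
{
	fe->accepting = false;
	while (ring_pending(fe) != 0) {
		if (max_polls == 0)
			return false;
		max_polls--;
		backend_complete_one(fe); /* real code would sleep here */
	}
	fe->intr_established = false;
	return true;
}
```

The bounded poll count is a choice made for the sketch so that it always terminates; how long a real driver should wait for dom0 is left open here.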



> - manipulate the event channels accordingly (I guess that putting the 
> virtual drivers into suspend does also affect backend drivers from dom0 
> - console comes to mind),

From what I read, no. Once the domain has been suspended, dom0 will
free the resources and detach the backends, just as if the domU had been
halted or destroyed.

> - save some extra structures, like grant tables, trap handlers, ..., 
> from domU, to restore them properly later. And call HYPERVISOR_suspend().

I think trap handlers are saved/restored by the hypervisor itself as
part of the domU context, but I may be wrong here. Grant tables will
need to be unmapped on suspend.

> 
> Rolling these steps backwards would describe the restore process, where 
> the kernel starts again from its last state, while re-establishing the 
> communication with hypervisor.

Yes. Here again pmf(9) should help with this. On resume, drivers will need
to get new details from the xenstore, as things like rings and event channels
will have changed. We may even need to unmap/remap the
hypervisor shared page, console page and xenstore page. We also need to
forget machine addresses that may be stored in other places (but there
shouldn't be many).
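The point about forgetting machine addresses can be illustrated with a toy model. It assumes a small pseudo-physical to machine table (fake_p2m) that changes across a save/restore, and a structure that caches a translated machine frame. The types and functions here (fake_pfn_t, fake_mfn_t, page_ref, page_ref_forget) are hypothetical and only show why a cached value must be dropped and looked up again after resume.

```c
#include <stdbool.h>

#define NPAGES 4

typedef unsigned long fake_pfn_t;	/* pseudo-physical frame number */
typedef unsigned long fake_mfn_t;	/* machine frame number */

/* Toy p2m table; after a restore its contents may differ. */
fake_mfn_t fake_p2m[NPAGES];

/* A reference to a page that caches the translated machine frame. */
struct page_ref {
	fake_pfn_t pfn;
	fake_mfn_t cached_mfn;
	bool cached;
};

/* Return the machine frame, translating only on the first use. */
fake_mfn_t
page_ref_mfn(struct page_ref *ref)
{
	if (!ref->cached) {
		ref->cached_mfn = fake_p2m[ref->pfn];
		ref->cached = true;
	}
	return ref->cached_mfn;
}

/* Called on resume: drop the cached value so it is looked up again. */
void
page_ref_forget(struct page_ref *ref)
{
	ref->cached = false;
}
```

Without the call to page_ref_forget(), page_ref_mfn() keeps returning the frame that was valid before the suspend, which is exactly the stale state a resume path has to avoid.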

> 
> Hence, I have some questions. Having extra pointers to areas in 
> /usr/src/sys/ (I am mostly relying on ctags right now...) would be of 
> great help. Note that I am not making any difference between "suspend" 
> and "save", and "resume" and "restore". Please correct me if I am wrong.

This is correct. Suspend/resume is the same as save/restore for Xen.
The difference from real hardware is that we may resume on different
hardware than the one we suspended on (migration).

> - Firstly, what about the structures shared between the domU and 
> hypervisor, which are "context" specific? machine to physical (and their 
> reverse counterpart, physical to machine) mappings come to mind, as 
> there is no warranty that during a restore, physical addresses will be 
> the exact same as before suspend. Which parts of the kernel should it 
> affect (besides VM management code) for domU, and most important, where, 
> in arch/xen? arch/i386? sys/uvm?

I think the hypervisor will take care of updating the p2m and m2p
tables on resume. The other shared structures (xencons_interface,
xenstore_interface, HYPERVISOR_shared_info) need to be unmapped
on suspend and remapped on resume (it may actually be easier to
unmap/remap on resume only). At boot these are mapped in
x86/x86_xpmap.c, but for suspend/resume a HYPERVISOR_update_va_mapping()
should be enough.


> 
> - Same question goes for externally dependent mechanisms, like, TCP 
> connections, which will inevitably timeout if we suspend the domain for 
> a long time,

Sure, but it'll be handled by the TCP stack. It's no different from
suspending a laptop.

> and clock syncing (since domains keep track of time 
> independently from others, if I understood the Xen documentation 
> correctly - the TSC being bound to one VCPU, and thus, to one particular 
> domain)

xen/xen/clock.c will have to update the time of day, and probably reset
its internal state. For TOD update, I suspect the TODR framework will
handle it automatically. For the values used by xen_timer_handler(),
I think they can be updated at the same time we reattach to the
event channel.

> 
> - many files in port-xen already contain code dealing with save and 
> restore operations: xenbus, backend (xbd), grant tables (xengnt), ... 
> Can I use them as reference to understand the key differences between a 
> full domain start and a restore? *_attach() usually calls *_resume() 
> once it has finished its operations (see arch/xen/xen/xbd_xenbus.c:243 
> for example); I guess that this code was mainly tested in a 
> "traditional" boot up phase, and not with a restore operation. If no, 
> feel free to correct me. If yes, did the code using *_resume() land 
> somewhere?

When I wrote this code, I started thinking about suspend/resume and
tried to split the code required to boot appropriately. But for
now it has only been tested for a full domain boot, and I'm not sure
the split between attach and resume is 100% correct. This is one of the
things to look at :)
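As an illustration of what such a split is meant to achieve, here is a user-space sketch with made-up names (fake_xbd, fake_xbd_attach, fake_xbd_resume, fake_xbd_suspend); it is not the code in xbd_xenbus.c. The assumption is that attach does the one-time setup and then calls resume, while resume alone redoes the per-connection setup (ring reference, event channel), so a restore only needs the second half.

```c
#include <stdbool.h>

struct fake_xbd {
	int attach_calls;	/* one-time setup performed */
	int connect_calls;	/* per-connection setup performed */
	unsigned ring_ref;	/* stand-in for a grant reference */
	unsigned evtchn;	/* stand-in for an event channel */
	bool connected;
};

/* Hands out fresh identifiers, as a new backend connection would. */
unsigned fake_next_id = 1;

/* Per-connection setup: needed at boot and again after a restore. */
void
fake_xbd_resume(struct fake_xbd *sc)
{
	sc->ring_ref = fake_next_id++;
	sc->evtchn = fake_next_id++;
	sc->connected = true;
	sc->connect_calls++;
}

/* One-time setup, followed by the shared connection path. */
void
fake_xbd_attach(struct fake_xbd *sc)
{
	sc->attach_calls++;
	fake_xbd_resume(sc);
}

/* Suspend: the old ring and event channel are no longer valid. */
void
fake_xbd_suspend(struct fake_xbd *sc)
{
	sc->connected = false;
	sc->ring_ref = 0;
	sc->evtchn = 0;
}
```

After attach, suspend and resume in sequence, attach_calls stays at 1 while connect_calls reaches 2 and the event channel differs from the original one. Anything that ends up in the wrong half of the split would show up as state that is either lost or duplicated across a restore.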


> 
> - arch/xen/xen/ctrl_if.c: seems to contain some code for controller 
> interface suspend and resume (ctrl_if_suspend() and ctrl_if_resume() ). 
> ctrl_if_suspend() is "#ifdef notyet", how should I interpret this part 
> of the code (see previous question)?

ctrl_if.c is only for Xen2. I think you can forget it, and only look at
suspend/resume for Xen3.

> 
> - arch/i386 has some code regarding initial start up for a Xen domain 
> (arch/i386/i386/machdep.c or vector.S for example). Is there some work 
> already done regarding suspend (like dumping memory and/or manipulating 
> the structures shared between hypervisor and domain), or does it start 
> anew, besides what we can find in arch/xen?

No work done for suspend/resume here at all. But I don't think these
parts should be changed: on a resume, I think the domU comes back with
its MMU and trap tables fully set up.

-- 
Manuel Bouyer <bouyer%antioche.eu.org@localhost>
     NetBSD: 26 years of experience will always make the difference