Port-xen archive


Starting save/restore for port-xen - initial questions

Hi list,

As this is my first mail on this list, and, to some extent, the first "big" one, I shall introduce myself: my name is Jean-Yves Migeon, a 24-year-old French student (currently in Paris), who encountered NetBSD during my studies. I started as a self-taught system administrator for the students' network.

Out of curiosity, I started to read books (and a bit of code) dealing with the kernel, to gain some understanding of its internals; but consider me a complete kernel newbie.

The current year of my studies has some time reserved for personal work, termed a "project". I asked whether I could use this time to start contributing to NetBSD; the request was kindly accepted. I ended up working on the suspend/resume functionality in port-xen, under the supervision of Manuel Bouyer (bouyer@) and Stoned Elipot (seb@), both of whom I thank for accepting this proposal.

Before jumping right into hacking, I have questions regarding port-xen. Mr Bouyer gave me some pointers to understand the internals involved in Xen (basically the way it works, and its API). However, as this is my first time doing kernel coding, there are many holes to fill before I can start producing diffs :)

From what I understand so far, the suspend-save/resume functionality from Xen could be (loosely?) compared to the suspend/resume functionality found on laptops (hibernate and the like):
- a domU is informed (through xenbus) that it should start preparing for suspend,
- it iterates through all its devices to put them in a suspend state (== putting the frontend drivers in suspend mode, thus flushing the virtual interrupt handlers),
- it manipulates the event channels accordingly (I guess that putting the virtual drivers into suspend also affects the backend drivers in dom0 - console comes to mind),
- it saves some extra structures from the domU, like grant tables, trap handlers, ..., to restore them properly later,
- and it calls HYPERVISOR_suspend().

Rolling these steps backwards would describe the restore process, where the kernel starts again from its last state, while re-establishing communication with the hypervisor.
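
To make the ordering above concrete, here is a minimal userland sketch in C of "suspend walks the steps forward, resume walks them backwards". It is only a model of my current understanding: every name in it (toy_suspend, toy_resume, struct toy_trace, the step labels) is hypothetical, nothing here is a NetBSD or Xen API, and the real HYPERVISOR_suspend() call is represented by a plain trace entry.

```c
#include <stddef.h>

/* Hypothetical model only: none of these names exist in NetBSD or Xen. */
#define TOY_MAX_TRACE 16

/* Preparation steps, in the order a suspend would perform them. */
static const char *const toy_steps[] = {
    "frontends",     /* put the frontend drivers in suspend mode */
    "evtchn",        /* adjust the event channels */
    "shared-state",  /* save grant tables, trap handlers, ... */
};
#define TOY_NSTEPS (sizeof(toy_steps) / sizeof(toy_steps[0]))

/* Records which step ran, and in which order. */
struct toy_trace {
    const char *entries[TOY_MAX_TRACE];
    size_t count;
};

void toy_trace_add(struct toy_trace *t, const char *name)
{
    if (t->count < TOY_MAX_TRACE)
        t->entries[t->count++] = name;
}

/* Suspend: run every step forward, then hand control to the hypervisor. */
void toy_suspend(struct toy_trace *t)
{
    for (size_t i = 0; i < TOY_NSTEPS; i++)
        toy_trace_add(t, toy_steps[i]);
    toy_trace_add(t, "hypercall");  /* stands in for HYPERVISOR_suspend() */
}

/* Resume: undo the same steps in reverse order. */
void toy_resume(struct toy_trace *t)
{
    for (size_t i = TOY_NSTEPS; i-- > 0;)
        toy_trace_add(t, toy_steps[i]);
}
```

Calling toy_suspend() and then toy_resume() on a zeroed struct toy_trace records the three steps, the hypercall marker, and then the three steps again in reverse. Whether the real resume path is really a strict mirror of the suspend path is exactly the kind of thing I would like to have confirmed.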

Hence, I have some questions. Having extra pointers to areas in /usr/src/sys/ (I am mostly relying on ctags right now...) would be of great help. Note that I am not making any distinction between "suspend" and "save", or between "resume" and "restore". Please correct me if I am wrong.

- Firstly, what about the structures shared between the domU and the hypervisor, which are "context" specific? Machine-to-physical mappings (and their reverse counterpart, physical-to-machine) come to mind, as there is no guarantee that, after a restore, the machine addresses backing the domain will be the same as before the suspend. Which parts of the kernel should this affect for a domU (besides the VM management code), and most importantly, where: in arch/xen? arch/i386? sys/uvm?
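
To illustrate the concern about the mappings, here is a toy model in plain userland C, assuming a domain of 4 pseudo-physical pages backed by a machine of 16 frames. All names (struct toy_maps, toy_remap, the TOY_* constants) are hypothetical and unrelated to the actual Xen or NetBSD data structures; the only point is that both directions of the mapping have to be rebuilt consistently once the backing machine frames change.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical toy sizes and names; not the real Xen/NetBSD structures. */
#define TOY_NPAGES   4            /* pseudo-physical pages in the domain */
#define TOY_NMACHINE 16           /* machine frames on the host */
#define TOY_INVALID  UINT32_MAX   /* "no mapping" marker */

struct toy_maps {
    uint32_t p2m[TOY_NPAGES];     /* physical frame -> machine frame */
    uint32_t m2p[TOY_NMACHINE];   /* machine frame -> physical frame */
};

/*
 * Rebuild both tables from the list of machine frames now backing the
 * domain (mfns[pfn] is the machine frame for physical frame pfn).
 * Returns 0 on success, or -1 if a frame is out of range or appears
 * twice; in that case the tables are left partially updated.
 */
int toy_remap(struct toy_maps *m, const uint32_t mfns[TOY_NPAGES])
{
    /* Forget every old reverse mapping first. */
    for (size_t i = 0; i < TOY_NMACHINE; i++)
        m->m2p[i] = TOY_INVALID;

    for (uint32_t pfn = 0; pfn < TOY_NPAGES; pfn++) {
        uint32_t mfn = mfns[pfn];

        if (mfn >= TOY_NMACHINE || m->m2p[mfn] != TOY_INVALID)
            return -1;
        m->p2m[pfn] = mfn;
        m->m2p[mfn] = pfn;
    }
    return 0;
}
```

For example, a domain backed by machine frames {3, 7, 1, 9} before the save might come back on {10, 2, 5, 0}: after toy_remap(), physical frame 0 maps to machine frame 10, and the old reverse entry for machine frame 3 is invalid. Anything in the kernel that cached a machine address across the suspend would be stale in the same way.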

- The same question goes for externally dependent mechanisms, like TCP connections, which will inevitably time out if we suspend the domain for a long time, and clock syncing (since domains keep track of time independently from one another, if I understood the Xen documentation correctly - the TSC being bound to one VCPU, and thus to one particular domain).

- Many files in port-xen already contain code dealing with save and restore operations: xenbus, backend (xbd), grant tables (xengnt), ... Can I use them as a reference to understand the key differences between a full domain start and a restore? *_attach() usually calls *_resume() once it has finished its operations (see arch/xen/xen/xbd_xenbus.c:243 for example); I guess that this code was mainly tested in a "traditional" boot-up phase, and not with a restore operation. If not, feel free to correct me. If so, did the code using *_resume() land somewhere?

- arch/xen/xen/ctrl_if.c seems to contain some code for controller interface suspend and resume (ctrl_if_suspend() and ctrl_if_resume()). ctrl_if_suspend() is wrapped in "#ifdef notyet"; how should I interpret this part of the code (see previous question)?

- arch/i386 has some code for the initial startup of a Xen domain (arch/i386/i386/machdep.c or vector.S for example). Is there some work already done regarding suspend (like dumping memory and/or manipulating the structures shared between the hypervisor and the domain), or does it have to start from scratch, apart from what we can find in arch/xen?

These questions are fairly general; they are meant to help me understand the concepts behind port-xen and what needs to be done (and to know whether I am heading in the right direction or not).

More will of course follow as I delve into the code with the help of your answers. Apologies for such a lengthy mail. The next ones will be concise, I promise :)

Thanking you in advance for your time and help,

Kind regards


Jean-Yves Migeon
