NetBSD-Bugs archive


Re: kern/51148: i386 install floppies no longer boot



The following reply was made to PR kern/51148; it has been noted by GNATS.

From: David Holland <dholland-bugs%netbsd.org@localhost>
To: gnats-bugs%netbsd.org@localhost
Cc: Maxime Villard <max%m00nbsd.net@localhost>
Subject: Re: kern/51148: i386 install floppies no longer boot
Date: Tue, 31 May 2016 04:32:32 +0000

 This whole subthread didn't get sent to gnats.
 
    ------
 
 From: Maxime Villard <max%m00nbsd.net@localhost>
 To: Andreas Gustafsson <gson%gson.org@localhost>, netbsd-bugs%netbsd.org@localhost
 Cc: kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost
 Subject: Re: kern/51148: i386 install floppies no longer boot
 Date: Wed, 25 May 2016 13:06:09 +0200
 
 First of all, I'm not subscribed to netbsd-bugs@, so please forward your mails
 to me.
 
 I have carefully investigated the mappings on amd64 and i386 with a kernel page
 explorer I wrote, and there is no issue. The levels are all linear, with no holes
 in the middle, they are correctly linked, and they cover the whole kernel image,
 preloaded modules and bootstrap tables.
 
 In fact, there appears to be one bug in the L1 slot that should normally point
 to the first page of the data segment: it seems to be destroyed. But this issue
 was already here before my changes, so I didn't introduce it.
 
 The changes from me you mentioned are all trivial, and it seems highly unlikely
 to me that they cause the install failure. Normally, if there were a bug, it
 should have been in the previous commits. Also, my changes are in no way
 install-related, and as far as I know, the mappings are the same on
 CD/USB/floppy/whatever.
 
 My guess, right now, is that my alignment changes in kern.ldscript somehow
 trigger the aforementioned L1 slot bug on floppy installs.
 
 I don't have a floppy device, and right now my NetBSD resources are limited. The
 only thing I can do is ask.
 
 	Is the problem still present? (I don't see new entries in the log)
 	We are talking about GENERIC, and not GENERIC-PAE, right?
 	Does reverting only [1] fix the problem?
 	What if you put 'fillkpt' instead of 'fillkpt_nox' in [1]?
 
 Thanks.
 
 
  [1] 2016.05.15.07.17.53 maxv src/sys/arch/i386/i386/locore.S 1.124
 
 
 From: Andreas Gustafsson <gson%gson.org@localhost>
 To: Maxime Villard <max%m00nbsd.net@localhost>
 Cc: netbsd-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost
 Subject: Re: kern/51148: i386 install floppies no longer boot
 Date: Wed, 25 May 2016 15:17:10 +0300
 
 Maxime,
 
 You wrote:
 > First of all, I'm not subscribed to netbsd-bugs@, so please forward your mails
 > to me.
 
 Will do.  I would have mailed you about the initial report if you had
 been the only developer to commit during the period of build breakage
 when the problem appeared, but there were commits by four developers,
 and no easy way for me to determine which of them was at fault.
 
 > I have carefully investigated the mappings on amd64 and i386 with a kernel page
 > explorer I wrote, and there is no issue. The levels are all linear, with no holes
 > in the middle, they are correctly linked, and they cover the whole kernel image,
 > preloaded modules and bootstrap tables.
 > 
 > In fact, there appears to be one bug in the L1 slot that should normally point
 > to the first page of the data segment: it seems to be destroyed. But this issue
 > was already here before my changes, so I didn't introduce it.
 > 
 > The changes from me you mentioned are all trivial, and it seems highly unlikely
 > to me that they cause the install failure. Normally, if there were a bug, it
 > should have been in the previous commits. Also, my changes are in no way
 > install-related, and as far as I know, the mappings are the same on
 > CD/USB/floppy/whatever.
 > 
 > My guess, right now, is that my alignment changes in kern.ldscript somehow
 > trigger the aforementioned L1 slot bug on floppy installs.
 > 
 > I don't have a floppy device, and right now my NetBSD resources are limited.
 
 If you can run misc/py-anita from pkgsrc against an i386 release
 build, that should reproduce the problem without the need for a
 physical floppy device or even a NetBSD host.
 
 > The
 > only thing I can do is ask.
 > 
 > 	Is the problem still present? (I don't see new entries in the log)
 
 Yes, the problem is still present.  I'm not sure what you mean about
 not seeing new entries; the newest test runs are from today, and still
 failing with the same error:
 
   http://releng.netbsd.org/b5reports/i386/commits-2016.05.html#2016.05.25.10.15.01
 
 > 	We are talking about GENERIC, and not GENERIC-PAE, right?
 
 Yes.
 
 > 	Does reverting only [1] fix the problem?
 
 I will try that and report back.
 
 > 	What if you put 'fillkpt' instead of 'fillkpt_nox' in [1]?
 
 I will try that, too.
 
 > Thanks.
 > 
 >   [1] 2016.05.15.07.17.53 maxv src/sys/arch/i386/i386/locore.S 1.124
 -- 
 Andreas Gustafsson, gson%gson.org@localhost
 
 From: Maxime Villard <max%m00nbsd.net@localhost>
 To: Andreas Gustafsson <gson%gson.org@localhost>
 Cc: netbsd-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost
 Subject: Re: kern/51148: i386 install floppies no longer boot
 Date: Wed, 25 May 2016 16:30:58 +0200
 
 On 25/05/2016 at 14:17, Andreas Gustafsson wrote:
 > [...]
 >> 
 >> I don't have a floppy device, and right now my NetBSD resources are limited.
 > 
 > If you can run misc/py-anita from pkgsrc against an i386 release
 > build, that should reproduce the problem without the need for a
 > physical floppy device or even a NetBSD host.
 > 
 
 I would be happy to do the tests myself. But the only i386 machine I have right
 now is a VirtualBox VM, and there is PR 51134 that reboots the machine every ~5
 minutes. I can do almost nothing on it.
 
 
 From: Christos Zoulas <christos%zoulas.com@localhost>
 To: Maxime Villard <max%m00nbsd.net@localhost>, Andreas Gustafsson <gson%gson.org@localhost>
 Cc: netbsd-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost
 Subject: Re: kern/51148: i386 install floppies no longer boot
 Date: Wed, 25 May 2016 14:17:20 -0400
 
 On May 25,  4:30pm, max%m00nbsd.net@localhost (Maxime Villard) wrote:
 -- Subject: Re: kern/51148: i386 install floppies no longer boot
 
 | On 25/05/2016 at 14:17, Andreas Gustafsson wrote:
 | > [...]
 | >>
 | >> I don't have a floppy device, and right now my NetBSD resources are limited.
 | >
 | > If you can run misc/py-anita from pkgsrc against an i386 release
 | > build, that should reproduce the problem without the need for a
 | > physical floppy device or even a NetBSD host.
 | >
 | 
 | I would be happy to do the tests myself. But the only i386 machine I have right
 | now is a VirtualBox VM, and there is PR 51134 that reboots the machine every ~5
 | minutes. I can do almost nothing on it.
 
 I am fixing that.
 
 christos
 
 
 From: Andreas Gustafsson <gson%gson.org@localhost>
 To: Maxime Villard <max%m00nbsd.net@localhost>
 Cc: netbsd-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost
 Subject: Re: kern/51148: i386 install floppies no longer boot
 Date: Wed, 25 May 2016 20:06:11 +0300
 
 Maxime,
 
 I have now run the tests you asked for.
 
 > 	Does reverting only [1] fix the problem?
 
 Yes.  The system still doesn't install because the kernel is unable
 to exec /sbin/init, but this is a different bug; when I don't revert
 [1], the kernel does not even start (there are no kernel messages
 on the console).
 
 > 	What if you put 'fillkpt' instead of 'fillkpt_nox' in [1]?
 
 I tested with this patch against 2016.05.22.09.10.37 sources:
 
 diff -u -r1.124 locore.S
 --- locore.S    15 May 2016 07:17:53 -0000      1.124
 +++ locore.S    25 May 2016 14:33:35 -0000
 @@ -731,7 +731,7 @@
         movl    RELOC(tablesize),%ecx   /* length of BOOTSTRAP TABLES */
         shrl    $PGSHIFT,%ecx
         orl     $(PG_V|PG_KW),%eax
 -       fillkpt_nox
 +       fillkpt
  
         /* We are on (4). Map ISA I/O mem (later atdevbase) RWX. */
         movl    $(IOM_BEGIN|PG_V|PG_KW/*|PG_N*/),%eax
 
 and it did _not_ fix the problem.
 
 Later, you wrote:
 
 > I would be happy to do the tests myself. But the only i386 machine I have right
 > now is a VirtualBox VM, and there is PR 51134 that reboots the machine every ~5
 > minutes. I can do almost nothing on it.
 
 What do you host VirtualBox on?  You can test the i386 port using anita+qemu
 even on a non-i386 host.
 -- 
 Andreas Gustafsson, gson%gson.org@localhost
 
 From: Maxime Villard <max%m00nbsd.net@localhost>
 To: Andreas Gustafsson <gson%gson.org@localhost>
 Cc: netbsd-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost
 Subject: Re: kern/51148: i386 install floppies no longer boot
 Date: Thu, 26 May 2016 09:33:52 +0200
 
 I've committed a patch. Please let me know whether it fixes the issue.
 
 
 From: Maxime Villard <max%m00nbsd.net@localhost>
 To: Andreas Gustafsson <gson%gson.org@localhost>
 Cc: netbsd-bugs%netbsd.org@localhost, kern-bug-people%netbsd.org@localhost, gnats-admin%netbsd.org@localhost
 Subject: Re: kern/51148: i386 install floppies no longer boot
 Date: Thu, 26 May 2016 09:00:44 +0200
 
 On 25/05/2016 at 19:06, Andreas Gustafsson wrote:
 > Maxime,
 > 
 > I have now run the tests you asked for.
 > 
 >> 	Does reverting only [1] fix the problem?
 > 
 > Yes.  The system still doesn't install because the kernel is unable
 > to exec /sbin/init, but this is a different bug; when I don't revert
 > [1], the kernel does not even start (there are no kernel messages
 > on the console).
 > 
 >> 	What if you put 'fillkpt' instead of 'fillkpt_nox' in [1]?
 > 
 > I tested with this patch against 2016.05.22.09.10.37 sources:
 > 
 > diff -u -r1.124 locore.S
 > --- locore.S    15 May 2016 07:17:53 -0000      1.124
 > +++ locore.S    25 May 2016 14:33:35 -0000
 > @@ -731,7 +731,7 @@
 >         movl    RELOC(tablesize),%ecx   /* length of BOOTSTRAP TABLES */
 >         shrl    $PGSHIFT,%ecx
 >         orl     $(PG_V|PG_KW),%eax
 > -       fillkpt_nox
 > +       fillkpt
 > 
 >         /* We are on (4). Map ISA I/O mem (later atdevbase) RWX. */
 >         movl    $(IOM_BEGIN|PG_V|PG_KW/*|PG_N*/),%eax
 > 
 > and it did _not_ fix the problem.
 
 Thanks for the tests. I see where the problem is, and I'll commit a patch
 soon.
 
 > 
 > Later, you wrote:
 > 
 >> I would be happy to do the tests myself. But the only i386 machine I have right
 >> now is a VirtualBox VM, and there is PR 51134 that reboots the machine every ~5
 >> minutes. I can do almost nothing on it.
 > 
 > What do you host VirtualBox on?  You can test the i386 port using anita+qemu
 > even on a non-i386 host.
 > 
 
 I'll answer in the other PR.
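
 For readers following the fillkpt / fillkpt_nox test above, here is a rough
 Python sketch of what a "fill kernel page table" loop of that kind does:
 write a run of consecutive page-table entries, each mapping the next
 physical page. It is not the locore.S code. The function name fill_kpt,
 the list used as a page table, and the flag values are assumptions made
 for illustration. The one architectural fact relied on is that the x86
 no-execute bit is bit 63 of a 64-bit PTE, so it exists only with PAE or
 long-mode page tables; the "nox" variant is assumed, from its name, to
 set that bit.

```python
PAGE_SIZE = 4096        # assumed 4 KiB pages (1 << PGSHIFT with PGSHIFT = 12)
PG_V = 0x001            # illustrative "valid" flag
PG_KW = 0x002           # illustrative "kernel writable" flag
PG_NX = 1 << 63         # no-execute bit of a 64-bit PTE (PAE / long mode)


def fill_kpt(table, start_index, first_pte, npages, nox=False):
    """Fill npages consecutive entries of a (hypothetical) page table.

    first_pte is the entry for the first page: physical address | flags.
    Each following entry maps the next physical page with the same flags.
    Returns the index of the first slot after the filled run.
    """
    pte = first_pte
    for i in range(npages):
        # Optionally mark the page non-executable, as a "nox" variant would.
        table[start_index + i] = pte | (PG_NX if nox else 0)
        pte += PAGE_SIZE
    return start_index + npages


# Map three pages starting at physical address 0x100000, read/write.
table = [0] * 8
next_slot = fill_kpt(table, 2, 0x100000 | PG_V | PG_KW, 3)
```

 A run filled this way is linear with no holes in the sense Maxime describes:
 every slot in the run is valid and each maps the page after the previous one.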
 
 

