Subject: pkg/4960: new patch to python
To: None <gnats-bugs@gnats.netbsd.org>
From: Jaromir Dolecek <dolecek@ics.muni.cz>
List: netbsd-bugs
Date: 02/09/1998 13:10:44
>Number:         4960
>Category:       pkg
>Synopsis:       new patch to python
>Confidential:   no
>Severity:       non-critical
>Priority:       low
>Responsible:    gnats-admin (GNATS administrator)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Mon Feb  9 04:05:02 1998
>Last-Modified:
>Originator:     Jaromir Dolecek
>Organization:
	ICS MU, Brno, Czech Republic
>Release:        1.3
>Environment:
	
System: NetBSD saruman.ics.muni.cz 1.3 NetBSD 1.3 (SARUMAN) #3: Wed Jan 21 13:29:49 MET 1998 dolecek@saruman.ics.muni.cz:/usr/home/dolecek/N13/usr/src/sys/arch/i386/compile/SARUMAN i386


>Description:
	A new bug was found in Python 1.5, related to the handling of
	signed and unsigned chars: where char is signed, the tokenizer
	confuses the byte 0xff with EOF.
>How-To-Repeat:
	Try to use the character 0xff in Python source (a legitimate
	character in some national encodings, such as koi8 and cp1251).
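	The failure can be reproduced outside Python with a few lines of C.
	This is a minimal sketch, not Python's code: naive_nextc is a
	hypothetical helper that imitates the unpatched fast path in
	tok_nextc, which returned a possibly signed char as an int. It
	assumes a two's-complement platform with signed char and EOF
	defined as -1 (true of NetBSD/i386). There, the byte 0xff is
	sign-extended to -1 and cannot be told apart from end of input.

```c
#include <stdio.h>  /* EOF */

/* Hypothetical helper imitating the unpatched fast path
 * "return *tok->cur++;".  The signed char is promoted to int with
 * sign extension, so the byte 0xff comes back as -1, which equals
 * EOF on platforms where EOF is -1. */
static int naive_nextc(const signed char **cur)
{
    return *(*cur)++;
}
```

	Reading a buffer that holds the single byte 0xff through
	naive_nextc returns -1, so a caller that compares the result with
	EOF stops early.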

>Fix:
	As Guido posted:

> Alexander Voinov <avv@isida.ipa.rssi.ru> writes:
> > The two most popular Russian encodings (koi8 and cp1251) use the symbol with code 0xFF.
> > This appears intolerable both for Python 1.4 and Python 1.5 (EOF).

Martin v. Loewis replies:
> As you might know, a work-around is to use \377 in the string instead.
> I agree that this is not really convenient, and should be fixed.

Agreed.  Here's a patch for Parser/tokenizer.c.  Let me know whether
this works:

--- Parser/tokenizer.c.orig	Tue Apr 29 23:03:03 1997
+++ Parser/tokenizer.c	Mon Feb  9 13:05:34 1998
@@ -46,6 +46,14 @@
 /* Don't ever change this -- it would break the portability of Python code */
 #define TABSIZE 8
 
+/* Convert a possibly signed character to a nonnegative int */
+/* XXX This assumes characters are 8 bits wide */
+#ifdef __CHAR_UNSIGNED__
+#define Py_CHARMASK(c)		(c)
+#else
+#define Py_CHARMASK(c)		((c) & 0xff)
+#endif
+
 /* Forward */
 static struct tok_state *tok_new Py_PROTO((void));
 static int tok_nextc Py_PROTO((struct tok_state *tok));
@@ -178,7 +186,7 @@
 {
 	for (;;) {
 		if (tok->cur != tok->inp) {
-			return *tok->cur++; /* Fast path */
+			return Py_CHARMASK(*tok->cur++); /* Fast path */
 		}
 		if (tok->done != E_OK)
 			return EOF;
@@ -197,7 +205,7 @@
 				tok->buf = tok->cur;
 			tok->lineno++;
 			tok->inp = end;
-			return *tok->cur++;
+			return Py_CHARMASK(*tok->cur++);
 		}
 		if (tok->prompt != NULL) {
 			char *new = PyOS_Readline(tok->prompt);
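	The effect of the patched fast path can be sketched the same way.
	The Py_CHARMASK definition below is copied from the patch, so it
	keeps the patch's assumption of 8-bit chars. masked_nextc is a
	hypothetical stand-in for the patched "return
	Py_CHARMASK(*tok->cur++);" and is not Python's code. Every byte
	now comes back as a value in 0..255, so 0xff reads as 255 and
	cannot collide with a negative EOF.

```c
#include <stdio.h>  /* EOF */

/* Copied from the patch: convert a possibly signed character to a
 * nonnegative int (assumes 8-bit chars). */
#ifdef __CHAR_UNSIGNED__
#define Py_CHARMASK(c)		(c)
#else
#define Py_CHARMASK(c)		((c) & 0xff)
#endif

/* Hypothetical stand-in for the patched fast path: read one byte and
 * advance the cursor.  The argument is evaluated once, so the
 * increment inside the macro call is safe. */
static int masked_nextc(const char **cur)
{
    return Py_CHARMASK(*(*cur)++);
}
```

	With char signed or unsigned, reading "\xff" "a" yields 255 and
	then 'a'. A common alternative idiom is to cast through unsigned
	char, as in (unsigned char)*p, which avoids both the 8-bit
	assumption and the __CHAR_UNSIGNED__ test.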
>Audit-Trail:
>Unformatted: