Subject: lib/1739: Berkeley db fails after many writes
To: None <gnats-bugs@NetBSD.ORG>
From: Greg Hudson <ghudson@mit.edu>
List: netbsd-bugs
Date: 11/08/1995 00:14:31
>Number:         1739
>Category:       lib
>Synopsis:       Berkeley db fails after many writes
>Confidential:   no
>Severity:       serious
>Priority:       medium
>Responsible:    lib-bug-people (Library Bug People)
>State:          open
>Class:          sw-bug
>Submitter-Id:   net
>Arrival-Date:   Wed Nov  8 00:35:01 1995
>Last-Modified:
>Originator:     Greg Hudson
>Organization:
	MIT SIPB
>Release:        Some time in September, but this code hasn't changed
>Environment:
System: NetBSD glacier 1.1_ALPHA NetBSD 1.1_ALPHA (ATHENA-AHA) #4: Fri Nov 3 22:55:37 EST 1995 ghudson@lola-granola:/afs/sipb.mit.edu/project/netbsd/dev/ksrc-ghudson/arch/i386/compile/ATHENA-AHA i386


>Description:
	After many puts of random data into a db hash database, Berkeley
	db eventually returns -1 from a put without setting errno.  The
	writes don't involve large keys, but presumably cause a lot of
	page splitting.

	The puts are valid and shouldn't fail.  Given that one did fail,
	errno should have been set to a nonzero value.
>How-To-Repeat:
	Here is a test program from Matt Power which reliably exhibits
	the bug.

/*
 * testput.c
 * db->put test program for NetBSD
 *
 * Matt Power <mhpower@mit.edu>, 7 November 1995
 *
 * Environment: zygorthian-space-raiders.mit.edu
 *    kern.version = NetBSD 1.1_ALPHA (TESTKERN) #1: Sun Nov  5 01:41:38 EST 1995
 *         ghudson@lola-granola:/afs/athena.mit.edu/astaff/project/opssrc/dialup/
 *    netbsd/src/sys/arch/i386/compile/TESTKERN
 *
 * Build command: cc -o testput testput.c
 *
 * Results:
 *    % time ./testput
 *    db->put failed e=-1 i=1735 errno=0
 *    0.754u 9.935s 1:38.70 10.8% 0+0k 2764+2192io 0pf+0w
 *
 * Comment: The put should have returned 0. Also, if it returned -1
 *    for a particular reason, errno should have been nonzero. The
 *    dbopen man page says "Put routines return -1 on error (setting
 *    errno), 0 on success, and 1 if the R_NOOVERWRITE flag was set and
 *    the key already exists in the file."
 */

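#include <stdio.h>
#include <stdlib.h>	/* exit, random, srandom */
#include <string.h>	/* memset */
#include <sys/types.h>
#include <sys/file.h>
#include <fcntl.h>	/* O_RDWR, O_CREAT, O_TRUNC */
#include <db.h>
#include <errno.h>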

#define KCSIZE 2049
#define NUM_ENTRIES 5000

int main(void)
{
  char keybuf[KCSIZE];
  char databuf[KCSIZE];
  int e, i;
  DB *db;
  DBT key, data;
  static HASHINFO info;
  
  info.bsize = 512;
  info.cachesize = 0;
  info.ffactor = 5;
  info.hash = 0;
  info.lorder = 0;
  info.nelem = 1;

  if ((db = dbopen("testput.db", O_RDWR | O_CREAT | O_TRUNC, 0644,
                    DB_HASH, &info)) == NULL)
    {
      fprintf(stderr, "dbopen failed\n");
      exit(1);
    }
  srandom(17);
  key.data = keybuf;
  data.data = databuf;
  memset(databuf, '\0', KCSIZE);
  for (i = 1; i <= NUM_ENTRIES; i++)
    {
      /* unsigned multiply: i * 3977873 overflows a 32-bit int for large i */
      key.size = 32 + (((unsigned)i * 3977873u) & 511);
      data.size = 32 + (random() & 511);
      memset(keybuf, '\0', KCSIZE);
      sprintf(keybuf, "abcdefgABCDE #%d", i);
      if ((e = (db->put)(db, &key, &data, R_NOOVERWRITE)) != 0)
        {
          fprintf(stderr, "db->put failed e=%d i=%d errno=%d\n", e, i, errno);
          exit(1);
        }
    }
  (void)(db->close)(db);	/* flush and release the database */
  return 0;
}

>Fix:
	Unknown.  Margo claims the bug is fixed in the latest alpha
	release of Berkeley db version 2, but according to Matt and
	Ted Ts'o, the same program fails even less gracefully with
	that release.  Note also that Berkeley db version 2 makes
	incompatible changes to the on-disk format.
>Audit-Trail:
>Unformatted: