Subject: troubleshoot hash(3) database issues
To: None <tech-userlevel@netbsd.org>
From: Jeremy C. Reed <reed@reedmedia.net>
List: tech-userlevel
Date: 08/30/2007 07:29:06
I am using the latest spamd from OpenBSD on multiple NetBSD/i386 3.1 
systems. It uses hash(3).

I continually get corrupted data. Sometimes when spamd reads in the 
database, it says the data size is too large. For spamd the size should 
always be 20, but dbd.size is often in the thousands instead. I added 
debugging to spamd to report how big dbd.size was.

The strange thing is that I can't reproduce it, or even see it, when I 
list the database with the spamdb utility or with db(1) (modified to add 
my -s switch to show the size, as described in my other email).

Any ideas on how I can debug this or troubleshoot my spamd db corruption?

I think the db file itself is fine, but the in-memory copy of it is 
getting corrupted.

I compared src/lib/libc/db/hash/ on NetBSD and OpenBSD and saw some 
differences. One is in hash_buf.c: OpenBSD uses malloc followed by a 
memset to 0xff, while NetBSD uses calloc (zero fill). For example (NetBSD 
lines are marked -, OpenBSD lines +):

                /* Allocate a new one */
-               if ((bp = calloc(1, sizeof(BUFHEAD))) == NULL)
+               if ((bp = (BUFHEAD *)malloc(sizeof(BUFHEAD))) == NULL)
                        return (NULL);
-               if ((bp->page = calloc(1, (size_t)hashp->BSIZE)) == NULL) {
+               memset(bp, 0xff, sizeof(BUFHEAD));
+               if ((bp->page = (char *)malloc(hashp->BSIZE)) == NULL) {
                        free(bp);
                        return (NULL);
                }
+               memset(bp->page, 0xff, hashp->BSIZE);


I don't understand the purpose of using 0xff instead of 0. It also does a 
second memset, on the page itself.
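
For illustration, here is a small sketch of what the two fill patterns do 
to a field that is read before it has been set. The struct hdr type and 
the alloc_filled() and uninitialized_size() helpers are hypothetical and 
are not taken from libc. The idea that 0xff is meant to make such reads 
stand out is only an assumption, not something the OpenBSD source states.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for a header with a size field; not the real BUFHEAD. */
struct hdr {
	uint32_t size;
	uint32_t flags;
};

/*
 * Allocate a header and fill every byte with `fill`.
 * 0x00 mimics calloc; 0xff mimics the malloc + memset variant.
 */
static struct hdr *
alloc_filled(int fill)
{
	struct hdr *h = malloc(sizeof(*h));

	if (h == NULL)
		return NULL;
	memset(h, fill, sizeof(*h));
	return h;
}

/* Return the size field as seen by code that never initialized it. */
static uint32_t
uninitialized_size(int fill)
{
	struct hdr *h = alloc_filled(fill);
	uint32_t size;

	if (h == NULL)
		return 0;
	size = h->size;
	free(h);
	return size;
}
```

With a zero fill, a missed initialization reads back as a plausible 0. 
With a 0xff fill the same read returns 4294967295, which a sanity check 
on the size would reject immediately.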


OpenBSD also zeroes the page before freeing it:

                if (do_free) {
-                       if (bp->page)
+                       if (bp->page) {
+                               (void)memset(bp->page, 0, hashp->BSIZE);
                                free(bp->page);
+                       }

But I don't understand the value of that.

OpenBSD also has a change described as "Avoid overwriting the cursor page 
when the cursor page becomes the LRU page":

        bp = LRU;
+
+        /* It is bad to overwrite the page under the cursor. */
+        if (bp == hashp->cpage) {
+                BUF_REMOVE(bp);
+                MRU_INSERT(bp);
+                bp = LRU;
+        }
+
        /*
         * If LRU buffer is pinned, the buffer pool is too small. We need to
         * allocate more buffers.


I don't know if that is related to my corruption. (I may update my libc to 
test with these ideas, but since I can't reproduce my problems manually, I 
have to wait.)

Any ideas on how I can troubleshoot?

  Jeremy C. Reed