Subject: Re: [HACKERS] PostgreSQL, NetBSD and NFS
To: Tom Lane <tgl@sss.pgh.pa.us>
From: D'Arcy J.M. Cain <darcy@druid.net>
List: current-users
Date: 02/05/2003 06:54:17
On Sunday 02 February 2003 12:26, Tom Lane wrote:
> At this point I think you need to rebuild with --enable-debug and
> --enable-cassert (if you didn't already) and then capture some
> stack traces from the stuck backend.  We have to find out what the
> backend thinks it's doing.

Well, it does appear to be doing work, but it never finishes.  Here are two
backtraces: one taken while it was running and the other after a kill -9.
Based on the database that I was copying in, the primary key file should
have reached 322846720 bytes, but it had only 4603904 after the restore had
run for 12 hours.  The file seems to grow to a fixed size and then just stay
there.  I am running another test to confirm that.
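
A confirmation test like that could be scripted roughly as follows.  This is
only a sketch, not what was actually run here: the helper names
(sample_sizes, has_stalled, progress) are made up for illustration, and the
8192-byte block size is an assumption (PostgreSQL's default BLCKSZ).  It
samples the index file's size at intervals, reports whether the last few
samples are identical, and converts the byte counts above into pages.

```python
import os
import time

BLOCK_SIZE = 8192  # assumption: PostgreSQL's default BLCKSZ


def sample_sizes(path, count, interval_s):
    """Return `count` sizes of `path`, sampled `interval_s` seconds apart."""
    sizes = []
    for i in range(count):
        sizes.append(os.stat(path).st_size)
        if i < count - 1:
            time.sleep(interval_s)
    return sizes


def has_stalled(sizes, window):
    """True if the last `window` samples all show the same size."""
    return len(sizes) >= window and len(set(sizes[-window:])) == 1


def progress(current, expected):
    """Return (pages written, pages expected, fraction complete)."""
    return current // BLOCK_SIZE, expected // BLOCK_SIZE, current / expected
```

With the figures above, progress(4603904, 322846720) gives 562 pages written
out of 39410 expected, or about 1.4% of the index.  Calling
has_stalled(sample_sizes(path, 6, 600), 6) would report whether the file kept
the same size across an hour of ten-minute samples; path here stands for
wherever the index file lives under the data directory.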


(gdb) bt
#0  LWLockAcquire (lockid=7272, mode=LW_SHARED) at lwlock.c:236
#1  0x8110417 in LockBuffer (buffer=3626, mode=1) at bufmgr.c:2004
#2  0x80828ec in _bt_getbuf (rel=0x83a86f0, blkno=6, access=1)
    at nbtpage.c:321
#3  0x808559d in _bt_moveright (rel=0x83a86f0, buf=3538, keysz=1,
    scankey=0x83b90c0, access=1) at nbtsearch.c:159
#4  0x8085412 in _bt_search (rel=0x83a86f0, keysz=1, scankey=0x83b90c0,
    bufP=0xbfbfcb04, access=2) at nbtsearch.c:105
#5  0x807da06 in _bt_doinsert (rel=0x83a86f0, btitem=0x83ba12c,
    index_is_unique=1 '\001', heapRel=0x83a6b78) at nbtinsert.c:101
#6  0x8082f84 in btinsert (fcinfo=0xbfbfcb58) at nbtree.c:283
#7  0x815e7cd in OidFunctionCall5 (functionId=331, arg1=138053360,
    arg2=3217017956, arg3=3217017940, arg4=138124076, arg5=138046328)
    at fmgr.c:1247
#8  0x807c8f4 in index_insert (relation=0x83a86f0, datum=0xbfbfcc64,
    nulls=0xbfbfcc54 " ", heap_t_ctid=0x83b9b2c, heapRel=0x83a6b78)
    at indexam.c:193
#9  0x80d3d47 in ExecInsertIndexTuples (slot=0x83b9068, tupleid=0x83b9b2c,
    estate=0x83b9a48, is_update=0) at execUtils.c:668
#10 0x80b8645 in CopyFrom (rel=0x83a6b78, binary=0 '\000', oids=0 '\000',
    fp=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:927
#11 0x80b75cb in DoCopy (relname=0x83b11d0 "certificate", binary=0 '\000',
    oids=0 '\000', from=1 '\001', pipe=1 '\001', filename=0x0,
    delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:336
#12 0x811ea7d in ProcessUtility (parsetree=0x83b11ec, dest=Remote,
    completionTag=0xbfbfcdfc "") at utility.c:341
#13 0x811cc46 in pg_exec_query_string (
    query_string=0x83b1038 "COPY \"certificate\" FROM stdin;", dest=Remote,
    parse_context=0x83676a0) at postgres.c:766
#14 0x811dce8 in PostgresMain (argc=5, argv=0xbfbfd008,
    username=0x833c525 "darcy") at postgres.c:1926
#15 0x8102e9f in DoBackend (port=0x833c400) at postmaster.c:2243
#16 0x8102859 in BackendStartup (port=0x833c400) at postmaster.c:1874
#17 0x8101bbf in ServerLoop () at postmaster.c:995
#18 0x8101782 in PostmasterMain (argc=1, argv=0x832d030) at postmaster.c:771
#19 0x80e188f in main (argc=1, argv=0xbfbfd780) at main.c:206
#20 0x8067559 in ___start ()
(gdb) cont
Continuing.

Program received signal SIGKILL, Killed.
0x8119a5d in LWLockAcquire (lockid=3587, mode=LW_SHARED) at lwlock.c:199
lwlock.c:199: No such file or directory.
(gdb) bt
#0  0x8119a5d in LWLockAcquire (lockid=3587, mode=LW_SHARED) at lwlock.c:199
#1  0x80828ec in _bt_getbuf (rel=0x83a86f0, blkno=404, access=1)
    at nbtpage.c:321
#2  0x808559d in _bt_moveright (rel=0x83a86f0, buf=3538, keysz=1,
    scankey=0x83b90c0, access=1) at nbtsearch.c:159
#3  0x8085412 in _bt_search (rel=0x83a86f0, keysz=1, scankey=0x83b90c0,
    bufP=0xbfbfcb04, access=2) at nbtsearch.c:105
#4  0x807da06 in _bt_doinsert (rel=0x83a86f0, btitem=0x83ba12c,
    index_is_unique=1 '\001', heapRel=0x83a6b78) at nbtinsert.c:101
#5  0x8082f84 in btinsert (fcinfo=0xbfbfcb58) at nbtree.c:283
#6  0x815e7cd in OidFunctionCall5 (functionId=331, arg1=138053360,
    arg2=3217017956, arg3=3217017940, arg4=138124076, arg5=138046328)
    at fmgr.c:1247
#7  0x807c8f4 in index_insert (relation=0x83a86f0, datum=0xbfbfcc64,
    nulls=0xbfbfcc54 " ", heap_t_ctid=0x83b9b2c, heapRel=0x83a6b78)
    at indexam.c:193
#8  0x80d3d47 in ExecInsertIndexTuples (slot=0x83b9068, tupleid=0x83b9b2c,
    estate=0x83b9a48, is_update=0) at execUtils.c:668
#9  0x80b8645 in CopyFrom (rel=0x83a6b78, binary=0 '\000', oids=0 '\000',
    fp=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:927
#10 0x80b75cb in DoCopy (relname=0x83b11d0 "certificate", binary=0 '\000',
    oids=0 '\000', from=1 '\001', pipe=1 '\001', filename=0x0,
    delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:336
#11 0x811ea7d in ProcessUtility (parsetree=0x83b11ec, dest=Remote,
    completionTag=0xbfbfcdfc "") at utility.c:341
#12 0x811cc46 in pg_exec_query_string (
    query_string=0x83b1038 "COPY \"certificate\" FROM stdin;", dest=Remote,
    parse_context=0x83676a0) at postgres.c:766
#13 0x811dce8 in PostgresMain (argc=5, argv=0xbfbfd008,
    username=0x833c525 "darcy") at postgres.c:1926
#14 0x8102e9f in DoBackend (port=0x833c400) at postmaster.c:2243
#15 0x8102859 in BackendStartup (port=0x833c400) at postmaster.c:1874
#16 0x8101bbf in ServerLoop () at postmaster.c:995
#17 0x8101782 in PostmasterMain (argc=1, argv=0x832d030) at postmaster.c:771
#18 0x80e188f in main (argc=1, argv=0xbfbfd780) at main.c:206
#19 0x8067559 in ___start ()

>
> BTW: *are* we certain it's associated with NFS, and not a hardware
> problem on your NetBSD box?  Can you perform the same tests running
> the database off a local disk?
>
> 			regards, tom lane

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.