Subject: Re: [HACKERS] PostgreSQL, NetBSD and NFS
To: Tom Lane <tgl@sss.pgh.pa.us>
From: D'Arcy J.M. Cain <darcy@druid.net>
List: current-users
Date: 02/05/2003 06:54:17
On Sunday 02 February 2003 12:26, Tom Lane wrote:
> At this point I think you need to rebuild with --enable-debug and
> --enable-cassert (if you didn't already) and then capture some
> stack traces from the stuck backend. We have to find out what the
> backend thinks it's doing.
Well, it does appear to be working but it never finishes. Here are two
backtraces. One was taken while it was running and the other after a
kill -9. Based on the database that I was copying in, the primary key
index file should have had 322846720 bytes, but it only had 4603904 after
running the restore for 12 hours. The file seems to grow to a certain size
and then just stay there. I am running another test to confirm that.
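To put the two file sizes in perspective, here is a rough sketch of the arithmetic. It assumes the default PostgreSQL block size of 8192 bytes (BLCKSZ), which is not stated anywhere in this message; the variable and function names are illustrative only.

```python
# Assumption: default PostgreSQL block size (BLCKSZ) of 8192 bytes.
# The message does not say which block size this build uses.
BLCKSZ = 8192

expected_bytes = 322846720  # index size in the database being copied in
actual_bytes = 4603904      # index size after 12 hours of restore


def pages(nbytes, blcksz=BLCKSZ):
    """Return (whole pages, leftover bytes) for a file of nbytes."""
    return divmod(nbytes, blcksz)


expected_pages, expected_rem = pages(expected_bytes)
actual_pages, actual_rem = pages(actual_bytes)
percent_done = 100.0 * actual_bytes / expected_bytes

print(f"expected: {expected_pages} pages (+{expected_rem} bytes)")
print(f"actual:   {actual_pages} pages (+{actual_rem} bytes)")
print(f"progress: {percent_done:.2f}% of the expected size")
```

Under that assumption both sizes are whole multiples of the block size (39410 and 562 pages), so the index stopped growing at under 2% of its expected size. The block numbers in the backtraces (6 and 404) both fall inside the 562 pages that exist.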
(gdb) bt
#0 LWLockAcquire (lockid=7272, mode=LW_SHARED) at lwlock.c:236
#1 0x8110417 in LockBuffer (buffer=3626, mode=1) at bufmgr.c:2004
#2 0x80828ec in _bt_getbuf (rel=0x83a86f0, blkno=6, access=1)
at nbtpage.c:321
#3 0x808559d in _bt_moveright (rel=0x83a86f0, buf=3538, keysz=1,
scankey=0x83b90c0, access=1) at nbtsearch.c:159
#4 0x8085412 in _bt_search (rel=0x83a86f0, keysz=1, scankey=0x83b90c0,
bufP=0xbfbfcb04, access=2) at nbtsearch.c:105
#5 0x807da06 in _bt_doinsert (rel=0x83a86f0, btitem=0x83ba12c,
index_is_unique=1 '\001', heapRel=0x83a6b78) at nbtinsert.c:101
#6 0x8082f84 in btinsert (fcinfo=0xbfbfcb58) at nbtree.c:283
#7 0x815e7cd in OidFunctionCall5 (functionId=331, arg1=138053360,
arg2=3217017956, arg3=3217017940, arg4=138124076, arg5=138046328)
at fmgr.c:1247
#8 0x807c8f4 in index_insert (relation=0x83a86f0, datum=0xbfbfcc64,
nulls=0xbfbfcc54 " ", heap_t_ctid=0x83b9b2c, heapRel=0x83a6b78)
at indexam.c:193
#9 0x80d3d47 in ExecInsertIndexTuples (slot=0x83b9068, tupleid=0x83b9b2c,
estate=0x83b9a48, is_update=0) at execUtils.c:668
#10 0x80b8645 in CopyFrom (rel=0x83a6b78, binary=0 '\000', oids=0 '\000',
fp=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:927
#11 0x80b75cb in DoCopy (relname=0x83b11d0 "certificate", binary=0 '\000',
oids=0 '\000', from=1 '\001', pipe=1 '\001', filename=0x0,
delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:336
#12 0x811ea7d in ProcessUtility (parsetree=0x83b11ec, dest=Remote,
completionTag=0xbfbfcdfc "") at utility.c:341
#13 0x811cc46 in pg_exec_query_string (
query_string=0x83b1038 "COPY \"certificate\" FROM stdin;", dest=Remote,
parse_context=0x83676a0) at postgres.c:766
#14 0x811dce8 in PostgresMain (argc=5, argv=0xbfbfd008,
username=0x833c525 "darcy") at postgres.c:1926
#15 0x8102e9f in DoBackend (port=0x833c400) at postmaster.c:2243
#16 0x8102859 in BackendStartup (port=0x833c400) at postmaster.c:1874
#17 0x8101bbf in ServerLoop () at postmaster.c:995
#18 0x8101782 in PostmasterMain (argc=1, argv=0x832d030) at postmaster.c:771
#19 0x80e188f in main (argc=1, argv=0xbfbfd780) at main.c:206
#20 0x8067559 in ___start ()
(gdb) cont
Continuing.
Program received signal SIGKILL, Killed.
0x8119a5d in LWLockAcquire (lockid=3587, mode=LW_SHARED) at lwlock.c:199
lwlock.c:199: No such file or directory.
(gdb) bt
#0 0x8119a5d in LWLockAcquire (lockid=3587, mode=LW_SHARED) at lwlock.c:199
#1 0x80828ec in _bt_getbuf (rel=0x83a86f0, blkno=404, access=1)
at nbtpage.c:321
#2 0x808559d in _bt_moveright (rel=0x83a86f0, buf=3538, keysz=1,
scankey=0x83b90c0, access=1) at nbtsearch.c:159
#3 0x8085412 in _bt_search (rel=0x83a86f0, keysz=1, scankey=0x83b90c0,
bufP=0xbfbfcb04, access=2) at nbtsearch.c:105
#4 0x807da06 in _bt_doinsert (rel=0x83a86f0, btitem=0x83ba12c,
index_is_unique=1 '\001', heapRel=0x83a6b78) at nbtinsert.c:101
#5 0x8082f84 in btinsert (fcinfo=0xbfbfcb58) at nbtree.c:283
#6 0x815e7cd in OidFunctionCall5 (functionId=331, arg1=138053360,
arg2=3217017956, arg3=3217017940, arg4=138124076, arg5=138046328)
at fmgr.c:1247
#7 0x807c8f4 in index_insert (relation=0x83a86f0, datum=0xbfbfcc64,
nulls=0xbfbfcc54 " ", heap_t_ctid=0x83b9b2c, heapRel=0x83a6b78)
at indexam.c:193
#8 0x80d3d47 in ExecInsertIndexTuples (slot=0x83b9068, tupleid=0x83b9b2c,
estate=0x83b9a48, is_update=0) at execUtils.c:668
#9 0x80b8645 in CopyFrom (rel=0x83a6b78, binary=0 '\000', oids=0 '\000',
fp=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:927
#10 0x80b75cb in DoCopy (relname=0x83b11d0 "certificate", binary=0 '\000',
oids=0 '\000', from=1 '\001', pipe=1 '\001', filename=0x0,
delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:336
#11 0x811ea7d in ProcessUtility (parsetree=0x83b11ec, dest=Remote,
completionTag=0xbfbfcdfc "") at utility.c:341
#12 0x811cc46 in pg_exec_query_string (
query_string=0x83b1038 "COPY \"certificate\" FROM stdin;", dest=Remote,
parse_context=0x83676a0) at postgres.c:766
#13 0x811dce8 in PostgresMain (argc=5, argv=0xbfbfd008,
username=0x833c525 "darcy") at postgres.c:1926
#14 0x8102e9f in DoBackend (port=0x833c400) at postmaster.c:2243
#15 0x8102859 in BackendStartup (port=0x833c400) at postmaster.c:1874
#16 0x8101bbf in ServerLoop () at postmaster.c:995
#17 0x8101782 in PostmasterMain (argc=1, argv=0x832d030) at postmaster.c:771
#18 0x80e188f in main (argc=1, argv=0xbfbfd780) at main.c:206
#19 0x8067559 in ___start ()
>
> BTW: *are* we certain it's associated with NFS, and not a hardware
> problem on your NetBSD box? Can you perform the same tests running
> the database off a local disk?
>
> regards, tom lane
--
D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves
http://www.druid.net/darcy/ | and a sheep voting on
+1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.