gdb跟踪checkpointer进程,出现死锁,Mark一下.
跟踪checkpointer进程,查看共享内存中的信(heckpointerShmem->requests)
(gdb) p CheckpointerShmem->requests[150] ... $16 = {rnode = {spcNode = 1663, dbNode = 16402, relNode = 26185}, forknum = MAIN_FORKNUM, segno = 0} (gdb) p CheckpointerShmem->requests[200] Cannot access memory at address 0xf9fb18 ...
然后,请求checkpoint的进程报错
testdb=# update t_wal_ckpt set c2 = 'C2#'||substr(c2,4,40); UPDATE 8192 testdb=# checkpoint; 2019-01-07 12:30:32.114 CST [1418] PANIC: stuck spinlock detected at RequestCheckpoint, checkpointer.c:1050 2019-01-07 12:30:32.114 CST [1418] STATEMENT: checkpoint; 2019-01-07 12:30:37.081 CST [1390] PANIC: stuck spinlock detected at FirstCallSinceLastCheckpoint, checkpointer.c:1376 2019-01-07 12:30:38.610 CST [1370] LOG: background writer process (PID 1390) was terminated by signal 6: Aborted 2019-01-07 12:30:38.610 CST [1370] LOG: terminating any other active server processes 2019-01-07 12:30:38.611 CST [1392] WARNING: terminating connection because of crash of another server process 2019-01-07 12:30:38.611 CST [1392] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2019-01-07 12:30:38.611 CST [1392] HINT: In a moment you should be able to reconnect to the database and repeat your command. 2019-01-07 12:30:38.613 CST [1558] WARNING: terminating connection because of crash of another server process 2019-01-07 12:30:38.613 CST [1558] DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. 2019-01-07 12:30:38.613 CST [1558] HINT: In a moment you should be able to reconnect to the database and repeat your command. PANIC: stuck spinlock detected at RequestCheckpoint, checkpointer.c:1050 server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. The connection to the server was lost. Attempting reset: 2019-01-07 12:30:54.041 CST [1560] FATAL: the database system is in recovery mode Failed. !>
尝试重新连接,发现DB已coredump.
[xdb@localhost ~]$ [xdb@localhost ~]$ psql -d testdb 2019-01-07 14:10:16.114 CST [1629] FATAL: the database system is in recovery mode psql: FATAL: the database system is in recovery mode
执行恢复
[xdb@localhost ~]$ pg_ctl start pg_ctl: another server might be running; trying to start server anyway waiting for server to start....2019-01-07 14:11:50.821 CST [1632] FATAL: lock file "postmaster.pid" already exists 2019-01-07 14:11:50.821 CST [1632] HINT: Is another postmaster (PID 1370) running in data directory "/data/xdb/pg111db"? stopped waiting pg_ctl: could not start server Examine the log output. [xdb@localhost ~]$ find /data/xdb -name postmaster.pid /data/xdb/pg111db/postmaster.pid [xdb@localhost ~]$ rm -rf /data/xdb/pg111db/postmaster.pid [xdb@localhost ~]$ pg_ctl start waiting for server to start....2019-01-07 14:12:44.578 CST [1639] LOG: could not bind IPv6 address "::1": Address already in use [xdb@localhost ~]$ ps -ef|grep postgres xdb 1370 1 0 12:01 pts/0 00:00:02 /appdb/atlasdb/pg11.1/bin/postgres xdb 1389 1370 0 12:01 ? 00:00:00 [postgres] <defunct> xdb 1641 1332 0 14:12 pts/0 00:00:00 grep --color=auto postgres [xdb@localhost ~]$ kill -9 1370 [xdb@localhost ~]$ pg_ctl start waiting for server to start....2019-01-07 14:13:33.125 CST [1648] LOG: listening on IPv6 address "::1", port 5432 2019-01-07 14:13:33.125 CST [1648] LOG: listening on IPv4 address "127.0.0.1", port 5432 2019-01-07 14:13:33.142 CST [1648] LOG: listening on Unix socket "/tmp/.s.PGSQL.5432" .2019-01-07 14:13:34.361 CST [1649] LOG: database system was interrupted; last known up at 2019-01-07 12:26:22 CST 2019-01-07 14:13:34.818 CST [1649] LOG: database system was not properly shut down; automatic recovery in progress 2019-01-07 14:13:34.863 CST [1649] LOG: redo starts at 1/48F9ED08 .2019-01-07 14:13:35.467 CST [1649] LOG: invalid record length at 1/4914FF58: wanted 24, got 0 2019-01-07 14:13:35.467 CST [1649] LOG: redo done at 1/4914FF30 2019-01-07 14:13:35.467 CST [1649] LOG: last completed transaction was at log time 2019-01-07 12:28:37.521542+08 2019-01-07 14:13:35.977 CST [1648] LOG: database system is ready to accept connections done server started
经分析,是因为共享内存结构中的CheckpointerShmem->ckpt_lck导致的.
在跟踪checkpointer进程时,执行
SpinLockRelease(&CheckpointerShmem->ckpt_lck);
释放lock后,不再出现上述问题.
免责声明:
① 本站未注明“稿件来源”的信息均来自网络整理。其文字、图片和音视频稿件的所属权归原作者所有。本站收集整理出于非商业性的教育和科研之目的,并不意味着本站赞同其观点或证实其内容的真实性。仅作为临时的测试数据,供内部测试之用。本站并未授权任何人以任何方式主动获取本站任何信息。
② 本站未注明“稿件来源”的临时测试数据将在测试完成后最终做删除处理。有问题或投稿请发送至: 邮箱/279061341@qq.com QQ/279061341
软考中级精品资料免费领
- 历年真题答案解析
- 备考技巧名师总结
- 高频考点精准押题
- 资料下载
- 历年真题
193.9 KB下载数265
191.63 KB下载数245
143.91 KB下载数1148
183.71 KB下载数642
644.84 KB下载数2756
相关文章
发现更多好内容- Java编程中 abstract 类和方法的详细解析与应用指南(java编程abstract类和方法详解)
- 如何判断 Java 数组中是否存在重复元素?(Java怎么判断数组是否有重复元素)
- Java 中魔法值究竟是什么含义?(java魔法值是什么意思)
- 在 Java 开发中,Javase 究竟扮演着怎样的角色?(Javase在Java开发中扮演什么角色)
- 在 Java 中如何实现 base64 到 blob 的转换?(Java中base64转blob怎么实现)
- Java 中 clazz 类创建的方式有哪些?(java clazz类创建的方式是什么)
- 如何正确使用 Java PersistenceContext 类?实例详解!(Java PersistenceContext类使用实例)
- Java 中 Quartz 框架究竟是什么?(java中quartz是什么框架)
- PHP数据类型转换对存储方式的影响
- Java House 有哪些具体的方法?(Java House的方法有哪些)
猜你喜欢
AI推送时光机PostgreSQL 跟踪checkpointer出现死锁
数据库2024-04-02
PostgreSQL出现死锁该如何解决
数据库2024-04-02
日常SQL数据库死锁跟踪及处理
数据库2024-04-02
mysql出现死锁如何解决
数据库2024-04-02
Google跟踪代码管理器出现404错误
数据库2023-09-26
MySQL中出现死锁的原因有哪些
数据库2024-04-02
Java项目中出现死锁如何解决
数据库2023-05-31
mysql出现死锁的必要条件是什么
数据库2023-05-25
mysql出现死锁的原因及解决方案
数据库2024-04-02
postgresql数据库中出现锁表如何解决
数据库2023-08-30
Java中避免出现死锁的方法有哪些
数据库2023-05-31
怎么写一组会出现死锁的ABAP程序
数据库2023-06-03
解决Java执行Cmd命令出现的死锁问题
数据库2024-04-02
mysql kill进程后出现killed死锁问题及解决
数据库2024-01-29
发现操作系统的数据库出现死锁如何处理
数据库2024-04-02
golang 中合并排序的递归/并行实现中出现死锁
数据库2024-02-10
咦!没有更多了?去看看其它编程学习网 内容吧