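This section briefly introduces RelationGetBufferForTuple, a buffer-related function that PostgreSQL calls while executing an insert. It returns a buffer whose page has free space >= the requested size; the buffer is returned pinned and with an exclusive lock held.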
I. Data Structures
BufferDesc
Shared descriptor (state) data for a single shared buffer.
//buffer header is locked
#define BM_LOCKED (1U << 22)
//data needs writing (buffer is marked DIRTY)
#define BM_DIRTY (1U << 23)
//data is valid
#define BM_VALID (1U << 24)
//a buffer tag has been assigned
#define BM_TAG_VALID (1U << 25)
//a read or write is in progress
#define BM_IO_IN_PROGRESS (1U << 26)
//the previous I/O failed
#define BM_IO_ERROR (1U << 27)
//buffer was dirtied again after the write started
#define BM_JUST_DIRTIED (1U << 28)
//another process is waiting to hold the sole pin
#define BM_PIN_COUNT_WAITER (1U << 29)
//must be written out for the checkpoint
#define BM_CHECKPOINT_NEEDED (1U << 30)
//permanent buffer (not unlogged, or belongs to an init fork)
#define BM_PERMANENT (1U << 31)
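typedef struct BufferDesc
{
    //buffer tag: ID of the page held in this buffer
    BufferTag tag;
    //buffer index number (from 0), identifying the buffer pool slot
    int buf_id;
    //state of the tag: flags, refcount and usagecount packed in one word
    pg_atomic_uint32 state;
    //PID of the backend waiting for the pin count to drop
    int wait_backend_pid;
    //link to the next buffer in the freelist chain
    int freeNext;
    //lock protecting access to the buffer contents
    LWLock content_lock;
} BufferDesc;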
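Because flags, refcount and usagecount share the single state word, reading any of them means masking. The sketch below shows that decoding. The mask values (18 bits of refcount, 4 bits of usagecount, flags in the top 10 bits) are an assumption based on the flag positions listed above, and make_state and the state_* helpers are hypothetical names, not PostgreSQL functions.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed layout: bits 0-17 refcount, bits 18-21 usagecount, bits 22-31 flags. */
#define BUF_REFCOUNT_MASK    ((1U << 18) - 1)
#define BUF_USAGECOUNT_MASK  0x003C0000U
#define BUF_USAGECOUNT_SHIFT 18
#define BUF_FLAG_MASK        0xFFC00000U

#define BM_DIRTY (1U << 23)
#define BM_VALID (1U << 24)

/* Hypothetical helper: pack flags, usagecount and refcount into one word. */
static uint32_t make_state(uint32_t flags, uint32_t usagecount, uint32_t refcount)
{
    return (flags & BUF_FLAG_MASK)
        | ((usagecount << BUF_USAGECOUNT_SHIFT) & BUF_USAGECOUNT_MASK)
        | (refcount & BUF_REFCOUNT_MASK);
}

/* Number of pins currently held on the buffer. */
static uint32_t state_refcount(uint32_t state)
{
    return state & BUF_REFCOUNT_MASK;
}

/* Usage counter consulted by the replacement sweep. */
static uint32_t state_usagecount(uint32_t state)
{
    return (state & BUF_USAGECOUNT_MASK) >> BUF_USAGECOUNT_SHIFT;
}

/* Nonzero if the given BM_* flag is set. */
static int state_has_flag(uint32_t state, uint32_t flag)
{
    return (state & flag) != 0;
}
```

With this layout, a valid, dirty buffer pinned by two backends and with usagecount 3 decodes back to exactly those three values.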
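BufferTag
The buffer tag identifies which disk block the buffer holds.
typedef struct buftag
{
    //physical relation identifier
    RelFileNode rnode;
    //fork of the relation (main, FSM, VM, init)
    ForkNumber forkNum;
    //block number relative to the start of the relation
    BlockNumber blockNum;
} BufferTag;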
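Two buffers hold the same disk block exactly when all three tag fields match. A minimal sketch of that comparison follows, using simplified stand-in types: the Demo* names are hypothetical, and the three-field layout of the relation identifier (tablespace, database, relation) is an assumption about RelFileNode.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Stand-in for RelFileNode (assumed fields: tablespace, database, relation). */
typedef struct DemoRelFileNode
{
    uint32_t spcNode;
    uint32_t dbNode;
    uint32_t relNode;
} DemoRelFileNode;

/* Stand-in for BufferTag. */
typedef struct DemoBufferTag
{
    DemoRelFileNode rnode;
    int             forkNum;
    uint32_t        blockNum;
} DemoBufferTag;

/* Hypothetical constructor, used only to keep the examples short. */
static DemoBufferTag demo_make_tag(uint32_t spc, uint32_t db, uint32_t rel,
                                   int fork, uint32_t blk)
{
    DemoBufferTag tag = { { spc, db, rel }, fork, blk };
    return tag;
}

/* Tags are equal only if relation, fork and block number all match. */
static bool demo_tags_equal(DemoBufferTag a, DemoBufferTag b)
{
    return a.rnode.spcNode == b.rnode.spcNode
        && a.rnode.dbNode == b.rnode.dbNode
        && a.rnode.relNode == b.rnode.relNode
        && a.forkNum == b.forkNum
        && a.blockNum == b.blockNum;
}
```

Block 0 and block 2 of the same relation therefore get different tags and live in different buffers, which is why an update may need two buffers at once.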
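II. Source Code Walkthrough
RelationGetBufferForTuple returns a buffer whose page has free space >= the requested size; the buffer is returned pinned and with an exclusive lock held.
Input:
relation: the table
len: the amount of space required
otherBuffer: used in the UPDATE case; the buffer holding the old tuple, already pinned by the caller
options: processing options
bistate: bulk-insert state
vmbuffer: the first visibility map (VM) buffer
vmbuffer_other: used in the UPDATE case; the VM buffer corresponding to the previously pinned otherBuffer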
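Note:
The otherBuffer parameter looks confusing at first; it exists because of how PG implements updates.
An UPDATE is not done in place: the old tuple is kept (its xmax is set) and the new version is inserted as a new tuple.
If the old and new tuples sit in different blocks, both blocks must be locked, and locking them in an arbitrary order can deadlock.
For example: Session A updates the first row of table T; that row is in Block 0 and the new version goes into Block 2.
Session B updates the second row of table T; that row is also in Block 0 and its new version also goes into Block 2.
Both Block 0 and Block 2 must be locked to complete either update:
if Session A locks Block 2 first while Session B locks Block 0 first,
and then Session A tries to lock Block 0 while Session B tries to lock Block 2, the two sessions deadlock.
To avoid this, PG requires that blocks of the same relation are always locked in block-number order, lowest first.
If blocks 0 and 2 both need locking, Block 0 must be locked first and Block 2 second.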
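The ordering rule can be reduced to a tiny sketch: whichever block the caller starts from, the lower block number is locked first, so two sessions can never each hold one block while waiting for the other. The demo_* names are hypothetical helpers and only model the ordering decision, not real buffer locks.

```c
#include <assert.h>
#include <stdint.h>

typedef uint32_t DemoBlockNumber;   /* stand-in for BlockNumber */

/* Block to lock first: always the lower-numbered of the two. */
static DemoBlockNumber demo_first_to_lock(DemoBlockNumber other_block,
                                          DemoBlockNumber target_block)
{
    return other_block < target_block ? other_block : target_block;
}

/* Block to lock second: always the higher-numbered of the two. */
static DemoBlockNumber demo_second_to_lock(DemoBlockNumber other_block,
                                           DemoBlockNumber target_block)
{
    return other_block < target_block ? target_block : other_block;
}
```

In the scenario above both sessions compute the same order (Block 0, then Block 2), so the second session simply waits on Block 0 until the first one finishes.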
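Output:
The buffer allocated for the tuple
The main logic is as follows:
1. Initialize the relevant variables
2. Compute the space to keep reserved on each page (from fillfactor)
3. For an UPDATE, get the block number of the previously pinned buffer
4. Pick the target page: targetBlock
5. If targetBlock is invalid and the FSM is in use, ask the FSM for a block; if that also fails, try the last block of the relation
6. While targetBlock is valid, loop to find a suitable block
6.1. Read the target block and exclusive-lock it, together with otherBuffer (if given)
6.2. Pin the visibility map pages
6.3. Get the page from the buffer and check whether it has enough free space; if so, return the buffer
6.4. If not, call RecordAndGetPageWithFreeSpace to get the next targetBlock and loop again
7. If the loop ends without finding a block, extend the table
8. Read the buffer in P_NEW mode, which allocates the new block, and lock it
9. Get the page of that buffer and run the sanity checks
10. Report an error if a check fails; otherwise return the buffer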
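Steps 5 to 7 can be sketched as a toy model over an array of per-page free space. This is an illustration only: the linear scan in demo_fsm_lookup stands in for the real FSM, locking and pinning are omitted, and every demo_* name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_INVALID_BLOCK (-1)

/* Toy FSM: first page with at least `needed` free bytes, else invalid. */
static int demo_fsm_lookup(const size_t *free_space, int npages, size_t needed)
{
    for (int i = 0; i < npages; i++)
        if (free_space[i] >= needed)
            return i;
    return DEMO_INVALID_BLOCK;
}

/*
 * Returns the block that receives the tuple. A return value equal to
 * npages means no existing page fits and the table must be extended.
 */
static int demo_choose_block(const size_t *free_space, int npages,
                             int target, size_t needed)
{
    if (target == DEMO_INVALID_BLOCK)
    {
        /* Step 5: ask the FSM, then fall back to the last page. */
        target = demo_fsm_lookup(free_space, npages, needed);
        if (target == DEMO_INVALID_BLOCK && npages > 0)
            target = npages - 1;
    }

    /* Step 6: keep trying candidates while one is available. */
    while (target != DEMO_INVALID_BLOCK)
    {
        if (free_space[target] >= needed)
            return target;      /* step 6.3: enough room */
        /* step 6.4: ask the FSM for another candidate */
        target = demo_fsm_lookup(free_space, npages, needed);
    }

    /* Step 7: nothing fits, the new tuple goes on a freshly added block. */
    return npages;
}
```

For pages with 10, 500 and 40 free bytes and a request of 100 bytes starting from block 0, the model rejects block 0 and settles on block 1; if no page has 100 bytes free, it reports block 3, the page that extension would add.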
Buffer
RelationGetBufferForTuple(Relation relation, Size len,
                          Buffer otherBuffer, int options,
                          BulkInsertState bistate,
                          Buffer *vmbuffer, Buffer *vmbuffer_other)
{
    bool use_fsm = !(options & HEAP_INSERT_SKIP_FSM);//whether to use the FSM to find free space
    Buffer buffer = InvalidBuffer;//buffer to return
    Page page;//page held in that buffer
    Size pageFreeSpace = 0,//free space on the page
         saveFreeSpace = 0;//space that must stay reserved on the page
    BlockNumber targetBlock,//target block
                otherBlock;//block of the previously pinned buffer
    bool needLock;//whether the extension lock is needed
    //align the requested size
    len = MAXALIGN(len);
    //a valid otherBuffer means this is an UPDATE, which cannot use bulk insert;
    //bulk operations are insert-only
    Assert(otherBuffer == InvalidBuffer || !bistate);
    //#define MaxHeapTupleSize (BLCKSZ - MAXALIGN(SizeOfPageHeaderData + sizeof(ItemIdData)))
    //#define MinHeapTupleSize MAXALIGN(SizeofHeapTupleHeader)
    if (len > MaxHeapTupleSize)
        ereport(ERROR,
                (errcode(ERRCODE_PROGRAM_LIMIT_EXCEEDED),
                 errmsg("row is too big: size %zu, maximum size %zu",
                        len, MaxHeapTupleSize)));
    //compute the space to keep reserved on the page
    // #define RelationGetTargetPageFreeSpace(relation, defaultff) \
    //     (BLCKSZ * (100 - RelationGetFillFactor(relation, defaultff)) / 100)
    saveFreeSpace = RelationGetTargetPageFreeSpace(relation,
                                                   HEAP_DEFAULT_FILLFACTOR);
    //UPDATE: get the block number of the previously pinned buffer
    if (otherBuffer != InvalidBuffer)
        otherBlock = BufferGetBlockNumber(otherBuffer);
    else
        otherBlock = InvalidBlockNumber;
    if (len + saveFreeSpace > MaxHeapTupleSize)
    {
        //requested size + reserved space exceeds the largest tuple a page can hold:
        //skip the FSM and go straight to extending the relation
        targetBlock = InvalidBlockNumber;
        use_fsm = false;
    }
    else if (bistate && bistate->current_buf != InvalidBuffer)//bulk-insert mode
        targetBlock = BufferGetBlockNumber(bistate->current_buf);
    else
        targetBlock = RelationGetTargetBlock(relation);//ordinary insert mode
    if (targetBlock == InvalidBlockNumber && use_fsm)
    {
        //no candidate block yet, and the FSM may be used:
        //ask the FSM for a block with at least len + saveFreeSpace bytes free
        targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
        //if the FSM has nothing, try the last block of the relation;
        //if the relation is empty, fall through to extension
        if (targetBlock == InvalidBlockNumber)
        {
            BlockNumber nblocks = RelationGetNumberOfBlocks(relation);
            if (nblocks > 0)
                targetBlock = nblocks - 1;
        }
    }
loop:
    while (targetBlock != InvalidBlockNumber)
    {
        //---------- loop until a block that can take the tuple is found
        if (otherBuffer == InvalidBuffer)
        {
            //----------- not an UPDATE
            //the simple case:
            //read the buffer
            buffer = ReadBufferBI(relation, targetBlock, bistate);
            if (PageIsAllVisible(BufferGetPage(buffer)))
                //the page is all-visible, so pin its visibility map page before locking
                //(pin = keep the page in the buffer pool)
                visibilitymap_pin(relation, targetBlock, vmbuffer);
            LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);//lock the buffer
        }
        else if (otherBlock == targetBlock)
        {
            //----------- UPDATE, new tuple goes into the same block as the old one
            //also a simple case
            buffer = otherBuffer;
            if (PageIsAllVisible(BufferGetPage(buffer)))
                visibilitymap_pin(relation, targetBlock, vmbuffer);
            LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
        }
        else if (otherBlock < targetBlock)
        {
            //----------- UPDATE, old tuple's block < new tuple's block
            buffer = ReadBuffer(relation, targetBlock);
            if (PageIsAllVisible(BufferGetPage(buffer)))
                visibilitymap_pin(relation, targetBlock, vmbuffer);
            //lock the lower block number first: otherBlock, then targetBlock
            LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
            LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
        }
        else
        {
            //------------ UPDATE, old tuple's block > new tuple's block
            buffer = ReadBuffer(relation, targetBlock);
            if (PageIsAllVisible(BufferGetPage(buffer)))
                visibilitymap_pin(relation, targetBlock, vmbuffer);
            //lock the lower block number first: targetBlock, then otherBlock
            LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
            LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
        }
        if (otherBuffer == InvalidBuffer || buffer <= otherBuffer)
            GetVisibilityMapPins(relation, buffer, otherBuffer,
                                 targetBlock, otherBlock, vmbuffer,
                                 vmbuffer_other);//pin the required VM pages
        else
            GetVisibilityMapPins(relation, otherBuffer, buffer,
                                 otherBlock, targetBlock, vmbuffer_other,
                                 vmbuffer);//pin the required VM pages
        page = BufferGetPage(buffer);
        pageFreeSpace = PageGetHeapFreeSpace(page);
        if (len + saveFreeSpace <= pageFreeSpace)
        {
            //enough room for the tuple: return this buffer
            //and remember the page as the target for future inserts
            RelationSetTargetBlock(relation, targetBlock);
            return buffer;
        }
        //not enough room: unlock, and release the buffer unless it is otherBuffer
        LockBuffer(buffer, BUFFER_LOCK_UNLOCK);
        if (otherBuffer == InvalidBuffer)
            ReleaseBuffer(buffer);
        else if (otherBlock != targetBlock)
        {
            LockBuffer(otherBuffer, BUFFER_LOCK_UNLOCK);
            ReleaseBuffer(buffer);
        }
        //not using the FSM: leave the loop and extend the relation
        if (!use_fsm)
            break;
        //record this page's real free space in the FSM and get the next candidate block
        //note: if the FSM has no further candidate, targetBlock = InvalidBlockNumber and the loop ends
        targetBlock = RecordAndGetPageWithFreeSpace(relation,
                                                    targetBlock,
                                                    pageFreeSpace,
                                                    len + saveFreeSpace);
    }
    //--------- no suitable block found: extend the table
    //a table created in this transaction or a temp table needs no extension lock
    needLock = !RELATION_IS_LOCAL(relation);
    if (needLock)//the extension lock is required
    {
        if (!use_fsm)
            //not using the FSM: just wait for the lock
            LockRelationForExtension(relation, ExclusiveLock);
        else if (!ConditionalLockRelationForExtension(relation, ExclusiveLock))
        {
            //could not get the lock immediately: wait for it
            LockRelationForExtension(relation, ExclusiveLock);
            //another process may have extended the table while we waited,
            //in which case the FSM can now supply a suitable targetBlock
            targetBlock = GetPageWithFreeSpace(relation, len + saveFreeSpace);
            if (targetBlock != InvalidBlockNumber)
            {
                UnlockRelationForExtension(relation, ExclusiveLock);
                goto loop;
            }
            //nobody extended it for us:
            //add extra blocks in bulk
            RelationAddExtraBlocks(relation, bistate);
        }
    }
    //extend the table by one block: P_NEW asks for a brand-new page
    buffer = ReadBufferBI(relation, P_NEW, bistate);
    if (otherBuffer != InvalidBuffer)
        //otherBuffer's block always precedes the newly added block, so lock it first
        LockBuffer(otherBuffer, BUFFER_LOCK_EXCLUSIVE);
    //lock the new page
    LockBuffer(buffer, BUFFER_LOCK_EXCLUSIVE);
    if (needLock)
        //release the extension lock
        UnlockRelationForExtension(relation, ExclusiveLock);
    //get the corresponding page
    page = BufferGetPage(buffer);
    if (!PageIsNew(page))
        //not a new page: something has gone wrong somewhere
        elog(ERROR, "page %u of relation \"%s\" should be empty but is not",
             BufferGetBlockNumber(buffer),
             RelationGetRelationName(relation));
    //initialize the new page
    PageInit(page, BufferGetPageSize(buffer), 0);
    //even an empty page cannot hold the tuple: PANIC
    //(the size check at the top should make this unreachable)
    if (len > PageGetHeapFreeSpace(page))
    {
        elog(PANIC, "tuple is too big: size %zu", len);
    }
    //found the block that will store the tuple; remember it for future inserts
    RelationSetTargetBlock(relation, BufferGetBlockNumber(buffer));
    //return
    return buffer;
}
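The size checks in the function come down to a little arithmetic. The sketch below reproduces it under stated assumptions: an 8192-byte block, 8-byte maximum alignment, a 24-byte page header and a 4-byte line pointer. The DEMO_* constants and demo_* functions are hypothetical stand-ins for BLCKSZ, MAXALIGN, MaxHeapTupleSize and RelationGetTargetPageFreeSpace.

```c
#include <assert.h>
#include <stddef.h>

#define DEMO_BLCKSZ           8192  /* assumed default block size */
#define DEMO_PAGE_HEADER_SIZE 24    /* assumed SizeOfPageHeaderData */
#define DEMO_ITEMID_SIZE      4     /* assumed sizeof(ItemIdData) */

/* Round up to the next multiple of 8 (assumed maximum alignment). */
#define DEMO_MAXALIGN(x) (((size_t) (x) + 7) & ~(size_t) 7)

/* Largest tuple that fits on an otherwise empty page. */
#define DEMO_MAX_HEAP_TUPLE_SIZE \
    (DEMO_BLCKSZ - DEMO_MAXALIGN(DEMO_PAGE_HEADER_SIZE + DEMO_ITEMID_SIZE))

/* Bytes to keep free on each page for the given fillfactor (percent). */
static size_t demo_save_free_space(int fillfactor)
{
    return (size_t) DEMO_BLCKSZ * (100 - fillfactor) / 100;
}

/* Acceptance test applied inside the loop to each candidate page. */
static int demo_fits(size_t len, int fillfactor, size_t page_free)
{
    return DEMO_MAXALIGN(len) + demo_save_free_space(fillfactor) <= page_free;
}
```

With a fillfactor of 100, nothing is reserved and a page qualifies as soon as it has room for the aligned tuple; with a fillfactor of 90, 819 bytes of each page are held back for later updates.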
III. Trace Analysis
Test script
15:54:13 (xdb@[local]:5432)testdb=# insert into t1 values (1,'1','1');
Call stack
(gdb) b RelationGetBufferForTuple
Breakpoint 1 at 0x4ef179: file hio.c, line 318.
(gdb) c
Continuing.
Breakpoint 1, RelationGetBufferForTuple (relation=0x7f4f51fe39b8, len=32, otherBuffer=0, options=0, bistate=0x0,
vmbuffer=0x7ffea95dbf6c, vmbuffer_other=0x0) at hio.c:318
318 bool use_fsm = !(options & HEAP_INSERT_SKIP_FSM);
(gdb) bt
#0 RelationGetBufferForTuple (relation=0x7f4f51fe39b8, len=32, otherBuffer=0, options=0, bistate=0x0,
vmbuffer=0x7ffea95dbf6c, vmbuffer_other=0x0) at hio.c:318
#1 0x00000000004df1f8 in heap_insert (relation=0x7f4f51fe39b8, tup=0x178a478, cid=0, options=0, bistate=0x0)
at heapam.c:2468
#2 0x0000000000709dda in ExecInsert (mtstate=0x178a220, slot=0x178a680, planSlot=0x178a680, estate=0x1789eb8,
canSetTag=true) at nodeModifyTable.c:529
#3 0x000000000070c475 in ExecModifyTable (pstate=0x178a220) at nodeModifyTable.c:2159
#4 0x00000000006e05cb in ExecProcNodeFirst (node=0x178a220) at execProcnode.c:445
#5 0x00000000006d552e in ExecProcNode (node=0x178a220) at ../../../src/include/executor/executor.h:247
#6 0x00000000006d7d66 in ExecutePlan (estate=0x1789eb8, planstate=0x178a220, use_parallel_mode=false,
operation=CMD_INSERT, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x17a7688,
execute_once=true) at execMain.c:1723
#7 0x00000000006d5af8 in standard_ExecutorRun (queryDesc=0x178e458, direction=ForwardScanDirection, count=0,
execute_once=true) at execMain.c:364
#8 0x00000000006d5920 in ExecutorRun (queryDesc=0x178e458, direction=ForwardScanDirection, count=0, execute_once=true)
at execMain.c:307
#9 0x00000000008c1092 in ProcessQuery (plan=0x16b3ac0, sourceText=0x16b1ec8 "insert into t1 values (1,'1','1');",
params=0x0, queryEnv=0x0, dest=0x17a7688, completionTag=0x7ffea95dc500 "") at pquery.c:161
#10 0x00000000008c29a1 in PortalRunMulti (portal=0x1717488, isTopLevel=true, setHoldSnapshot=false, dest=0x17a7688,
altdest=0x17a7688, completionTag=0x7ffea95dc500 "") at pquery.c:1286
#11 0x00000000008c1f7a in PortalRun (portal=0x1717488, count=9223372036854775807, isTopLevel=true, run_once=true,
dest=0x17a7688, altdest=0x17a7688, completionTag=0x7ffea95dc500 "") at pquery.c:799
#12 0x00000000008bbf16 in exec_simple_query (query_string=0x16b1ec8 "insert into t1 values (1,'1','1');") at postgres.c:1145
#13 0x00000000008c01a1 in PostgresMain (argc=1, argv=0x16dbaf8, dbname=0x16db960 "testdb", username=0x16aeba8 "xdb")
at postgres.c:4182
#14 0x000000000081e07c in BackendRun (port=0x16d3940) at postmaster.c:4361
#15 0x000000000081d7ef in BackendStartup (port=0x16d3940) at postmaster.c:4033
#16 0x0000000000819be9 in ServerLoop () at postmaster.c:1706
#17 0x000000000081949f in PostmasterMain (argc=1, argv=0x16acb60) at postmaster.c:1379
#18 0x0000000000742941 in main (argc=1, argv=0x16acb60) at main.c:228
(gdb)