In this article we walk through these timeouts and how they relate to one another.
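2 What timeout parameters does a database have?
Broadly, database timeout parameters fall into the following kinds:
- Transaction timeout;
- Query timeout, sometimes called statement timeout;
- Connect timeout (connectTimeout), sometimes called network timeout (NetworkTimeout);
- Login timeout (loginTimeout);
- TCP socket timeout.
Beyond these common database timeouts, keep in mind that both the client-side JDBC application and the server-side DBMS run on hosts that have an operating-system-level timeout detection and keep-alive mechanism based on TCP keep-alive. Where the operating system supports it, either side may also configure TCP keep-alive per socket.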
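3 What does transaction timeout mean?
A transaction timeout limits the total processing time of all statements in a transaction. Put simply: transaction timeout = statement (query) timeout * number of statements in the transaction + other time (business code, GC pauses, and so on).
Transaction timeout is usually configured in the application framework. In Spring, for example, it can be set through the timeout attribute of the @Transactional annotation.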
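The rule of thumb above can be turned into a small calculation. The following is a minimal sketch, not a recommendation: the class name, the method name and the example numbers (4 statements, 5 s each, 2 s of other work) are all assumptions chosen for illustration.

```java
import java.time.Duration;

/** Rough budget for a transaction timeout; all inputs are illustrative assumptions. */
final class TransactionTimeoutBudget {

    /** statement timeout * statement count + other time (business code, GC pauses). */
    static Duration estimate(Duration statementTimeout, int statementCount, Duration otherTime) {
        if (statementCount < 0) {
            throw new IllegalArgumentException("statementCount must not be negative");
        }
        return statementTimeout.multipliedBy(statementCount).plus(otherTime);
    }

    public static void main(String[] args) {
        // Hypothetical transaction: 4 statements capped at 5 s each, plus 2 s of other work.
        Duration budget = estimate(Duration.ofSeconds(5), 4, Duration.ofSeconds(2));
        System.out.println(budget.getSeconds() + " s"); // prints "22 s"
    }
}
```

The resulting number of seconds is the kind of value that would go into a framework setting such as the timeout attribute of Spring's @Transactional.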
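4 What does query timeout mean?
A query timeout, sometimes called a statement timeout, limits the maximum execution time of a single statement (a SELECT, INSERT, UPDATE or DELETE alike). If the statement has not returned within that time, the database driver on the application side throws a timeout exception and sends a cancel signal to the remote DBMS, which then cancels the statement. This mechanism depends on a healthy underlying TCP connection.
- JDBC provides a standard API for the statement timeout: java.sql.Statement.setQueryTimeout(int seconds);
- In practice, most developers do not set the statement timeout directly in code but rely on the configuration mechanism of their framework;
- In MyBatis, for example, the defaultStatementTimeout setting specifies the default statement timeout, and the timeout attribute on an individual SQL statement overrides that global default;
- The default value is 0, meaning no timeout. The right value depends on the workload, and there is no universal recommendation;
- With batching, JDBC does not mandate whether the timeout applies to each individual statement or to the batch as a whole; each driver decides for itself.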
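For readers who do set the statement timeout in code, the following is a minimal sketch of the standard JDBC call. The class and method names are hypothetical, and the caller is assumed to supply an open Connection and the SQL text. Note that setQueryTimeout takes whole seconds, so the helper rounds fractions up.

```java
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.SQLTimeoutException;
import java.sql.Statement;
import java.time.Duration;

final class QueryTimeoutExample {

    /** Converts a duration to whole seconds, rounding fractions up; 0 means "no limit". */
    static int toQueryTimeoutSeconds(Duration timeout) {
        if (timeout.isNegative()) {
            throw new IllegalArgumentException("timeout must not be negative");
        }
        long seconds = timeout.getSeconds() + (timeout.getNano() > 0 ? 1 : 0);
        return (int) Math.min(Integer.MAX_VALUE, seconds);
    }

    /** Counts the rows of a query; returns -1 if the driver reported a statement timeout. */
    static int countRows(Connection connection, String sql, Duration timeout) throws SQLException {
        try (Statement statement = connection.createStatement()) {
            statement.setQueryTimeout(toQueryTimeoutSeconds(timeout));
            try (ResultSet rows = statement.executeQuery(sql)) {
                int count = 0;
                while (rows.next()) {
                    count++;
                }
                return count;
            } catch (SQLTimeoutException e) {
                // The JDBC API defines SQLTimeoutException for this case; some drivers
                // report a driver-specific SQLException instead, which would propagate.
                return -1;
            }
        }
    }
}
```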
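5 Connect timeout
The connect timeout (connectTimeout), sometimes called the network timeout (NetworkTimeout), is the time the driver allows for establishing the TCP connection underneath a JDBC connection.
- The JDBC standard API defines a related setting, java.sql.Connection#setNetworkTimeout. Note that, as specified, it bounds how long a connection waits for the database to reply to any one request, so it also covers the period after the TCP connection has been established.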
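6 What does login timeout (loginTimeout) mean?
The login timeout is the time allowed for a database user to log in to the database server successfully. Logging in involves both establishing the TCP connection to the server and the server authenticating the user, so in general the login timeout should be configured larger than the connect timeout.
- The JDBC standard API defines the login timeout in java.sql.DriverManager#setLoginTimeout and javax.sql.CommonDataSource#setLoginTimeout.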
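As a minimal sketch of the DriverManager variant: the value is in seconds and applies JVM-wide to connections obtained through DriverManager. The wrapper class and the value 10 are assumptions for illustration, and how strictly a given driver honors the setting is driver-specific.

```java
import java.sql.DriverManager;

final class LoginTimeoutExample {

    /** Sets the JVM-wide DriverManager login timeout (seconds) and returns the previous value. */
    static int setLoginTimeoutSeconds(int seconds) {
        int previous = DriverManager.getLoginTimeout();
        DriverManager.setLoginTimeout(seconds);
        return previous;
    }

    public static void main(String[] args) {
        int previous = setLoginTimeoutSeconds(10); // 10 s is an arbitrary example value
        System.out.println("login timeout changed from " + previous
                + " to " + DriverManager.getLoginTimeout() + " seconds");
    }
}
```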
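7 TCP socket timeout
Applications read and write network data over TCP through the socket API of the TCP/IP stack, so the ordinary socket timeout applies to JDBC applications as well. Although TCP is a connection-oriented protocol, the connection is virtual and dynamic, and the two ends do not necessarily share the same view of it. An application therefore needs a socket timeout to detect failures of the TCP connection at the network level and avoid waiting forever on a dead connection. (Readers interested in this area can refer to the author's other articles on the TCP/IP stack and on tcpdump, wireshark and packetdrill.)
- Once a socket connection is established, reads and writes on the socket are blocking operations (they involve system calls and switches between user mode and kernel mode). The socket timeout is the maximum time such a read or write may block;
- When Socket.write() is called, the application hands the data to the local kernel buffer through a system call and control returns to the application immediately; the actual transmission over the network is done by the operating system. A socket write therefore normally returns quickly and rarely blocks for long. (If the kernel buffer is full because of a network failure, Socket.write() can also block. In that case the operating system keeps retransmitting, and reports an error once the retry limit is reached);
- When Socket.read() is called, the remote data first has to travel over the network to the local host before the operating system can hand it to the user-mode application. A socket read therefore usually takes some time and may block long enough to hit the timeout;
- When the network fails or the server crashes, the way TCP/IP works means the socket cannot detect the failure, so the application cannot tell whether its TCP connection to the DBMS has been broken. Without a socket timeout, the application waits indefinitely for the DBMS to return a result (such a connection is called a dead connection or zombie connection). A socket timeout must be configured to avoid this.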
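The blocking read described above can be reproduced with plain sockets. The sketch below uses a local server socket that completes the TCP handshake but never sends a byte, as a stand-in for a DBMS that has stopped responding. The class name and the timeout values are assumptions for illustration.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.InetSocketAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

final class SocketReadTimeoutDemo {

    /** Returns true if a read from a silent peer was aborted by a SocketTimeoutException. */
    static boolean readTimesOut(int connectTimeoutMs, int readTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        // The server never calls accept(); the kernel still completes the handshake
        // through the listen backlog, so the client connects but never receives data.
        try (ServerSocket silentServer = new ServerSocket(0, 1, loopback);
             Socket client = new Socket()) {
            client.connect(new InetSocketAddress(loopback, silentServer.getLocalPort()), connectTimeoutMs);
            client.setSoTimeout(readTimeoutMs); // 0 would mean: block without limit
            try {
                client.getInputStream().read();
                return false; // the peer answered or closed the connection
            } catch (SocketTimeoutException expected) {
                return true; // the read blocked longer than readTimeoutMs
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("read timed out: " + readTimesOut(1000, 200));
    }
}
```

With setSoTimeout(0), the same read would block until the peer sends data or the connection is torn down, which is the dead-connection situation the text warns about.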
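8 Login timeout, connect timeout and TCP socket timeout: differences and relationships
The three differ and relate as follows:
- The login timeout is a high-level timeout at the database-service level, while the connect timeout and the socket timeout are low-level timeouts at the TCP socket level;
- Logging in to the database server includes both establishing the TCP connection and the server authenticating the user, so the login timeout should generally be larger than the connect timeout;
- The login timeout and the connect timeout only affect the initial establishment of the connection between client and server, whereas the socket timeout applies to the whole lifetime of the connection: the initial connection as well as every SQL statement executed afterwards (some of which may take a long time). The socket timeout should therefore generally be larger than both the login timeout and the connect timeout;
- Because the three are closely related and all depend on network performance, some database drivers and connection pools expose only a subset of them and derive the rest from the values the user configures (Hive, for example, derives socketTimeout from loginTimeout, and HikariCP derives loginTimeout from connectionTimeout);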
- The loginTimeout specifies how long the whole process of logging in to the database is allowed to take. It governs connecting and authenticating to the DBMS server, which involves establishing a TCP connection followed by one or more exchanges of packets for the handshake and authentication;
- The connectTimeout specifies how long to wait for a TCP network connection to be established. Establishing a TCP connection is part of establishing a database connection and does not guarantee a login, so loginTimeout >= connectTimeout;
- A connection timeout occurs only when starting the TCP connection. This usually happens if the remote machine does not answer. If you get a ConnectException, possible reasons are: the server has been shut down, you used the wrong IP/DNS name or the wrong port, or the network connection to the server is down.
- A connection timeout is the maximum amount of time that the program is willing to wait to set up a connection to another process. You aren't getting or posting any application data at this point, just establishing the connection itself.
- The socketTimeout specifies how long the client will wait for a response to a command from the server before throwing an error. It governs the time a socket can be blocked waiting to read, which covers all reads from the server: not just during connect, but also during subsequent interaction with the server (e.g. executing queries). You may therefore want to set it higher than the time you are willing to wait for the login to complete, to allow for operations that take a long time to return a response;
- A socket timeout is dedicated to monitor the continuous incoming data flow. If the data flow is interrupted for the specified time the connection is considered as stalled/broken. Of course this only works with connections where data is received all the time and there are no delays longer than the configured socket timeout.
- A socket timeout is the timeout when waiting for individual packets. It's a common misconception that a socket timeout is the timeout to receive the full response. So if you have a socket timeout of 1 second, and a response comprised of 3 IP packets, where each response packet takes 0.9 seconds to arrive, for a total response time of 2.7 seconds, then there will be no timeout.
- Setting the socket timeout to 1000 (ms) requires that new data be received at least every second (assuming that you read the data block-wise and the block is large enough). If the incoming stream stalls for more than a second, you run into a socket timeout. This is especially important when an HTTP server processes a complex request that needs some time on the server side before the HTTP response data is available. If you configure a socket timeout of 10000 (10 seconds) but the HTTP server needs 15 seconds after receiving the HTTP request, you will never get the response, because after 10 seconds you get a SocketTimeoutException (no data is transmitted between reception of the HTTP request and the moment the HTTP response is ready).
- A socketTimeout can be used as both a brute force global query timeout and a method of detecting network problems;
- The loginTimeout and connectTimeout are related to establishing a connection, while the socketTimeout is relevant for the whole database session;
- connectTimeout and socketTimeout are timeouts on low-level socket operations, while loginTimeout is on a high level - the database level;
- Generally, the application hangs from network issues when the application is calling Socket.read(). However, depending on the network composition or the error type, it can rarely be in waiting status while running Socket.write(). When the application calls Socket.write(), the data is recorded to the OS kernel buffer and then the right to control is returned to the application immediately. Thus, as long as a valid value is recorded to the kernel buffer, Socket.write() is always successful. However, if the OS kernel buffer is full due to a special network error, even Socket.write() can be put into waiting status;
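The point that a socket timeout applies to each blocking read, not to the response as a whole, can be checked with a small experiment. In the sketch below (class name and timings are assumptions; the 1 s and 0.9 s figures from the text are scaled down so the demo runs quickly), a local sender delivers a response in three pieces spaced 200 ms apart while the reader uses a 500 ms socket timeout. The whole response takes about 600 ms, longer than the timeout, yet no single read waits longer than 500 ms.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.io.UncheckedIOException;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;

final class PerReadTimeoutDemo {

    /** Reads a response sent as one-byte chunks spaced gapMs apart; returns the bytes received. */
    static int readSlowResponse(int chunks, long gapMs, int soTimeoutMs) {
        InetAddress loopback = InetAddress.getLoopbackAddress();
        try (ServerSocket server = new ServerSocket(0, 1, loopback)) {
            Thread sender = new Thread(() -> {
                try (Socket peer = server.accept(); OutputStream out = peer.getOutputStream()) {
                    for (int i = 0; i < chunks; i++) {
                        Thread.sleep(gapMs); // each chunk arrives gapMs after the previous one
                        out.write('x');
                        out.flush();
                    }
                } catch (IOException | InterruptedException e) {
                    // Sketch only: a failed sender simply ends the response early.
                }
            });
            sender.setDaemon(true);
            sender.start();
            try (Socket client = new Socket(loopback, server.getLocalPort())) {
                client.setSoTimeout(soTimeoutMs); // applies to every individual read
                InputStream in = client.getInputStream();
                int received = 0;
                while (in.read() != -1) { // a SocketTimeoutException here would abort the loop
                    received++;
                }
                return received;
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println("bytes received: " + readSlowResponse(3, 200, 500));
    }
}
```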
9 How does query timeout work?
The mechanism differs slightly between database management systems and drivers, but the principle is similar: a separate thread tracks the execution time of the statement, and when that time exceeds the configured timeout, the application side raises a timeout error and sends a cancel signal over the underlying database connection to the remote DBMS, which cancels the statement.
In Oracle, for example, query timeout works roughly as follows:
- Create the statement by calling Connection.createStatement();
- Trigger execution by calling Statement.executeQuery();
- The statement transmits the query to the Oracle DBMS over its own connection;
- The statement registers itself with OracleTimeoutPollingThread (one per classloader) for timeout processing;
- The timeout occurs;
- OracleTimeoutPollingThread calls OracleStatement.cancel();
- A cancel message is sent over the statement's connection, and the query being executed is cancelled.
In MySQL, query timeout works roughly as follows:
- Create the statement by calling Connection.createStatement();
- Trigger execution by calling Statement.executeQuery();
- The statement transmits the query to the MySQL server over its internal connection;
- The statement creates a new timeout-execution thread for timeout processing (from version 5.1.x on, one thread is assigned per connection instead);
- The timeout execution is registered with that thread;
- The timeout occurs;
- The timeout-execution thread creates a new connection with the same configuration as the statement's connection;
- The cancel query (KILL QUERY "connectionId") is transmitted over that new connection.
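The common pattern behind both drivers (register the statement with a watchdog, cancel it when the timer fires) can be sketched without a database. Everything below is a simulation under stated assumptions: FakeStatement is a hypothetical stand-in whose cancel() only flips a flag, whereas a real driver sends a cancel message to the server, and a scheduled executor plays the role of the driver's timeout thread.

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicBoolean;

final class QueryTimeoutWatchdog {

    /** Hypothetical stand-in for a running statement. */
    static final class FakeStatement {
        private final AtomicBoolean cancelled = new AtomicBoolean();

        /** Plays the role of Statement.cancel(). */
        void cancel() {
            cancelled.set(true);
        }

        /** Pretends to run for workMs; returns true if it finished, false if cancelled. */
        boolean execute(long workMs) throws InterruptedException {
            long deadline = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(workMs);
            while (System.nanoTime() < deadline) {
                if (cancelled.get()) {
                    return false;
                }
                Thread.sleep(5);
            }
            return true;
        }
    }

    /** Returns true if the statement completed, false if the watchdog cancelled it first. */
    static boolean executeWithTimeout(long workMs, long timeoutMs) {
        ScheduledExecutorService watchdog = Executors.newSingleThreadScheduledExecutor();
        FakeStatement statement = new FakeStatement();
        // Register the statement with the watchdog before executing it.
        ScheduledFuture<?> pendingCancel =
                watchdog.schedule(() -> statement.cancel(), timeoutMs, TimeUnit.MILLISECONDS);
        try {
            boolean completed = statement.execute(workMs);
            pendingCancel.cancel(false); // finished in time: deregister from the watchdog
            return completed;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        } finally {
            watchdog.shutdownNow();
        }
    }
}
```

The sketch also makes the dependency on the connection visible: the cancel step is a message the driver has to deliver, so it cannot succeed if the network path to the DBMS is broken.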
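10 How are query timeout and TCP socket timeout related?
Developers often complain that they configured a query timeout for a SQL statement but it does not seem to take effect. Usually the cause is a problem in the underlying network: the query timeout mechanism does not work when the network has failed. The reasons are as follows:
- Higher-level timeouts depend on lower-level ones and work properly only if the lower-level mechanism works properly. Transaction timeout and query timeout therefore both depend on a working socket timeout;
- The query timeout cannot handle timeouts caused by network failures; it can only limit the execution time of a statement;
- To cope with network failures or an abrupt DBMS crash, use the socket timeout of the database driver;
- Using the socket timeout to limit SQL execution time is generally not recommended. The socket timeout should be somewhat larger than the query timeout (if it is smaller, the socket timeout fires first, the query timeout logic never runs, and the query timeout becomes meaningless);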
- The higher level timeout is dependent on the lower level timeout. The higher level timeout will operate normally only if the lower level timeout operates normally as well. If the JDBC driver socket timeout does not work properly, then higher level timeouts such as statement timeout and transaction timeout will not work properly either.
- The statement timeout does not handle timeouts at the time of a network failure. It does only one thing: it restricts the operation time of one statement. Handling timeouts caused by network failures must be done by the JDBC driver;
- A socket timeout value for the JDBC driver is necessary when the DBMS is terminated abruptly or a network error has occurred (equipment malfunction, etc.).
- Because of the structure of TCP/IP, there are no means for the socket to detect network errors. Therefore, the application cannot detect any disconnection from the DBMS. If the socket timeout is not configured, the application may wait for the results from the DBMS indefinitely (this connection is also called a "dead connection"). To prevent dead connections, a timeout must be configured for the socket.
- Socket timeout can be configured via JDBC driver. By setting up the socket timeout, you can prevent the infinite waiting situation when there is a network error and shorten the failure time.
- It is not recommended to use the socket timeout value to limit the statement execution time. So the socket timeout value must be higher than the statement timeout value.
- If the socket timeout value is smaller than the statement timeout value, the socket timeout fires first, so the statement timeout is never executed and its value becomes meaningless.
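The ordering rules collected so far (login timeout >= connect timeout, socket timeout larger than the login timeout and the query timeout) lend themselves to a simple configuration check. The following is a sketch under assumptions: the class and method names are hypothetical, all values are normalized to milliseconds even though real drivers use mixed units, and every value is required to be positive so that the "0 means unlimited" case stays out of the comparison.

```java
import java.util.ArrayList;
import java.util.List;

final class TimeoutConfigCheck {

    /** Returns the violated rules of thumb; all arguments are positive milliseconds. */
    static List<String> violations(long connectMs, long loginMs, long queryMs, long socketMs) {
        if (connectMs <= 0 || loginMs <= 0 || queryMs <= 0 || socketMs <= 0) {
            throw new IllegalArgumentException("this sketch expects positive timeouts only");
        }
        List<String> problems = new ArrayList<>();
        if (loginMs < connectMs) {
            problems.add("login timeout should be >= connect timeout");
        }
        if (socketMs <= loginMs) {
            problems.add("socket timeout should be > login timeout");
        }
        if (socketMs <= queryMs) {
            problems.add("socket timeout should be > query timeout");
        }
        return problems;
    }

    public static void main(String[] args) {
        // Hypothetical values: connect 3 s, login 5 s, query 30 s, socket 60 s.
        System.out.println(violations(3_000, 5_000, 30_000, 60_000)); // prints "[]"
    }
}
```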
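11 How to configure the TCP socket timeout for common databases
- As described above, the socket timeout of the database driver must be configured to cope with network failures and DBMS crashes;
- At the lowest level the socket timeout comes in two kinds, the timeout for establishing the connection and the timeout for reading and writing data. This follows from how TCP works: one mechanism establishes the connection, another transfers data once it is established;
- In Java source code, the connect timeout and the read/write timeout correspond to Socket.connect(SocketAddress endpoint, int timeout) and Socket.setSoTimeout(int timeout) respectively;
- Most database drivers support both timeouts. The way to configure them differs from driver to driver, but at the bottom of the driver code it comes down to calls to Socket.connect(SocketAddress endpoint, int timeout) and Socket.setSoTimeout(int timeout).
The following summarizes how to configure the socket connect timeout and read/write timeout for common databases:
- MySQL: both timeouts can be set through URL parameters, in milliseconds, e.g. jdbc:mysql://localhost:3306/ag_admin?useUnicode=true&characterEncoding=UTF8&connectTimeout=60000&socketTimeout=60000
- PostgreSQL: both can also be set through URL parameters, but in seconds: jdbc:postgresql://localhost/test?user=fred&password=secret&connectTimeout=60&socketTimeout=60
- Oracle: the thin JDBC driver does not support setting them through URL parameters. Instead they are set through the properties oracle.net.CONNECT_TIMEOUT and oracle.jdbc.ReadTimeout. Both are in milliseconds and default to 0. (The read timeout property is oracle.net.READ_TIMEOUT in driver versions below 10.1.0.5 and oracle.jdbc.ReadTimeout in later versions.) They can be passed through OracleDataSource.setConnectionProperties(java.util.Properties prop); with DBCP, through BasicDataSource.setConnectionProperties(String connectionProperties) or BasicDataSource.addConnectionProperty(String name, String value):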
// Configuration parameters
static final String url = "jdbc:oracle:thin:@myhost:1521/myorcldbservicename";
static final String user = "hr";
static final String password = "hr";
static final String CONNECT_TIMEOUT = "20000"; // milliseconds
static final String READ_TIMEOUT = "50000";    // milliseconds

// Option 1: obtain the connection through a DataSource
Properties connectionProperties = new Properties();
connectionProperties.put("oracle.net.CONNECT_TIMEOUT", CONNECT_TIMEOUT);
connectionProperties.put("oracle.jdbc.ReadTimeout", READ_TIMEOUT);
OracleDataSource ods = new OracleDataSource();
ods.setURL(url);
ods.setUser(user);
ods.setPassword(password);
ods.setConnectionProperties(connectionProperties);

// Option 2: obtain the connection through DriverManager
Class<?> oracleDriverClass = Class.forName("oracle.jdbc.driver.OracleDriver");
Properties props = new Properties();
props.put("oracle.net.CONNECT_TIMEOUT", CONNECT_TIMEOUT);
props.put("oracle.jdbc.ReadTimeout", READ_TIMEOUT);
// The timeouts can also be set as system properties; this must happen before connecting
// System.setProperty("oracle.net.CONNECT_TIMEOUT", CONNECT_TIMEOUT);
// System.setProperty("oracle.jdbc.ReadTimeout", READ_TIMEOUT);
props.put("user", user);
props.put("password", password);
Connection conn = DriverManager.getConnection(url, props);
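For the MySQL and PostgreSQL URL parameters listed earlier in this section, the unit difference (milliseconds versus seconds) is an easy source of mistakes. The following sketch builds both URLs from the same Duration values. The class and method names are hypothetical, the host and database names are the ones from the examples above, and the PostgreSQL values are assumed to be whole seconds.

```java
import java.time.Duration;

final class JdbcTimeoutUrls {

    /** MySQL: connectTimeout and socketTimeout are given in milliseconds. */
    static String mysql(String hostPortDb, Duration connect, Duration socket) {
        return "jdbc:mysql://" + hostPortDb
                + "?connectTimeout=" + connect.toMillis()
                + "&socketTimeout=" + socket.toMillis();
    }

    /** PostgreSQL: connectTimeout and socketTimeout are given in seconds. */
    static String postgresql(String hostDb, Duration connect, Duration socket) {
        return "jdbc:postgresql://" + hostDb
                + "?connectTimeout=" + connect.getSeconds()
                + "&socketTimeout=" + socket.getSeconds();
    }

    public static void main(String[] args) {
        Duration oneMinute = Duration.ofSeconds(60);
        System.out.println(mysql("localhost:3306/ag_admin", oneMinute, oneMinute));
        System.out.println(postgresql("localhost/test", oneMinute, oneMinute));
    }
}
```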
12 TCP timeout detection at the operating-system or socket level
As noted earlier, both the client-side JDBC application and the server-side DBMS run on hosts with an operating-system-level timeout detection and keep-alive mechanism based on TCP keep-alive, and where the operating system supports it, either side may also configure TCP keep-alive per socket.
- When the application specifies neither a connect timeout nor a socket timeout, it cannot detect network failures in most cases. After a network error occurs, it waits without limit until it manages to re-establish the connection or read data;
- To avoid this, administrators usually configure TCP keep-alive based detection on the server, so that the operating system actively verifies network connections;
- For example, with the TCP keep-alive check interval on a Linux server set to 30 minutes, a database connection problem caused by a network failure lasts no longer than 30 minutes, even if the application has set no socket timeout in the JDBC driver (or set it to 0);
- On Linux, this OS-level detection is governed mainly by the following kernel parameters, which can be viewed and changed with the sysctl command:
/proc/sys/net/ipv4/tcp_keepalive_intvl: default 75 seconds. The number of seconds between TCP keep-alive probes;
/proc/sys/net/ipv4/tcp_keepalive_probes: default 9. The maximum number of TCP keep-alive probes to send before giving up and killing the connection if no response is obtained from the other end;
/proc/sys/net/ipv4/tcp_keepalive_time: default 7200 seconds (2 hours). The number of seconds a connection needs to be idle before TCP begins sending out keep-alive probes. Keep-alives are sent only when the SO_KEEPALIVE socket option is enabled. An idle connection is terminated after approximately an additional 11 minutes (9 probes at intervals of 75 seconds) when keep-alive is enabled;
- Linux also supports configuring TCP keep-alive per socket. The relevant socket options are:
- TCP_KEEPIDLE: the amount of time until the first keepalive packet is sent;
- TCP_KEEPCNT: the number of probes to send;
- TCP_KEEPINTVL: the interval between keepalive packets;
- How to set them depends on the socket API of the programming language. JDK 8 and JDK 11 support per-socket TCP keep-alive through jdk.net.ExtendedSocketOptions, so it can be configured as follows:
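import java.net.Socket;
import jdk.net.ExtendedSocketOptions;

Socket socket = new Socket();
socket.setKeepAlive(true); // the options below only take effect when SO_KEEPALIVE is enabled
socket.setOption(ExtendedSocketOptions.TCP_KEEPIDLE, 10);    // idle seconds before the first probe
socket.setOption(ExtendedSocketOptions.TCP_KEEPCOUNT, 2);    // unanswered probes before giving up
socket.setOption(ExtendedSocketOptions.TCP_KEEPINTERVAL, 3); // seconds between probes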
- On Linux, Oracle Database 12c and later implements TCP keep-alive detection at the socket level by setting the socket options TCP_KEEPIDLE/TCP_KEEPCNT/TCP_KEEPINTVL when the user configures SQLNET.EXPIRE_TIME=5 in sqlnet.ora. (Related parameters: sqlnet.ora: SQLNET.EXPIRE_TIME=5, SQLNET.INBOUND_CONNECT_TIMEOUT=600; listener.ora: INBOUND_CONNECT_TIMEOUT_=240)
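# Query kernel parameters
- sysctl -a // show all currently available kernel parameters
- sysctl net.ipv4.tcp_keepalive_time // query a single kernel parameter
- cat /proc/sys/net/ipv4/tcp_keepalive_time // query a single kernel parameter
# Modify kernel parameters
- sysctl net.ipv4.tcp_keepalive_time=3600 // change a single kernel parameter
- vim /etc/sysctl.conf // edit kernel parameters in the configuration file
- sysctl -p // reload kernel parameters from sysctl.conf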
- If the socket timeout or the connect timeout is not configured, most of the time, applications cannot detect network errors. So, until the applications are connected or are able to read data, they will wait indefinitely.
- To prevent this, we can configure keep-alive checking at the OS level, so that the Linux servers verify the network connection themselves.
- If you set the KeepAlive checking cycle for the Linux servers to 30 minutes, then even if someone sets the JDBC driver's socket timeout to 0 (no timeout), DBMS connection problems caused by network issues do not last longer than 30 minutes: a hung JDBC connection recovers 30 minutes after the network failure. In other words, the JDBC driver's socket timeout is affected by the OS-level keep-alive configuration.
- Generally, the application hangs from network issues when the application is calling Socket.read(). However, depending on the network composition or the error type, it can rarely be in waiting status while running Socket.write(). When the application calls Socket.write(), the data is recorded to the OS kernel buffer and then the right to control is returned to the application immediately. Thus, as long as a valid value is recorded to the kernel buffer, Socket.write() is always successful. However, if the OS kernel buffer is full due to a special network error, even Socket.write() can be put into waiting status. In this case, the OS tries to resend the packet for a certain amount of time, and generates an error when it reaches the limit.
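The three kernel parameters described in this section combine into an approximate upper bound on how long a dead, idle connection can survive: the idle time before the first probe plus the number of probes times the probe interval. A minimal sketch of that arithmetic follows, using the Linux defaults quoted above; the class and method names are assumptions.

```java
final class KeepAliveDetectionTime {

    /** Approximate seconds from the last traffic until the kernel drops a dead connection. */
    static long worstCaseSeconds(long keepaliveTime, long probes, long interval) {
        return keepaliveTime + probes * interval;
    }

    public static void main(String[] args) {
        // Linux defaults: tcp_keepalive_time=7200, tcp_keepalive_probes=9, tcp_keepalive_intvl=75
        long seconds = worstCaseSeconds(7200, 9, 75);
        System.out.println(seconds + " s"); // prints "7875 s", about 2 hours 11 minutes
    }
}
```

With the defaults this is well over two hours, which is why the text recommends lowering the keep-alive settings or, better, configuring a socket timeout in the driver.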
13 Related source code and reference links
# JDBC API classes and methods
java.sql.DriverManager#setLoginTimeout
javax.sql.CommonDataSource#setLoginTimeout
java.sql.Connection#getNetworkTimeout
java.sql.Connection#setNetworkTimeout
java.sql.Statement#setQueryTimeout
# Oracle JDBC driver classes and methods
oracle.jdbc.OracleDriver
oracle.jdbc.pool.OracleDataSource#setLoginTimeout
oracle.jdbc.OracleConnection
oracle.jdbc.OracleConnection#CONNECTION_PROPERTY_THIN_READ_TIMEOUT
oracle.jdbc.OracleConnection#CONNECTION_PROPERTY_THIN_NET_CONNECT_TIMEOUT
oracle.jdbc.OracleConnectionWrapper#setNetworkTimeout
oracle.jdbc.driver.PhysicalConnection#setNetworkTimeout
oracle.jdbc.driver.OracleStatement#setQueryTimeout
oracle.jdbc.driver.OracleStatement#doExecuteWithTimeout
oracle.jdbc.driver.OraclePreparedStatement#executeForRowsWithTimeout
oracle.jdbc.driver.OracleTimeoutPollingThread
# MySQL JDBC driver classes and methods
com.mysql.cj.jdbc.Driver
com.mysql.cj.jdbc.MysqlDataSource#setLoginTimeout
com.mysql.cj.jdbc.ConnectionImpl#setNetworkTimeout
com.mysql.cj.jdbc.ConnectionWrapper#setNetworkTimeout
com.mysql.cj.jdbc.StatementImpl#setQueryTimeout
com.mysql.cj.jdbc.StatementWrapper#setQueryTimeout
# Reference links
- https://www.cubrid.org/blog/3826470
- https://prashantatridba.wordpress.com/tag/tcp_keepidle/
- https://bugs.openjdk.org/browse/JDK-8194298