Installing DataX on Windows:
DataX on GitHub: https://github.com/alibaba/DataX
Requirements:
JDK (1.8 or later, 1.8 recommended, with environment variables configured)
Python (guides online usually recommend Python 2.7.x, but Python 3.10 also worked in my testing; configure the environment variables as well)
Maven (3.5 or later recommended, with environment variables configured; only needed if you build DataX from source)
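The Python side of these requirements can be sanity-checked with a small script. This is a hypothetical helper, not part of DataX; it only reports whether the needed tools are visible:

```python
import shutil
import sys

def check_environment():
    """Report whether the tools DataX relies on are available."""
    return {
        # DataX itself runs on the JVM, so `java` must be on PATH.
        "java": shutil.which("java") is not None,
        # datax.py needs Python; 2.7.x is the classic recommendation,
        # but 3.x also works per the notes above.
        "python": sys.version_info >= (2, 7),
        # Maven is only needed when building DataX from source.
        "maven": shutil.which("mvn") is not None,
    }

print(check_environment())
```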
Installing DataX:
Download: https://datax-opensource.oss-cn-hangzhou.aliyuncs.com/202303/datax.tar.gz
Extract the archive to a local directory and open its bin directory. In this walkthrough the archive is extracted to D:\datax.
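The download is a plain tar.gz, so it can also be extracted from Python instead of a GUI tool. A minimal sketch (the paths are examples; `extract_datax` is a hypothetical helper):

```python
import tarfile
from pathlib import Path

def extract_datax(archive_path, dest_dir):
    """Extract datax.tar.gz into dest_dir; the archive unpacks to a datax/ folder."""
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(dest)
    return dest

# Example (Windows): extract_datax(r"C:\Downloads\datax.tar.gz", "D:\\")
```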
Testing the DataX installation
Open a cmd window and change into the bin directory under the DataX folder. DataX was extracted to D:\datax here, so the directory is D:\datax\bin:
cd D:\datax\bin
First run CHCP 65001 to switch the console to UTF-8 so the Chinese log output is not garbled, then run the bundled self-check job:
CHCP 65001
python .\datax.py ..\job\job.json
You should see output similar to the following:
DataX (DATAX-OPENSOURCE-3.0), From Alibaba !
Copyright (C) 2010-2017, Alibaba Group. All Rights Reserved.

2023-06-14 13:09:32.800 [main] INFO MessageSource - JVM TimeZone: GMT+08:00, Locale: zh_CN
2023-06-14 13:09:32.803 [main] INFO MessageSource - use Locale: zh_CN timeZone: sun.util.calendar.ZoneInfo[id="GMT+08:00",offset=28800000,dstSavings=0,useDaylight=false,transitions=0,lastRule=null]
2023-06-14 13:09:32.813 [main] INFO VMInfo - VMInfo# operatingSystem class => sun.management.OperatingSystemImpl
2023-06-14 13:09:32.818 [main] INFO Engine - the machine info =>
    osInfo: Oracle Corporation 1.8 25.361-b09
    jvmInfo: Windows 10 amd64 10.0
    cpu num: 8
    totalPhysicalMemory: -0.00G
    freePhysicalMemory: -0.00G
    maxFileDescriptorCount: -1
    currentOpenFileDescriptorCount: -1
    GC Names [PS MarkSweep, PS Scavenge]

    MEMORY_NAME            | allocation_size | init_size
    PS Eden Space          | 256.00MB        | 256.00MB
    Code Cache             | 240.00MB        | 2.44MB
    Compressed Class Space | 1,024.00MB      | 0.00MB
    PS Survivor Space      | 42.50MB         | 42.50MB
    PS Old Gen             | 683.00MB        | 683.00MB
    Metaspace              | -0.00MB         | 0.00MB

2023-06-14 13:09:32.828 [main] INFO Engine -
{
    "setting":{
        "speed":{ "channel":1 },
        "errorLimit":{ "record":0, "percentage":0.02 }
    },
    "content":[
        {
            "reader":{
                "name":"streamreader",
                "parameter":{
                    "column":[
                        { "value":"DataX", "type":"string" },
                        { "value":19890604, "type":"long" },
                        { "value":"1989-06-04 00:00:00", "type":"date" },
                        { "value":true, "type":"bool" },
                        { "value":"test", "type":"bytes" }
                    ],
                    "sliceRecordCount":100000
                }
            },
            "writer":{
                "name":"streamwriter",
                "parameter":{ "print":false, "encoding":"UTF-8" }
            }
        }
    ]
}
2023-06-14 13:09:32.846 [main] INFO PerfTrace - PerfTrace traceId=job_-1, isEnable=false
2023-06-14 13:09:32.846 [main] INFO JobContainer - DataX jobContainer starts job.
2023-06-14 13:09:32.847 [main] INFO JobContainer - Set jobId = 0
2023-06-14 13:09:32.863 [job-0] INFO JobContainer - jobContainer starts to do prepare ...
2023-06-14 13:09:32.864 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do prepare work .
2023-06-14 13:09:32.864 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do prepare work .
2023-06-14 13:09:32.864 [job-0] INFO JobContainer - jobContainer starts to do split ...
2023-06-14 13:09:32.865 [job-0] INFO JobContainer - Job set Channel-Number to 1 channels.
2023-06-14 13:09:32.865 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] splits to [1] tasks.
2023-06-14 13:09:32.865 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] splits to [1] tasks.
2023-06-14 13:09:32.889 [job-0] INFO JobContainer - jobContainer starts to do schedule ...
2023-06-14 13:09:32.892 [job-0] INFO JobContainer - Scheduler starts [1] taskGroups.
2023-06-14 13:09:32.893 [job-0] INFO JobContainer - Running by standalone Mode.
2023-06-14 13:09:32.900 [taskGroup-0] INFO TaskGroupContainer - taskGroupId=[0] start [1] channels for [1] tasks.
2023-06-14 13:09:32.904 [taskGroup-0] INFO Channel - Channel set byte_speed_limit to -1, No bps activated.
2023-06-14 13:09:32.904 [taskGroup-0] INFO Channel - Channel set record_speed_limit to -1, No tps activated.
2023-06-14 13:09:32.913 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] attemptCount[1] is started
2023-06-14 13:09:33.240 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] taskId[0] is successed, used[328]ms
2023-06-14 13:09:33.241 [taskGroup-0] INFO TaskGroupContainer - taskGroup[0] completed it's tasks.
2023-06-14 13:09:42.913 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.018s | All Task WaitReaderTime 0.026s | Percentage 100.00%
2023-06-14 13:09:42.914 [job-0] INFO AbstractScheduler - Scheduler accomplished all tasks.
2023-06-14 13:09:42.914 [job-0] INFO JobContainer - DataX Writer.Job [streamwriter] do post work.
2023-06-14 13:09:42.914 [job-0] INFO JobContainer - DataX Reader.Job [streamreader] do post work.
2023-06-14 13:09:42.915 [job-0] INFO JobContainer - DataX jobId [0] completed successfully.
2023-06-14 13:09:42.915 [job-0] INFO HookInvoker - No hook invoked, because base dir not exists or is a file: D:\datax\hook
2023-06-14 13:09:42.917 [job-0] INFO JobContainer -
    [total cpu info] =>
        averageCpu | maxDeltaCpu | minDeltaCpu
        -1.00%     | -1.00%      | -1.00%

    [total gc info] =>
        NAME         | totalGCCount | maxDeltaGCCount | minDeltaGCCount | totalGCTime | maxDeltaGCTime | minDeltaGCTime
        PS MarkSweep | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s
        PS Scavenge  | 0            | 0               | 0               | 0.000s      | 0.000s         | 0.000s

2023-06-14 13:09:42.917 [job-0] INFO JobContainer - PerfTrace not enable!
2023-06-14 13:09:42.917 [job-0] INFO StandAloneJobContainerCommunicator - Total 100000 records, 2600000 bytes | Speed 253.91KB/s, 10000 records/s | Error 0 records, 0 bytes | All Task WaitWriterTime 0.018s | All Task WaitReaderTime 0.026s | Percentage 100.00%
2023-06-14 13:09:42.918 [job-0] INFO JobContainer -
任务启动时刻 : 2023-06-14 13:09:32
任务结束时刻 : 2023-06-14 13:09:42
任务总计耗时 : 10s
任务平均流量 : 253.91KB/s
记录写入速度 : 10000rec/s
读出记录总数 : 100000
读写失败总数 : 0
That completes the installation.
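The two console steps above (switch the code page, then invoke datax.py) can also be wrapped in a script. A sketch under assumptions: `run_datax` is a hypothetical helper, the paths are examples, and forcing UTF-8 via PYTHONIOENCODING plays the same role for the child process that CHCP 65001 plays for the console:

```python
import os
import subprocess
import sys

def run_datax(datax_py, job_json):
    """Run one DataX job and capture its log output as UTF-8 text."""
    env = dict(os.environ, PYTHONIOENCODING="utf-8")
    return subprocess.run(
        [sys.executable, datax_py, job_json],
        env=env,
        capture_output=True,
        text=True,
        encoding="utf-8",
    )

# Example: result = run_datax(r"D:\datax\bin\datax.py", r"D:\datax\job\job.json")
```

Inspect `result.returncode` and `result.stdout` afterwards to decide whether the job succeeded.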
Data migration:
Example: migrating data from database MySQL_A to database MySQL_B.
MySQL_A and MySQL_B must already contain the same tables, with the same columns and column types. (For migrations between different database products, look up the type mappings between the two systems first.)
As an aside, here is a rough mapping between MySQL and Dameng (DM) data types, for reference only:
| MySQL data type | Dameng data type |
| --- | --- |
| INT | INT |
| VARCHAR | VARCHAR |
| TEXT | CLOB |
| DATETIME | TIMESTAMP |
| DECIMAL | DECIMAL |
| TINYINT | SMALLINT |
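For scripted schema conversion, the table above can be expressed as a lookup. The values are taken directly from the table; verify them against the Dameng documentation for your version before relying on them:

```python
# MySQL -> Dameng type mapping, as listed in the table above.
MYSQL_TO_DM = {
    "INT": "INT",
    "VARCHAR": "VARCHAR",
    "TEXT": "CLOB",
    "DATETIME": "TIMESTAMP",
    "DECIMAL": "DECIMAL",
    "TINYINT": "SMALLINT",
}

def map_type(mysql_type):
    """Return the Dameng equivalent of a MySQL column type, ignoring case."""
    return MYSQL_TO_DM[mysql_type.upper()]
```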
Back to the example: both MySQL_A and MySQL_B need the following sys_role table:
CREATE TABLE `sys_role` (
`id` int(11) NOT NULL AUTO_INCREMENT COMMENT 'primary key, auto-increment',
`name` varchar(255) NOT NULL COMMENT 'role name',
`create_user_id` int(11) NOT NULL DEFAULT '0' COMMENT 'creator id',
`create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP COMMENT 'creation time',
`update_user_id` int(11) DEFAULT NULL COMMENT 'last modifier id',
`update_time` datetime DEFAULT NULL COMMENT 'last modified time',
`remark` varchar(255) DEFAULT NULL COMMENT 'remarks',
PRIMARY KEY (`id`)
) ENGINE=InnoDB AUTO_INCREMENT=2 DEFAULT CHARSET=utf8mb4;
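The writer configuration below lists the table's columns explicitly. For wide tables that list can be pulled out of the DDL instead of typed by hand; a throwaway sketch that assumes the backtick-quoted column style used above (a real tool would use a SQL parser):

```python
import re

DDL = """
CREATE TABLE `sys_role` (
  `id` int(11) NOT NULL AUTO_INCREMENT,
  `name` varchar(255) NOT NULL,
  `create_user_id` int(11) NOT NULL DEFAULT '0',
  `create_time` datetime NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `update_user_id` int(11) DEFAULT NULL,
  `update_time` datetime DEFAULT NULL,
  `remark` varchar(255) DEFAULT NULL,
  PRIMARY KEY (`id`)
)
"""

def column_names(ddl):
    # Lines that begin with a backtick-quoted identifier are column definitions;
    # CREATE TABLE and PRIMARY KEY lines start with a keyword and are skipped.
    return re.findall(r"^\s*`(\w+)`", ddl, flags=re.MULTILINE)
```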
Assume sys_role in MySQL_A already contains rows that need to be migrated to MySQL_B.
Running the migration with DataX:
Go to the job folder under the DataX install directory, here D:\datax\job.
The folder contains the sample job file used for the self-check. Model a new job on it; in this example, create sys_role.json with the following content:
{
    "job": {
        "content": [
            {
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "column": ["*"],
                        "connection": [
                            {
                                "jdbcUrl": ["jdbc:mysql://localhost:3306/Mysql_A"],
                                "table": ["sys_role"]
                            }
                        ]
                    }
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "username": "root",
                        "password": "root",
                        "column": ["id", "name", "create_user_id", "create_time", "update_user_id", "update_time", "remark"],
                        "preSql": ["DELETE FROM sys_role"],
                        "postSql": [],
                        "connection": [
                            {
                                "jdbcUrl": "jdbc:mysql://localhost:3306/Mysql_B",
                                "table": ["sys_role"]
                            }
                        ]
                    }
                }
            }
        ],
        "setting": {
            "speed": {
                "channel": 3
            },
            "errorLimit": {
                "record": 0,
                "percentage": 0.02
            }
        }
    }
}

A few notes on the file. DataX job files must be strict JSON, so no comments are allowed. The mysqlreader block reads every column ("column": ["*"]) of sys_role from Mysql_A with the given credentials; the mysqlwriter block writes the listed columns to sys_role in Mysql_B. preSql runs once before writing (here it empties the target table) and postSql, empty here, would run afterwards. channel is the number of concurrent transfer channels, and errorLimit fails the job if any record errors out (record: 0) or if more than 2% of records fail (percentage: 0.02).
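When many tables follow this same read-all/delete-then-write pattern, the job file can be generated rather than hand-edited. A sketch under assumptions: `make_mysql_job` and `save_job` are hypothetical helpers, and all connection details passed in are placeholders for your environment:

```python
import json

def make_mysql_job(table, columns, src_url, dst_url, user, password, channels=3):
    """Build a mysqlreader -> mysqlwriter DataX job dict for one table."""
    return {
        "job": {
            "content": [{
                "reader": {
                    "name": "mysqlreader",
                    "parameter": {
                        "username": user,
                        "password": password,
                        "column": columns,
                        "connection": [{"jdbcUrl": [src_url], "table": [table]}],
                    },
                },
                "writer": {
                    "name": "mysqlwriter",
                    "parameter": {
                        "username": user,
                        "password": password,
                        "column": columns,
                        # Clear the target table before writing, as in the job above.
                        "preSql": ["DELETE FROM %s" % table],
                        "connection": [{"jdbcUrl": dst_url, "table": [table]}],
                    },
                },
            }],
            "setting": {
                "speed": {"channel": channels},
                "errorLimit": {"record": 0, "percentage": 0.02},
            },
        }
    }

def save_job(job, path):
    """Write the job as strict JSON (DataX does not accept comments)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(job, f, indent=2)
```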
Run the following command from the job directory to start the migration:
python ..\bin\datax.py .\sys_role.json
A successful run prints the same style of log as the self-check above, ending with a job summary: start and end time, total elapsed time, average throughput, write speed, total records read, and total read/write failures (which should be 0).
At this point the sys_role table on the MySQL_B server holds the same data, and the migration is complete.
If an error occurs partway through, the entire transaction is rolled back. So when multiple tables need to be synchronized, it is better to split the work into separate xxx.json job files, one per table.
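Following that one-file-per-table advice, per-table job files can be stamped out from a shared template and then run one by one. A sketch with hypothetical helpers (`write_table_job`, `datax_command`) and example paths:

```python
import copy
import json
import sys
from pathlib import Path

def write_table_job(template, table, job_dir):
    """Write one <table>.json job file derived from a template job dict.

    Assumes the single-table mysqlreader/mysqlwriter shape shown above;
    only the table names and the preSql are swapped.
    """
    job = copy.deepcopy(template)
    content = job["job"]["content"][0]
    content["reader"]["parameter"]["connection"][0]["table"] = [table]
    writer = content["writer"]["parameter"]
    writer["connection"][0]["table"] = [table]
    writer["preSql"] = ["DELETE FROM %s" % table]
    path = Path(job_dir) / ("%s.json" % table)
    path.write_text(json.dumps(job, indent=2), encoding="utf-8")
    return path

def datax_command(datax_py, job_file):
    """Command line for one job, e.g. subprocess.run(datax_command(...), check=True)."""
    return [sys.executable, str(datax_py), str(job_file)]
```

With `check=True`, the loop over tables stops at the first job that fails, so a broken table does not silently skip ahead.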
Source: https://blog.csdn.net/qq_38164123/article/details/131205621