文本:ttt.txt 共7774865条记录
显示部分文本内容如下:
"OBJECT_ID","OBJECT_NAME","CREATED"
"20","ICOL$","2013/10/9 18:23:42"
"46","I_USER1","2013/10/9 18:23:42"
"28","CON$","2013/10/9 18:23:42"
"15","UNDO$","2013/10/9 18:23:42"
"29","C_COBJ#","2013/10/9 18:23:42"
"3","I_OBJ#","2013/10/9 18:23:42"
"25","PROXY_ROLE_DATA$","2013/10/9 18:23:42"
"41","I_IND1","2013/10/9 18:23:42"
"54","I_CDEF2","2013/10/9 18:23:42"
"40","I_OBJ5","2013/10/9 18:23:42"
"26","I_PROXY_ROLE_DATA$_1","2013/10/9 18:23:42"
"17","FILE$","2013/10/9 18:23:42"
"13","UET$","2013/10/9 18:23:42"
"9","I_FILE#_BLOCK#","2013/10/9 18:23:42"
"43","I_FILE1","2013/10/9 18:23:42"
"51","I_CON1","2013/10/9 18:23:42"
"38","I_OBJ3","2013/10/9 18:23:42"
"7","I_TS#","2013/10/9 18:23:42"
"56","I_CDEF4","2013/10/9 18:23:42"
......
......
为了方便测试:
我们选取前10001行,共10000条数据(第一条为字段名)
[11:05:12 wsdf@localhost Desktop]$ head -10001 ttt.txt > test.txt
如下是本人分别用python与shell编写的脚本,不代表脚本执行的方法是最优的,所作的比较仅为个人主观意见
1.Shell版
#!/bin/sh
#mysql_import.sh
#读ttt.txt文件,将其中各项写入数据库
i=0 #
###################设定新分隔符####################
SAVEDIFS=$IFS
IFS=','
#mysql连接函数
mysql_conn(){
mysql -e $1;
}
echo "Begin time: `date`" #显示开始时间
################创建数据库myimport#################
mysql -e "drop database if exists myimport;"
#mysql_conn "drop database if exists myimport;"
mysql -e "create database myimport;"
#mysql_conn "create database myimport;"
#mysql -e "show databases;"
###############对ttt.txt进行处理###################
sed -e 's/\"//g' -e 's#/#-#g' ttt.txt >t.txt
#i=0时,读文本第一行,创建表import_obj,设置i=1
#i=1,读文本其他行,插入数据到import_obj中
#逐行读数据,并通过mysql -e command插入到表中
while read ID NAME CREATED
do
if [ $i -eq 0 ]; then
#echo $ID,$NAME,$CREATED
id=$ID
name=$NAME
created=$CREATED
mysql -e "create table if not exists myimport.import_obj(id int unsigned auto_increment primary key,$ID int unsigned unique not null,$NAME varchar(60) not null,$CREATED datetime not null);"
i=1
#mysql -e "desc myimport.import_obj;"
else
#echo "$CREATED"
#mysql -e "select str_to_date($CREATED,'%Y-%m-%d %H:%i:%s');"
mysql -e "insert into myimport.import_obj($id,$name,$created) value($ID,'$NAME','$CREATED');"
fi
done <t.txt
echo "END time: `date`" #显示结束时间
#mysql -e "select * from myimport.import_obj;"
rm -f t.txt #删除临时文件
IFS=$SAVEDIFS #还原分隔符
2.Python版(Python 2.7.5版本)
#!/usr/bin/python
# mysql_import.py
#使用MySQLdb去实现
import MySQLdb,os,time
os.system("sed -e 's/\"//g' -e 's#/#-#g' test.txt >t.txt");
try:
print time.strftime("%Y-%m-%d %H:%M:%S",time.localtime(time.time())) #记录开始时间
conn=MySQLdb.connect(host='127.0.0.1',user='root',passwd='zhang1992')
print "conn success"
except:
print "conn error!"
exit()
cur=conn.cursor()
cur.execute('drop database if exists myimport')
cur.execute('create database myimport')
i=0
count=0
with open('t.txt','r') as ft:
for row in ft.readlines():
ID,NAME,CREATED=list(row.strip("\n").split(","))
if i==0:
cid=ID
cname=NAME
created=CREATED
cur.execute("create table if not exists myimport.import_obj(id int unsigned auto_increment primary key,%s int unsigned unique not null,%s varchar(60) not null,%s datetime not null)"%(ID,NAME,CREATED))
i=1
else:
sql="insert into myimport.import_obj(%s,%s,%s) value(%d,'%s','%s')"%(cid,cname,created,int(ID),NAME,CREATED)
cur.execute(sql)
os.system("rm -f t.txt")
cur.close()
conn.commit()
conn.close()
print time.strftime("%Y-%m-%d %H:%M:%S",time.localtime(time.time())) #记录结束始时间
3.测试
10000条数据测试
[11:08:32 wsdf@localhost Desktop]$ bash mysql_import.sh
Begin time: Tue Jul 19 11:08:35 CST 2016
End time: Tue Jul 19 11:11:50 CST 2016
[11:11:50 wsdf@localhost Desktop]$ python mysql_import.py
2016-07-19 11:16:46
conn success
2016-07-19 11:16:52
python处理完整文本测试
[13:06:30 wsdf@localhost Desktop]$ python mysql_import.py
2016-07-19 13:06:35
conn success
2016-07-19 13:14:57
[13:14:57 wsdf@localhost Desktop]$ mysql -e "select count(*) from myimport.import_obj";
+----------+
| count(*) |
+----------+
| 774864 |
+----------+
从上面的测试结果可以看出,python的效率明显高于shell。
shell无法保存mysql的连接状态,导致每执行一条插入语句都需要重新连接mysql及断开。这里的python中的MySQLdb通过事务,全部执行完毕才提交,一次提交完成那个所有的插入,节省连接与断开的时间。