基准评测工具
安装测试工具
三个重要考虑方面:数据如何生成,测量指标是什么,负载怎么测
首先下载TPC-H,修改
makefile
文件,在103行左右:# Current values for WORKLOAD are: TPCH DATABASE= TDAT MACHINE = LINUX WORKLOAD = TPCH
修改完成后直接运行
make clean
和make
进行编译测试数据生成:我们需要自动创建测试数据。编译之后会生成
dbgen
执行文件,用TPC-H
即可方便地生成自定义大小的测试数据。在这里以0.1GB数据为例,采用命令./dbgen –s 0.1
即可。查看生成的数据集:
ls -lrth *.tbl
测试数据生成:注意生成的数据后面每行会多出一个‘|’分隔符,所以需要通过以下脚本简单处理一下,生成对应的csv文件。
for i in `ls *.tbl`; do sed 's/|//′//' //′i > i/tbl/csv;echo{i/tbl/csv}; echo i/tbl/csv;echoi; done;
测试表创建
创建PART
,REGION
,NATION
表:
create table PART(
P_PARTKEY BIGINT,
P_NAME VARCHAR(55),
P_MFGR CHAR(25),
P_BRAND CHAR(10),
P_TYPE VARCHAR(25),
P_SIZE INTEGER,
P_CONTAINER CHAR(10),
P_RETAILPRICE DECIMAL,
P_COMMENT VARCHAR(23)
) DISTRIBUTED BY (P_PARTKEY);
create table region(
R_REGIONKEY BIGINT,
R_NAME CHAR(25),
R_COMMENT VARCHAR(152)
) DISTRIBUTED BY (R_REGIONKEY);
create table NATION(
N_NATIONKEY BIGINT,
N_NAME CHAR(25),
N_REGIONKEY BIGINT NOT NULL,
N_COMMENT VARCHAR(152)
) DISTRIBUTED BY (N_NATIONKEY);
CREATE TABLE SUPPLIER (
S_SUPPKEY BIGINT,
S_NAME CHAR(25),
S_ADDRESS VARCHAR(40),
S_NATIONKEY BIGINT NOT NULL, -- references N_NATIONKEY
S_PHONE CHAR(15),
S_ACCTBAL DECIMAL,
S_COMMENT VARCHAR(101)
) DISTRIBUTED BY (s_suppkey);
CREATE TABLE CUSTOMER (
C_CUSTKEY BIGINT,
C_NAME VARCHAR(25),
C_ADDRESS VARCHAR(40),
C_NATIONKEY BIGINT NOT NULL, -- references N_NATIONKEY
C_PHONE CHAR(15),
C_ACCTBAL DECIMAL,
C_MKTSEGMENT CHAR(10),
C_COMMENT VARCHAR(117)
) DISTRIBUTED BY (c_custkey);
CREATE TABLE PARTSUPP (
PS_PARTKEY BIGINT NOT NULL, -- references P_PARTKEY
PS_SUPPKEY BIGINT NOT NULL, -- references S_SUPPKEY
PS_AVAILQTY INTEGER,
PS_SUPPLYCOST DECIMAL,
PS_COMMENT VARCHAR(199)
) DISTRIBUTED BY (ps_partkey);
CREATE TABLE ORDERS (
O_ORDERKEY BIGINT,
O_CUSTKEY BIGINT NOT NULL, -- references C_CUSTKEY
O_ORDERSTATUS CHAR(1),
O_TOTALPRICE DECIMAL,
O_ORDERDATE DATE,
O_ORDERPRIORITY CHAR(15),
O_CLERK CHAR(15),
O_SHIPPRIORITY INTEGER,
O_COMMENT VARCHAR(79)
) DISTRIBUTED BY (o_orderkey);
CREATE TABLE LINEITEM (
L_ORDERKEY BIGINT NOT NULL,
L_PARTKEY BIGINT NOT NULL,
L_SUPPKEY BIGINT NOT NULL,
L_LINENUMBER INTEGER,
L_QUANTITY DECIMAL,
L_EXTENDEDPRICE DECIMAL,
L_DISCOUNT DECIMAL,
L_TAX DECIMAL,
L_RETURNFLAG CHAR(1),
L_LINESTATUS CHAR(1),
L_SHIPDATE DATE,
L_COMMITDATE DATE,
L_RECEIPTDATE DATE,
L_SHIPINSTRUCT CHAR(25),
L_SHIPMODE CHAR(10),
L_COMMENT VARCHAR(44)
) DISTRIBUTED BY (l_orderkey);
导入测试数据
直接通过copy
命令将csv
数据导入响应表中:
\copy customer from '/data/dyt/tpch-dbgen-master/customer.csv' with delimiter '|'
\copy lineitem from '/data/dyt/tpch-dbgen-master/lineitem.csv' with delimiter '|'
\copy nation from '/data/dyt/tpch-dbgen-master/nation.csv' with delimiter '|'
\copy orders from '/data/dyt/tpch-dbgen-master/orders.csv' with delimiter '|'
\copy part from '/data/dyt/tpch-dbgen-master/part.csv' with delimiter '|'
\copy partsupp from '/data/dyt/tpch-dbgen-master/partsupp.csv' with delimiter '|'
\copy region from '/data/dyt/tpch-dbgen-master/region.csv' with delimiter '|'
\copy supplier from '/data/dyt/tpch-dbgen-master/supplier.csv' with delimiter '|'
测试脚本准备
TPC-H提供了22个测试SQL,位于/dss/queries
目录下。通过以下脚本生成22条sql语句:
for q in `seq 1 22`
do
DSS_QUERY=dss/templates ./qgen -s {% math_inline %}SF {% endmath_inline %}q > dss/queries/{% math_inline %}q.sql
sed 's/^select/explain select/' dss/queries/{% endmath_inline %}q.sql > dss/queries/$q.explain.sql
done
查看生成的SQL语句
测试执行
可以简单执行若干条SQL查询语句来查看在SF=0.1的情况下的执行时间。
\timing
\i /data/dyt/tpch-dbgen-master/dss/queries/1.sql