基准评测工具

安装测试工具

  • 三个重要考虑方面:数据如何生成,测量指标是什么,负载怎么测

  • 首先下载TPC-H,修改makefile文件,在103行左右:

    • # Current values for WORKLOAD are:  TPCH
      DATABASE= TDAT
      MACHINE = LINUX
      WORKLOAD = TPCH
      
  • 修改完成后直接运行make cleanmake进行编译

  • 测试数据生成:我们需要自动创建测试数据。编译之后会生成dbgen执行文件,用TPC-H即可方便地生成自定义大小的测试数据。在这里以0.1GB数据为例,采用命令./dbgen –s 0.1即可。

  • 查看生成的数据集:ls -lrth *.tbl

    • tpch1
  • 测试数据生成:注意生成的数据后面每行会多出一个‘|’分隔符,所以需要通过以下脚本简单处理一下,生成对应的csv文件。

    • for i in `ls *.tbl`; do sed 's/|//′//' //′i > i/tbl/csv;echo{i/tbl/csv}; echo i/tbl/csv;echoi; done;
      

测试表创建

创建PARTREGIONNATION表:

create table PART(
    P_PARTKEY    BIGINT,
    P_NAME        VARCHAR(55),
    P_MFGR        CHAR(25),
    P_BRAND        CHAR(10),
    P_TYPE        VARCHAR(25),
    P_SIZE        INTEGER,
    P_CONTAINER    CHAR(10),
    P_RETAILPRICE    DECIMAL,
    P_COMMENT    VARCHAR(23)
) DISTRIBUTED BY (P_PARTKEY);

create table region(
    R_REGIONKEY        BIGINT,
    R_NAME        CHAR(25),
    R_COMMENT    VARCHAR(152)
) DISTRIBUTED BY (R_REGIONKEY);

create table NATION(
    N_NATIONKEY        BIGINT,
    N_NAME        CHAR(25),
    N_REGIONKEY        BIGINT NOT NULL,
    N_COMMENT    VARCHAR(152)
) DISTRIBUTED BY (N_NATIONKEY);

CREATE TABLE SUPPLIER (
                S_SUPPKEY               BIGINT,
                S_NAME                  CHAR(25),
                S_ADDRESS               VARCHAR(40),
                S_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY
                S_PHONE                 CHAR(15),
                S_ACCTBAL               DECIMAL,
                S_COMMENT               VARCHAR(101)
        ) DISTRIBUTED BY (s_suppkey);

CREATE TABLE CUSTOMER (
                C_CUSTKEY               BIGINT,
                C_NAME                  VARCHAR(25),
                C_ADDRESS               VARCHAR(40),
                C_NATIONKEY             BIGINT NOT NULL, -- references N_NATIONKEY
                C_PHONE                 CHAR(15),
                C_ACCTBAL               DECIMAL,
                C_MKTSEGMENT    CHAR(10),
                C_COMMENT               VARCHAR(117)
        ) DISTRIBUTED BY (c_custkey);

CREATE TABLE PARTSUPP (
                PS_PARTKEY              BIGINT NOT NULL, -- references P_PARTKEY
                PS_SUPPKEY              BIGINT NOT NULL, -- references S_SUPPKEY
                PS_AVAILQTY             INTEGER,
                PS_SUPPLYCOST   DECIMAL,
                PS_COMMENT              VARCHAR(199)
        ) DISTRIBUTED BY (ps_partkey);

CREATE TABLE ORDERS (
                O_ORDERKEY              BIGINT,
                O_CUSTKEY               BIGINT NOT NULL, -- references C_CUSTKEY
                O_ORDERSTATUS   CHAR(1),
                O_TOTALPRICE    DECIMAL,
                O_ORDERDATE             DATE,
                O_ORDERPRIORITY CHAR(15),
                O_CLERK                 CHAR(15),
                O_SHIPPRIORITY  INTEGER,
                O_COMMENT               VARCHAR(79)
        ) DISTRIBUTED BY (o_orderkey);

CREATE TABLE LINEITEM (
                L_ORDERKEY              BIGINT NOT NULL, 
                L_PARTKEY               BIGINT NOT NULL, 
      L_SUPPKEY               BIGINT NOT NULL, 
                L_LINENUMBER    INTEGER,
                L_QUANTITY              DECIMAL,
                L_EXTENDEDPRICE DECIMAL,
                L_DISCOUNT              DECIMAL,
                L_TAX                   DECIMAL,
                L_RETURNFLAG    CHAR(1),
                L_LINESTATUS    CHAR(1),
                L_SHIPDATE              DATE,
                L_COMMITDATE    DATE,
                L_RECEIPTDATE   DATE,
                L_SHIPINSTRUCT  CHAR(25),
                L_SHIPMODE              CHAR(10),
                L_COMMENT               VARCHAR(44)
        ) DISTRIBUTED BY (l_orderkey);

导入测试数据

直接通过copy命令将csv数据导入响应表中:

\copy customer from '/data/dyt/tpch-dbgen-master/customer.csv' with delimiter '|'
\copy lineitem from '/data/dyt/tpch-dbgen-master/lineitem.csv' with delimiter '|'
\copy nation from '/data/dyt/tpch-dbgen-master/nation.csv' with delimiter '|'
\copy orders from '/data/dyt/tpch-dbgen-master/orders.csv' with delimiter '|'
\copy part from '/data/dyt/tpch-dbgen-master/part.csv' with delimiter '|'
\copy partsupp from '/data/dyt/tpch-dbgen-master/partsupp.csv' with delimiter '|'
\copy region from '/data/dyt/tpch-dbgen-master/region.csv' with delimiter '|'
\copy supplier from '/data/dyt/tpch-dbgen-master/supplier.csv' with delimiter '|'

测试脚本准备

TPC-H提供了22个测试SQL,位于/dss/queries目录下。通过以下脚本生成22条sql语句:

for q in `seq 1 22`
do
    DSS_QUERY=dss/templates ./qgen -s {% math_inline %}SF {% endmath_inline %}q > dss/queries/{% math_inline %}q.sql
    sed 's/^select/explain select/' dss/queries/{% endmath_inline %}q.sql > dss/queries/$q.explain.sql
done

查看生成的SQL语句

tpch2

测试执行

可以简单执行若干条SQL查询语句来查看在SF=0.1的情况下的执行时间。

\timing
\i /data/dyt/tpch-dbgen-master/dss/queries/1.sql

tpch3

tpch4

results matching ""

    No results matching ""