
ClickHouse Deployment

This article documents how to set up a ClickHouse environment.

Kernel Parameter Tuning

  1. Add the following to /etc/sysctl.conf:

    fs.file-max = 10000000
    fs.nr_open = 10000000
    net.core.default_qdisc = fq
    net.ipv4.tcp_mem = 786432 1697152 1945728
    net.ipv4.tcp_rmem = 4096 4096 16777216
    net.ipv4.tcp_wmem = 4096 4096 16777216
    net.ipv4.tcp_congestion_control = bbr
    net.ipv4.tcp_slow_start_after_idle = 0
    net.ipv4.tcp_fin_timeout = 30
    net.ipv4.tcp_tw_reuse = 1
    # Note: tcp_tw_recycle was removed in Linux 4.12; this line is ignored on newer kernels
    net.ipv4.tcp_tw_recycle = 1
    net.netfilter.nf_conntrack_max = 655350

    vm.swappiness = 10
    1. Load the new configuration:

      sysctl -p
    2. Verify that the BBR algorithm is active:

      lsmod | grep bbr

      If the output shows tcp_bbr followed by a number, BBR is enabled.

    3. Verify the other modified values:

      cat /proc/sys/net/netfilter/nf_conntrack_max
      cat /proc/sys/net/ipv4/tcp_fin_timeout
      cat /proc/sys/net/ipv4/tcp_tw_reuse
      cat /proc/sys/net/ipv4/tcp_tw_recycle

      If each command prints the configured value, the change has taken effect.

  2. Add the following to /etc/security/limits.conf:

    root soft nofile 1000000
    root hard nofile 1000000
    * soft nofile 1000000
    * hard nofile 1000000
    * soft nproc 131072
    * hard nproc 131072

    On most Linux distributions these changes take effect after a reboot.

    1. Verify:

      ulimit -n

      If the output is 1000000, the change took effect. If it has no effect even after a reboot, check /etc/profile for a line such as ulimit -SHn 65536 and comment it out.

    2. Temporary change (current shell only):

      ulimit -n 1000000
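The verification steps above can be wrapped into a small script. This is only a sketch; sysctl_path, check_sysctl, and open_files_soft are hypothetical helper names, and limits.conf only applies to new sessions, so the script also inspects the running server process via /proc.

```shell
#!/bin/sh
# Map a sysctl key to its /proc/sys path (dots become slashes).
sysctl_path() {
    printf '/proc/sys/%s' "$(printf '%s' "$1" | tr '.' '/')"
}

# Compare a live sysctl value with the expected one.
check_sysctl() {
    key=$1; expected=$2
    actual=$(cat "$(sysctl_path "$key")" 2>/dev/null)
    if [ "$actual" = "$expected" ]; then
        echo "OK   $key = $actual"
    else
        echo "DIFF $key = ${actual:-<missing>} (expected: $expected)"
    fi
}

# Extract the soft "Max open files" limit from a /proc/<pid>/limits listing.
open_files_soft() {
    awk '/^Max open files/ {print $4}'
}

check_sysctl net.ipv4.tcp_fin_timeout 30
check_sysctl net.ipv4.tcp_congestion_control bbr
check_sysctl net.netfilter.nf_conntrack_max 655350

# Check the limit that actually applies to the running server, if any.
pid=$(pidof clickhouse-server 2>/dev/null | awk '{print $1}')
if [ -n "$pid" ]; then
    echo "clickhouse-server soft nofile: $(open_files_soft < /proc/$pid/limits)"
fi
```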

Installation

sudo apt-get install apt-transport-https ca-certificates dirmngr
sudo apt-key adv --keyserver hkp://keyserver.ubuntu.com:80 --recv E0C56BD4

echo "deb https://repo.clickhouse.tech/deb/stable/ main/" | sudo tee \
/etc/apt/sources.list.d/clickhouse.list
sudo apt-get update

sudo apt-get install -y clickhouse-server clickhouse-client

sudo service clickhouse-server start
clickhouse-client
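After installation, a quick smoke test confirms that both the native and HTTP interfaces respond. This assumes the server is running locally; http_url is a hypothetical helper.

```shell
#!/bin/sh
# Build the HTTP endpoint URL for a given host (ClickHouse serves HTTP on 8123).
http_url() {
    printf 'http://%s:8123/' "$1"
}

if command -v clickhouse-client >/dev/null; then
    # Native TCP protocol (port 9000)
    clickhouse-client --query "SELECT version()"
    # HTTP protocol (port 8123)
    curl -s "$(http_url localhost)" --data-binary "SELECT version()"
fi
```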

Configuration

Edit /etc/clickhouse-server/config.xml:

vim /etc/clickhouse-server/config.xml

To make ClickHouse reachable from hosts other than the local machine, change <listen_host>::</listen_host> to:

<listen_host>0.0.0.0</listen_host>

Default data and log file paths (no change needed):

Data path: <path>/var/lib/clickhouse/</path>
Log path: <log>/var/log/clickhouse-server/clickhouse-server.log</log>

Edit /etc/clickhouse-server/users.xml and add the following settings:

<!-- Size of the background thread pool -->
<background_pool_size>24</background_pool_size>
<!-- Size of the thread pool for distributed sends -->
<background_distributed_schedule_pool_size>24</background_distributed_schedule_pool_size>
<!-- Maximum memory a single query may use -->
<max_memory_usage>48000000000</max_memory_usage>
<!-- Maximum memory for GROUP BY; beyond this threshold it spills to disk -->
<max_bytes_before_external_group_by>24000000000</max_bytes_before_external_group_by>
<!-- Maximum memory for ORDER BY; beyond this threshold it spills to disk -->
<max_bytes_before_external_sort>24000000000</max_bytes_before_external_sort>

Edit /etc/clickhouse-server/config.xml and adjust the following:

<!-- Maximum number of concurrently processed queries -->
<max_concurrent_queries>200</max_concurrent_queries>
<!-- 0 removes the size limit, allowing any table to be dropped -->
<max_table_size_to_drop>0</max_table_size_to_drop>
<!-- Time zone -->
<timezone>Asia/Shanghai</timezone>
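One way to confirm the settings were picked up after a restart is to query the system tables. A sketch, assuming a local running server; the setting names are the ones configured above:

```shell
#!/bin/sh
# Query the effective session settings and the server time zone.
CHECK_QUERY="SELECT name, value FROM system.settings
WHERE name LIKE 'max_bytes_before_external%' OR name = 'max_memory_usage'"

if command -v clickhouse-client >/dev/null; then
    clickhouse-client --query "$CHECK_QUERY"
    clickhouse-client --query "SELECT timezone()"
fi
```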

Firewall

# Drop the ClickHouse ports by default, then insert ACCEPT rules for trusted
# sources (each -I pushes the new rule above the DROP rule)
iptables -I INPUT -p tcp -m multiport --dport 8123,9000,9004,9005 -j DROP
iptables -I INPUT -s 127.0.0.1 -p tcp -j ACCEPT
iptables -I INPUT -s 10.32.18.70 -p tcp -j ACCEPT
iptables -I INPUT -s 10.86.52.68 -p tcp -j ACCEPT
# Persist the rules
iptables-save > /root/ipt_save.def

Ports

  • 8123: HTTP client requests
  • 9000: native TCP client requests
  • 9004: MySQL wire-protocol clients
  • 9005: PostgreSQL wire-protocol clients
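A quick way to confirm these ports are actually open is to scan the listening sockets. A sketch, assuming ss from iproute2 is available; listening is a hypothetical helper:

```shell
#!/bin/sh
# Succeed if stdin (ss output) contains a listener on the given port.
# The pattern keys on ":<port> " — address:port followed by a space.
listening() {
    grep -q ":$1 "
}

for port in 8123 9000 9004 9005; do
    if ss -tln 2>/dev/null | listening "$port"; then
        echo "port $port: listening"
    else
        echo "port $port: not listening"
    fi
done
```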

ZooKeeper Cluster

ClickHouse replication and sharding both rely on ZooKeeper, which in turn requires a Java runtime.

  1. Download ZooKeeper

    cd /opt
    wget https://downloads.apache.org/zookeeper/zookeeper-3.5.9/apache-zookeeper-3.5.9-bin.tar.gz
  2. Extract and rename

    tar -zxvf apache-zookeeper-3.5.9-bin.tar.gz -C /opt
    mv apache-zookeeper-3.5.9-bin zookeeper
  3. Configure the server ID

    cd /var/data && mkdir zookeeper
    # Create the myid file
    cd zookeeper && vim myid
    # Enter 1, 2 and 3 on the three servers respectively
  4. Configure zoo.cfg

    # Enter the configuration directory
    cd /opt/zookeeper/conf
    # Rename zoo_sample.cfg to zoo.cfg
    mv zoo_sample.cfg zoo.cfg
    # Edit zoo.cfg
    vim zoo.cfg

    # Change dataDir
    dataDir=/var/data/zookeeper
    # Append the following
    #######################cluster##########################
    server.1=clickhouse1:2888:3888
    server.2=clickhouse2:2888:3888
    server.3=clickhouse3:2888:3888
  5. Copy the configuration file to the other nodes with scp (passwordless SSH must be set up first)

    scp -P 45222 -r root@clickhouse1:/opt/zookeeper/conf/zoo.cfg /opt/zookeeper/conf/
  6. Start

    cd /opt/zookeeper && bin/zkServer.sh start
  7. Check status

    cd /opt/zookeeper && bin/zkServer.sh status
  8. Start on boot

    # Edit /etc/rc.local and append:
    source /etc/profile
    /opt/zookeeper/bin/zkServer.sh start
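A cluster-wide health check can be sketched with ZooKeeper's four-letter commands over port 2181 (on ZooKeeper 3.5+, commands other than srvr may need to be whitelisted via 4lw.commands.whitelist in zoo.cfg). zk_mode is a hypothetical parser for the srvr output:

```shell
#!/bin/sh
# Extract the "Mode:" value (leader/follower/standalone) from `srvr` output.
zk_mode() {
    awk -F': ' '/^Mode/ {print $2}'
}

for host in clickhouse1 clickhouse2 clickhouse3; do
    mode=$(echo srvr | nc -w 2 "$host" 2181 2>/dev/null | zk_mode || true)
    echo "$host: ${mode:-unreachable}"
done
```

Exactly one node should report leader and the others follower.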

ClickHouse Cluster

Add the following to config.xml:

<zookeeper incl="zookeeper-servers" optional="true" />
<include_from>/etc/clickhouse-server/config.d/metrika-shard.xml</include_from>

<distributed_ddl>
    <!-- Path in ZooKeeper to queue with DDL queries -->
    <path>/clickhouse/task_queue/ddl</path>
    <cleanup_delay_period>60</cleanup_delay_period>
    <task_max_lifetime>86400</task_max_lifetime>
    <max_tasks_in_queue>1000</max_tasks_in_queue>
</distributed_ddl>

Create metrika-shard.xml in the /etc/clickhouse-server/config.d directory:

<?xml version="1.0"?>
<yandex>
    <remote_servers>
        <safety_cluster>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse1</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse2</host>
                    <port>9000</port>
                </replica>
            </shard>
            <shard>
                <internal_replication>true</internal_replication>
                <replica>
                    <host>clickhouse3</host>
                    <port>9000</port>
                </replica>
            </shard>
        </safety_cluster>
    </remote_servers>
    <zookeeper-servers>
        <node index="1">
            <host>clickhouse1</host>
            <port>2181</port>
        </node>
        <node index="2">
            <host>clickhouse2</host>
            <port>2181</port>
        </node>
        <node index="3">
            <host>clickhouse3</host>
            <port>2181</port>
        </node>
    </zookeeper-servers>
    <macros>
        <!-- Must be unique per node, e.g. shard 01/02/03 and replica
             01_01/02_01/03_01 on the three hosts -->
        <cluster>safety_cluster</cluster>
        <shard>01</shard>
        <replica>01_01</replica>
    </macros>
</yandex>

Copy the files to the other nodes, then edit the <macros> section on each node, since the shard and replica values must be unique per host:

scp -P 45222 -r root@clickhouse1:/etc/clickhouse-server/config.d/metrika-shard.xml /etc/clickhouse-server/config.d/
scp -P 45222 -r root@clickhouse1:/etc/clickhouse-server/config.xml /etc/clickhouse-server/

Restart:

service clickhouse-server restart
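After the restart, the cluster topology can be verified from any node through the system.clusters table (the cluster name safety_cluster comes from the config above):

```shell
#!/bin/sh
# One row per replica is expected, i.e. three rows for this 3-shard layout.
CLUSTER_QUERY="SELECT cluster, shard_num, replica_num, host_name
FROM system.clusters
WHERE cluster = 'safety_cluster'"

if command -v clickhouse-client >/dev/null; then
    clickhouse-client --query "$CLUSTER_QUERY"
fi
```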

Testing the Cluster

Create a database via distributed DDL

CREATE DATABASE IF NOT EXISTS dataset ON CLUSTER safety_cluster;

Create the local tables via distributed DDL

CREATE TABLE dataset.hits_local ON CLUSTER safety_cluster
(
`WatchID` UInt64,
`JavaEnable` UInt8,
`Title` String,
`GoodEvent` Int16,
`EventTime` DateTime,
`EventDate` Date,
`CounterID` UInt32,
`ClientIP` UInt32,
`ClientIP6` FixedString(16),
`RegionID` UInt32,
`UserID` UInt64,
`CounterClass` Int8,
`OS` UInt8,
`UserAgent` UInt8,
`URL` String,
`Referer` String,
`URLDomain` String,
`RefererDomain` String,
`Refresh` UInt8,
`IsRobot` UInt8,
`RefererCategories` Array(UInt16),
`URLCategories` Array(UInt16),
`URLRegions` Array(UInt32),
`RefererRegions` Array(UInt32),
`ResolutionWidth` UInt16,
`ResolutionHeight` UInt16,
`ResolutionDepth` UInt8,
`FlashMajor` UInt8,
`FlashMinor` UInt8,
`FlashMinor2` String,
`NetMajor` UInt8,
`NetMinor` UInt8,
`UserAgentMajor` UInt16,
`UserAgentMinor` FixedString(2),
`CookieEnable` UInt8,
`JavascriptEnable` UInt8,
`IsMobile` UInt8,
`MobilePhone` UInt8,
`MobilePhoneModel` String,
`Params` String,
`IPNetworkID` UInt32,
`TraficSourceID` Int8,
`SearchEngineID` UInt16,
`SearchPhrase` String,
`AdvEngineID` UInt8,
`IsArtifical` UInt8,
`WindowClientWidth` UInt16,
`WindowClientHeight` UInt16,
`ClientTimeZone` Int16,
`ClientEventTime` DateTime,
`SilverlightVersion1` UInt8,
`SilverlightVersion2` UInt8,
`SilverlightVersion3` UInt32,
`SilverlightVersion4` UInt16,
`PageCharset` String,
`CodeVersion` UInt32,
`IsLink` UInt8,
`IsDownload` UInt8,
`IsNotBounce` UInt8,
`FUniqID` UInt64,
`HID` UInt32,
`IsOldCounter` UInt8,
`IsEvent` UInt8,
`IsParameter` UInt8,
`DontCountHits` UInt8,
`WithHash` UInt8,
`HitColor` FixedString(1),
`UTCEventTime` DateTime,
`Age` UInt8,
`Sex` UInt8,
`Income` UInt8,
`Interests` UInt16,
`Robotness` UInt8,
`GeneralInterests` Array(UInt16),
`RemoteIP` UInt32,
`RemoteIP6` FixedString(16),
`WindowName` Int32,
`OpenerName` Int32,
`HistoryLength` Int16,
`BrowserLanguage` FixedString(2),
`BrowserCountry` FixedString(2),
`SocialNetwork` String,
`SocialAction` String,
`HTTPError` UInt16,
`SendTiming` Int32,
`DNSTiming` Int32,
`ConnectTiming` Int32,
`ResponseStartTiming` Int32,
`ResponseEndTiming` Int32,
`FetchTiming` Int32,
`RedirectTiming` Int32,
`DOMInteractiveTiming` Int32,
`DOMContentLoadedTiming` Int32,
`DOMCompleteTiming` Int32,
`LoadEventStartTiming` Int32,
`LoadEventEndTiming` Int32,
`NSToDOMContentLoadedTiming` Int32,
`FirstPaintTiming` Int32,
`RedirectCount` Int8,
`SocialSourceNetworkID` UInt8,
`SocialSourcePage` String,
`ParamPrice` Int64,
`ParamOrderID` String,
`ParamCurrency` FixedString(3),
`ParamCurrencyID` UInt16,
`GoalsReached` Array(UInt32),
`OpenstatServiceName` String,
`OpenstatCampaignID` String,
`OpenstatAdID` String,
`OpenstatSourceID` String,
`UTMSource` String,
`UTMMedium` String,
`UTMCampaign` String,
`UTMContent` String,
`UTMTerm` String,
`FromTag` String,
`HasGCLID` UInt8,
`RefererHash` UInt64,
`URLHash` UInt64,
`CLID` UInt32,
`YCLID` UInt64,
`ShareService` String,
`ShareURL` String,
`ShareTitle` String,
`ParsedParams` Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
`IslandID` FixedString(16),
`RequestNum` UInt32,
`RequestTry` UInt8
)
ENGINE = ReplicatedMergeTree('/clickhouse/tables/{shard}/hits','{replica}')
PARTITION BY toYYYYMM(EventDate)
ORDER BY (CounterID, EventDate, intHash32(UserID))
SAMPLE BY intHash32(UserID);

Create the distributed table via distributed DDL

CREATE TABLE dataset.hits_all ON CLUSTER safety_cluster
(
`WatchID` UInt64,
`JavaEnable` UInt8,
`Title` String,
`GoodEvent` Int16,
`EventTime` DateTime,
`EventDate` Date,
`CounterID` UInt32,
`ClientIP` UInt32,
`ClientIP6` FixedString(16),
`RegionID` UInt32,
`UserID` UInt64,
`CounterClass` Int8,
`OS` UInt8,
`UserAgent` UInt8,
`URL` String,
`Referer` String,
`URLDomain` String,
`RefererDomain` String,
`Refresh` UInt8,
`IsRobot` UInt8,
`RefererCategories` Array(UInt16),
`URLCategories` Array(UInt16),
`URLRegions` Array(UInt32),
`RefererRegions` Array(UInt32),
`ResolutionWidth` UInt16,
`ResolutionHeight` UInt16,
`ResolutionDepth` UInt8,
`FlashMajor` UInt8,
`FlashMinor` UInt8,
`FlashMinor2` String,
`NetMajor` UInt8,
`NetMinor` UInt8,
`UserAgentMajor` UInt16,
`UserAgentMinor` FixedString(2),
`CookieEnable` UInt8,
`JavascriptEnable` UInt8,
`IsMobile` UInt8,
`MobilePhone` UInt8,
`MobilePhoneModel` String,
`Params` String,
`IPNetworkID` UInt32,
`TraficSourceID` Int8,
`SearchEngineID` UInt16,
`SearchPhrase` String,
`AdvEngineID` UInt8,
`IsArtifical` UInt8,
`WindowClientWidth` UInt16,
`WindowClientHeight` UInt16,
`ClientTimeZone` Int16,
`ClientEventTime` DateTime,
`SilverlightVersion1` UInt8,
`SilverlightVersion2` UInt8,
`SilverlightVersion3` UInt32,
`SilverlightVersion4` UInt16,
`PageCharset` String,
`CodeVersion` UInt32,
`IsLink` UInt8,
`IsDownload` UInt8,
`IsNotBounce` UInt8,
`FUniqID` UInt64,
`HID` UInt32,
`IsOldCounter` UInt8,
`IsEvent` UInt8,
`IsParameter` UInt8,
`DontCountHits` UInt8,
`WithHash` UInt8,
`HitColor` FixedString(1),
`UTCEventTime` DateTime,
`Age` UInt8,
`Sex` UInt8,
`Income` UInt8,
`Interests` UInt16,
`Robotness` UInt8,
`GeneralInterests` Array(UInt16),
`RemoteIP` UInt32,
`RemoteIP6` FixedString(16),
`WindowName` Int32,
`OpenerName` Int32,
`HistoryLength` Int16,
`BrowserLanguage` FixedString(2),
`BrowserCountry` FixedString(2),
`SocialNetwork` String,
`SocialAction` String,
`HTTPError` UInt16,
`SendTiming` Int32,
`DNSTiming` Int32,
`ConnectTiming` Int32,
`ResponseStartTiming` Int32,
`ResponseEndTiming` Int32,
`FetchTiming` Int32,
`RedirectTiming` Int32,
`DOMInteractiveTiming` Int32,
`DOMContentLoadedTiming` Int32,
`DOMCompleteTiming` Int32,
`LoadEventStartTiming` Int32,
`LoadEventEndTiming` Int32,
`NSToDOMContentLoadedTiming` Int32,
`FirstPaintTiming` Int32,
`RedirectCount` Int8,
`SocialSourceNetworkID` UInt8,
`SocialSourcePage` String,
`ParamPrice` Int64,
`ParamOrderID` String,
`ParamCurrency` FixedString(3),
`ParamCurrencyID` UInt16,
`GoalsReached` Array(UInt32),
`OpenstatServiceName` String,
`OpenstatCampaignID` String,
`OpenstatAdID` String,
`OpenstatSourceID` String,
`UTMSource` String,
`UTMMedium` String,
`UTMCampaign` String,
`UTMContent` String,
`UTMTerm` String,
`FromTag` String,
`HasGCLID` UInt8,
`RefererHash` UInt64,
`URLHash` UInt64,
`CLID` UInt32,
`YCLID` UInt64,
`ShareService` String,
`ShareURL` String,
`ShareTitle` String,
`ParsedParams` Nested(
Key1 String,
Key2 String,
Key3 String,
Key4 String,
Key5 String,
ValueDouble Float64),
`IslandID` FixedString(16),
`RequestNum` UInt32,
`RequestTry` UInt8
)
ENGINE = Distributed(safety_cluster, dataset, hits_local, rand());

Download and Import the Data

# Download the dataset
curl https://datasets.clickhouse.tech/hits/tsv/hits_v1.tsv.xz | unxz --threads=`nproc` > hits_v1.tsv
# Import the data
clickhouse-client --query "INSERT INTO dataset.hits_all FORMAT TSV" --max_insert_block_size=100000 < hits_v1.tsv
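Once the import finishes, the total row count can be read through the distributed table and broken down per shard. A sketch; check_count is a hypothetical helper, and clusterAllReplicas requires a reasonably recent ClickHouse version:

```shell
#!/bin/sh
# Hypothetical check: compare an observed count against an expected total,
# e.g. check_count "$(clickhouse-client --query 'SELECT count() ...')" <expected>
check_count() {
    [ "$1" -eq "$2" ] && echo match || echo "mismatch ($1 vs $2)"
}

if command -v clickhouse-client >/dev/null; then
    # Total rows seen through the distributed table
    clickhouse-client --query "SELECT count() FROM dataset.hits_all"
    # Per-shard distribution of the inserted rows
    clickhouse-client --query "
        SELECT hostName() AS host, count() AS rows
        FROM clusterAllReplicas('safety_cluster', dataset.hits_local)
        GROUP BY host"
fi
```

With rand() as the sharding key, the rows should be spread roughly evenly across the three shards.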