ClickHouse storage configuration


What is ClickHouse?

https://clickhouse.tech/

ClickHouse is a fast open-source OLAP database management system.

It is column-oriented and lets you generate analytical reports in real time using SQL queries.

Requirements

  • VM based on GCP Compute Engine
  • Ubuntu (kernel 5.4.0-1029-gcp)
  • ClickHouse installed

Goals

Set up additional storage for ClickHouse data to extend the current capacity.

First steps

Let’s add an additional disk to the VM in GCP.

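Let’s mount this disk in our OS:

sudo mkdir -p /mnt/disks/diskb/
sudo mount -o discard,defaults /dev/sdb /mnt/disks/diskb/
# make sure that clickhouse has write access to this disk, for example
# (assumes the default clickhouse user and group created by the package):
sudo chown -R clickhouse:clickhouse /mnt/disks/diskb/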
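
A mount made this way does not survive a reboot. The sketch below builds an /etc/fstab line for the new disk. It assumes the disk is formatted as ext4 and that its UUID is looked up with blkid; `fstab_line` is a hypothetical helper name, not part of any tool.

```shell
# Build an /etc/fstab entry for a data disk.
# Arguments: filesystem UUID, mount point, filesystem type (assumed ext4 by default).
fstab_line() {
  local uuid="$1"
  local mount_point="$2"
  local fs_type="${3:-ext4}"
  # nofail lets the VM finish booting even if the disk is detached.
  printf 'UUID=%s %s %s discard,defaults,nofail 0 2\n' "$uuid" "$mount_point" "$fs_type"
}

# Usage (requires root, not executed here):
#   uuid="$(sudo blkid -s UUID -o value /dev/sdb)"
#   fstab_line "$uuid" /mnt/disks/diskb | sudo tee -a /etc/fstab
```

Referencing the disk by UUID rather than /dev/sdb is a design choice: device names can change between boots when disks are attached or detached.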

ClickHouse queries

First of all, let’s check the current storage configuration for ClickHouse (run clickhouse-client to open the CLI):

SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    formatReadableSize(keep_free_space) AS reserved
FROM system.disks

┌─name────┬─path─────────────────┬─free─────┬─total────┬─reserved─┐
│ default │ /var/lib/clickhouse/ │ 6.55 GiB │ 9.52 GiB │ 1.00 KiB │
└─────────┴──────────────────────┴──────────┴──────────┴──────────┘

And the storage policies, like this:

SELECT policy_name, volume_name, disks
FROM system.storage_policies

┌─policy_name────┬─volume_name─────────┬─disks─────────────────────┐
│ default        │ default             │ ['default']               │
└────────────────┴─────────────────────┴───────────────────────────┘

As a next step, let’s create a storage configuration and put it under:

/etc/clickhouse-server/config.d/storage.xml

Example storage configuration:

<yandex>
  <storage_configuration>
    <disks>
      <default>
        <keep_free_space_bytes>1024</keep_free_space_bytes>
      </default>
      <diskb>
        <path>/mnt/disks/diskb/</path>
      </diskb>
    </disks>
    <policies>
      <diskb_only>
        <volumes>
          <diskb_volume>
            <disk>diskb</disk>
          </diskb_volume>
        </volumes>
      </diskb_only>
    </policies>
  </storage_configuration>
</yandex>
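
Before restarting, it may be worth confirming that the directory named in <path> exists and is writable. The helper below is a minimal sketch that checks this for the current user; `check_disk_path` is a hypothetical name, and checking as the clickhouse user (shown in the usage comment) assumes the default account created by the package.

```shell
# Return 0 and print "ok: <dir>" if the directory exists and the
# current user can write to it; otherwise print the reason to stderr.
check_disk_path() {
  local dir="$1"
  if [ ! -d "$dir" ]; then
    echo "missing directory: $dir" >&2
    return 1
  fi
  if [ ! -w "$dir" ]; then
    echo "not writable: $dir" >&2
    return 1
  fi
  echo "ok: $dir"
}

# Usage (not executed here):
#   check_disk_path /mnt/disks/diskb/
# To run the same write test as the clickhouse user:
#   sudo -u clickhouse test -w /mnt/disks/diskb/ && echo writable
```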

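Let’s restart our ClickHouse instance and check its status. Note that the output below was captured while the server was failing to start (activating (auto-restart), exit status 70). If you see the same, check the server error log and verify that ClickHouse can write to /mnt/disks/diskb/; a healthy instance reports active (running).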

$ sudo systemctl restart clickhouse-server
$ sudo systemctl status clickhouse-server
● clickhouse-server.service - ClickHouse Server (analytic DBMS for big data)
   Loaded: loaded (/etc/systemd/system/clickhouse-server.service; enabled; vendor preset: enabled)
   Active: activating (auto-restart) (Result: exit-code) since Mon 2020-07-11 13:01:02 UTC; 3s ago
  Process: 3563 ExecStart=/usr/bin/clickhouse-server --config=/etc/clickhouse-server/config.xml --pid-file=/run/clickhouse-server/clickhouse-server.pid (code=exited, status=70)
 Main PID: 3563 (code=exited, status=70)
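
Right after a restart the server may need a moment before it accepts connections. Here is a minimal sketch of a polling helper, assuming clickhouse-client can connect with its default settings; `retry` is a hypothetical helper name.

```shell
# Run a command up to <attempts> times, sleeping <delay> seconds
# between failed attempts. Returns 0 on the first success, 1 otherwise.
retry() {
  local attempts="$1"
  local delay="$2"
  shift 2
  local i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    # Only sleep if another attempt is coming.
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  return 1
}

# Usage (not executed here): wait up to ~10 seconds for the server.
#   retry 10 1 clickhouse-client --query 'SELECT 1'
```

If the helper gives up, the systemctl status output and the server log are the places to look next.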

Back in the CLI, let’s check that the new disk is visible:

SELECT
    name,
    path,
    formatReadableSize(free_space) AS free,
    formatReadableSize(total_space) AS total,
    formatReadableSize(keep_free_space) AS reserved
FROM system.disks

Query id: af2b60e8-7ba1-4682-922c-0c805cc1493d

┌─name────┬─path─────────────────┬─free───────┬─total──────┬─reserved─┐
│ default │ /var/lib/clickhouse/ │ 6.55 GiB   │ 9.52 GiB   │ 1.00 KiB │
│ diskb   │ /mnt/disks/diskb/    │ 187.78 GiB │ 195.86 GiB │ 0.00 B   │
└─────────┴──────────────────────┴────────────┴────────────┴──────────┘

2 rows in set. Elapsed: 0.003 sec.

And check that the new policy was applied:

SELECT
    policy_name,
    volume_name,
    disks
FROM system.storage_policies

Query id: 35c921e5-afdd-4767-9dd9-2d564ac3c4fe

┌─policy_name─┬─volume_name──┬─disks───────┐
│ default     │ default      │ ['default'] │
│ diskb_only  │ diskb_volume │ ['diskb']   │
└─────────────┴──────────────┴─────────────┘

2 rows in set. Elapsed: 0.002 sec.

Success! The new diskb_only policy was added with the diskb disk!

Using new storage

Just one simple SETTINGS option, and we’re ready to go!

CREATE TABLE logs (id UInt64) ENGINE = MergeTree
ORDER BY id SETTINGS storage_policy = 'diskb_only';
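
To confirm that data for the table really lands on the new disk, one option is to group the active parts by disk using the system.parts table. This is a sketch under the assumption that clickhouse-client is on the PATH and connects with default settings; `parts_by_disk` and the `CH_CLIENT` variable are hypothetical names introduced for illustration.

```shell
# Client binary; can be overridden, e.g. to add connection options.
CH_CLIENT="${CH_CLIENT:-clickhouse-client}"

# Print the number of active parts per disk for the given table.
parts_by_disk() {
  local table="$1"
  "$CH_CLIENT" --query "SELECT disk_name, count() AS parts FROM system.parts WHERE table = '${table}' AND active GROUP BY disk_name FORMAT TSV"
}

# Usage (not executed here; insert a few rows into logs first):
#   parts_by_disk logs
```

For a table created with the diskb_only policy, every row of the result should name diskb.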

Stay tuned!