Clickhouse in Rails app


First of all - a little history.

What is Clickhouse?

ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).

You can read more at https://clickhouse.yandex/docs/en/.

In this article let’s try to run and setup Clickhouse connection in our Rails application.

Installation

First run clickhouse in docker instance like this:

docker run -d --name mac-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server

Nice, instance started and we an go forward.

Ruby / Rails integration

Installation

First of all let’s install clickouse gem.

I’ve used this one https://github.com/archan937/clickhouse.

gem install 'clickhouse'

Running test queries from irb

irb(main):001:0> require "clickhouse"
=> true
irb(main):002:0> require "logger"
=> true
irb(main):003:0> Clickhouse.logger = Logger.new(STDOUT)
=> #<Logger:0x00007f949607b430 @level=0, @progname=nil, @default_formatter=#<Logger::Formatter:0x00007f949607b3e0 @datetime_format=nil>, @formatter=nil, @logdev=#<Logger::LogDevice:0x00007f949607b390 @shift_period_suffix=nil, @shift_size=nil, @shift_age=nil, @filename=nil, @dev=#<IO:<STDOUT>>, @mon_owner=nil, @mon_count=0, @mon_mutex=#<Thread::Mutex:0x00007f949607b318>>>
irb(main):004:0> Clickhouse.establish_connection
=> true

On the next step, let’s try to get list of databases from our Clickhouse instance:

irb(main):005:0> Clickhouse.connection.databases
D, [2018-11-29T16:38:20.985840 #5647] DEBUG -- : 
 SQL (10.0ms)  SHOW DATABASES;
 2 rows in set. Elapsed: 0.04ms. Processed: 2 rows, 238 Bytes (49.88 thousand rows/s, 5.66 MB/s) 
=> ["default", "system"]

Good, now let’s check our instance for tables:

irb(main):007:0> Clickhouse.connection.tables
D, [2018-11-29T16:38:43.263336 #5647] DEBUG -- : 
 SQL (10.2ms)  SHOW TABLES;
 0 rows in set. Elapsed: 0.061ms. Processed: 0 rows, 0 Bytes (0 rows/s, 0 Bytes/s) 
=> []

Now we are ready for our first table.

In my case I had GraphQL based API and I need to add logs for my API requests.

To do it superfast, we can use Clickhouse and small table like this:

irb(main):025:0> Clickhouse.connection.create_table("log_entries") do |t|
irb(main):026:1*   t.fixed_string :id, 16
irb(main):027:1>   t.date         :request_date
irb(main):028:1>   t.date_time    :request_time
irb(main):029:1>   t.string       :request_path
irb(main):030:1>   t.uint32       :user_id
irb(main):031:1>   t.string       :event_type
irb(main):032:1>   t.string       :ip
irb(main):033:1>   t.engine       "MergeTree(request_date, (user_id, event_type), 8192)"
irb(main):034:1> end
D, [2018-11-29T16:57:33.594559 #5647] DEBUG -- : 
 SQL (19.2ms)  CREATE TABLE log_entries (
  id           FixedString(16),
  request_date Date,
  request_time DateTime,
  request_path String,
  user_id      UInt32,
  event_type   String,
  ip           String
)
ENGINE = MergeTree(request_date, (user_id, event_type), 8192);
=> true

Very similar to Rails based migrations DSL, isn’t it?

Let’s use construction ‘DESCRIBE TABLE’ to check out log_entries table:

irb(main):037:0> Clickhouse.connection.query "DESCRIBE TABLE log_entries"
D, [2018-11-29T17:27:21.443048 #5647] DEBUG -- : 
 SQL (37.2ms)  DESCRIBE TABLE log_entries;
 7 rows in set. Elapsed: 2.2ms. Processed: 7 rows, 360 Bytes (3.16 thousand rows/s, 158.8 KB/s) 
=> #<Clickhouse::Connection::Query::ResultSet:0x00007f949531d2e8 @rows=[["id", "FixedString(16)", "", ""], ["request_date", "Date", "", ""], ["request_time", "DateTime", "", ""], ["request_path", "String", "", ""], ["user_id", "UInt32", "", ""], ["event_type", "String", "", ""], ["ip", "String", "", ""]], @names=["name", "type", "default_type", "default_expression"], @types=["String", "String", "String", "String"]>

Let’s perform basic count request:

irb(main):039:0> Clickhouse.connection.count :from => "log_entries"
D, [2018-11-29T18:48:41.207065 #5647] DEBUG -- : 
 SQL (53.6ms)  SELECT COUNT(*)
FROM log_entries;
 1 row in set. Elapsed: 12.6ms. Processed: 1 rows, 2 Bytes (79.12 rows/s, 158 Bytes/s) 
=> 1

Want to select all rows? No problem.

irb(main):054:0> Clickhouse.connection.select_row :select => "COUNT(*), request_date", :from => "log_entries", :group => "request_date"
D, [2018-11-29T18:53:39.356592 #5647] DEBUG -- : 
 SQL (10.6ms)  SELECT COUNT(*), request_date
FROM log_entries
GROUP BY request_date;
 1 row in set. Elapsed: 0.947ms. Processed: 84 rows, 168 Bytes (88.74 thousand rows/s, 173.32 KB/s) 
=> [84, #<Date: 2018-11-29 ((2458452j,0s,0n),+0s,2299161j)>]

Rails implementation

I’m using Command Pattern based commands in my project https://en.wikipedia.org/wiki/Command_pattern.

So this example will be written based this abstraction layer.

Just pass data to our command and Clickhouse performs insert.

And it will be superfast.

class Click::InsertCommand
    def self.run(data)
      Clickhouse.establish_connection
      Clickhouse.connection.insert_rows("log_entries", names: %w(id request_date request_time request_path user_id event_type ip)) do |rows|
        rows << data
      end   
    end
end

Let’s move to our controllers layer, how we can use our command?

Very simple, like this:

  def index
    rental_units = RentalUnit.all
    Click::InsertCommand.run([(SecureRandom.random_number(9e5) + 1e5).to_i + current_user.id, Date.today, DateTime.now.strftime('%Y-%m-%d %H:%M:%S'), request.original_fullpath, current_user.id, "api_request", request.remote_ip])
    paginate json: rental_units
  end

Views.

For example we want to show grouped data from our Clickhouse instance, for example grouped by request_date and request_path.

Super simple:

irb(main):057:0> Clickhouse.connection.select_row :select => "COUNT(*), request_date, request_path", :from => "log_entries", :group => "request_date, request_path"
D, [2018-11-29T18:54:04.120490 #5647] DEBUG -- : 
 SQL (15.9ms)  SELECT COUNT(*), request_date, request_path
FROM log_entries
GROUP BY request_date, request_path;
 4 rows in set. Elapsed: 2.1ms. Processed: 84 rows, 3.71 KB (40.37 thousand rows/s, 1.74 MB/s) 
=> [9, #<Date: 2018-11-29 ((2458452j,0s,0n),+0s,2299161j)>, "/v1/rental_units?per_page=10&page=11"]

Conclusion

Clickhouse is not a “silver bullet” for your application, anyway it’s a solution for a very specific things.

But if you find optimized way to use it in the context of our application, Clickhouse can be very fast and save your money and implementation time.