First of all, a little background.
What is ClickHouse?
ClickHouse is a column-oriented database management system (DBMS) for online analytical processing of queries (OLAP).
You can read more at https://clickhouse.yandex/docs/en/.
In this article, let’s try to run ClickHouse and set up a connection to it in our Rails application.
Installation
First, run ClickHouse in a Docker container, publishing port 8123 (the HTTP interface our gem will talk to):
docker run -d -p 8123:8123 --name mac-clickhouse-server --ulimit nofile=262144:262144 yandex/clickhouse-server
Nice, the instance has started and we can go forward.
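To verify the server is really up, you can ping the HTTP interface (reachable from the host thanks to the -p flag above); it should respond with “Ok.”:
curl http://localhost:8123/ping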
Ruby / Rails integration
Installation
First of all, let’s install the clickhouse gem.
I’ve used this one: https://github.com/archan937/clickhouse.
gem install clickhouse
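Or, if you manage dependencies with Bundler, add it to your Gemfile instead and run bundle install:
# Gemfile
gem 'clickhouse'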
Running test queries from irb
irb(main):001:0> require "clickhouse"
=> true
irb(main):002:0> require "logger"
=> true
irb(main):003:0> Clickhouse.logger = Logger.new(STDOUT)
=> #<Logger:0x00007f949607b430 @level=0, @progname=nil, @default_formatter=#<Logger::Formatter:0x00007f949607b3e0 @datetime_format=nil>, @formatter=nil, @logdev=#<Logger::LogDevice:0x00007f949607b390 @shift_period_suffix=nil, @shift_size=nil, @shift_age=nil, @filename=nil, @dev=#<IO:<STDOUT>>, @mon_owner=nil, @mon_count=0, @mon_mutex=#<Thread::Mutex:0x00007f949607b318>>>
irb(main):004:0> Clickhouse.establish_connection
=> true
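By default the gem connects to localhost:8123, ClickHouse’s HTTP interface, which is why the bare establish_connection above just works. If your server lives elsewhere, you can pass a configuration hash instead (a sketch with the defaults spelled out; option names per the gem’s README, so verify against your gem version):
Clickhouse.establish_connection(
  host: "localhost", # where the ClickHouse HTTP interface is reachable
  port: 8123         # ClickHouse's default HTTP port
)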
Next, let’s get the list of databases from our ClickHouse instance:
irb(main):005:0> Clickhouse.connection.databases
D, [2018-11-29T16:38:20.985840 #5647] DEBUG -- :
SQL (10.0ms) SHOW DATABASES;
2 rows in set. Elapsed: 0.04ms. Processed: 2 rows, 238 Bytes (49.88 thousand rows/s, 5.66 MB/s)
=> ["default", "system"]
Good, now let’s check our instance for tables:
irb(main):007:0> Clickhouse.connection.tables
D, [2018-11-29T16:38:43.263336 #5647] DEBUG -- :
SQL (10.2ms) SHOW TABLES;
0 rows in set. Elapsed: 0.061ms. Processed: 0 rows, 0 Bytes (0 rows/s, 0 Bytes/s)
=> []
Now we are ready for our first table.
In my case, I had a GraphQL-based API and needed to log my API requests.
To make this superfast, we can use ClickHouse with a small table like this:
irb(main):025:0> Clickhouse.connection.create_table("log_entries") do |t|
irb(main):026:1* t.fixed_string :id, 16
irb(main):027:1> t.date :request_date
irb(main):028:1> t.date_time :request_time
irb(main):029:1> t.string :request_path
irb(main):030:1> t.uint32 :user_id
irb(main):031:1> t.string :event_type
irb(main):032:1> t.string :ip
irb(main):033:1> t.engine "MergeTree(request_date, (user_id, event_type), 8192)"
irb(main):034:1> end
D, [2018-11-29T16:57:33.594559 #5647] DEBUG -- :
SQL (19.2ms) CREATE TABLE log_entries (
id FixedString(16),
request_date Date,
request_time DateTime,
request_path String,
user_id UInt32,
event_type String,
ip String
)
ENGINE = MergeTree(request_date, (user_id, event_type), 8192);
=> true
Very similar to the Rails migrations DSL, isn’t it? (The MergeTree arguments above use the legacy syntax: the date column, the primary key tuple, and the index granularity.)
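For comparison, here is a hypothetical ActiveRecord migration for the same columns. It is not part of the original setup, just an illustration of how close the two DSLs feel:
class CreateLogEntries < ActiveRecord::Migration[5.2]
  def change
    # Approximation only: Rails has no FixedString or UInt32 column types.
    create_table :log_entries, id: false do |t|
      t.string   :id, limit: 16
      t.date     :request_date
      t.datetime :request_time
      t.string   :request_path
      t.integer  :user_id
      t.string   :event_type
      t.string   :ip
    end
  end
end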
Let’s use the DESCRIBE TABLE statement to check out our log_entries table:
irb(main):037:0> Clickhouse.connection.query "DESCRIBE TABLE log_entries"
D, [2018-11-29T17:27:21.443048 #5647] DEBUG -- :
SQL (37.2ms) DESCRIBE TABLE log_entries;
7 rows in set. Elapsed: 2.2ms. Processed: 7 rows, 360 Bytes (3.16 thousand rows/s, 158.8 KB/s)
=> #<Clickhouse::Connection::Query::ResultSet:0x00007f949531d2e8 @rows=[["id", "FixedString(16)", "", ""], ["request_date", "Date", "", ""], ["request_time", "DateTime", "", ""], ["request_path", "String", "", ""], ["user_id", "UInt32", "", ""], ["event_type", "String", "", ""], ["ip", "String", "", ""]], @names=["name", "type", "default_type", "default_expression"], @types=["String", "String", "String", "String"]>
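The count below assumes the table already has data, so let’s first insert a sample row. This is a minimal sketch with made-up values; SecureRandom.hex(8) yields exactly the 16 characters our FixedString(16) id column expects:
require "securerandom"

Clickhouse.connection.insert_rows("log_entries", names: %w(id request_date request_time request_path user_id event_type ip)) do |rows|
  # Sample values only - substitute data from your own application.
  rows << [SecureRandom.hex(8), Date.today, Time.now.strftime("%Y-%m-%d %H:%M:%S"), "/v1/rental_units", 1, "api_request", "127.0.0.1"]
end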
Let’s perform a basic count query:
irb(main):039:0> Clickhouse.connection.count :from => "log_entries"
D, [2018-11-29T18:48:41.207065 #5647] DEBUG -- :
SQL (53.6ms) SELECT COUNT(*)
FROM log_entries;
1 row in set. Elapsed: 12.6ms. Processed: 1 rows, 2 Bytes (79.12 rows/s, 158 Bytes/s)
=> 1
Want to run aggregate queries? No problem. Here’s a count grouped by request_date:
irb(main):054:0> Clickhouse.connection.select_row :select => "COUNT(*), request_date", :from => "log_entries", :group => "request_date"
D, [2018-11-29T18:53:39.356592 #5647] DEBUG -- :
SQL (10.6ms) SELECT COUNT(*), request_date
FROM log_entries
GROUP BY request_date;
1 row in set. Elapsed: 0.947ms. Processed: 84 rows, 168 Bytes (88.74 thousand rows/s, 173.32 KB/s)
=> [84, #<Date: 2018-11-29 ((2458452j,0s,0n),+0s,2299161j)>]
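Note that select_row returns only the first row of the result set (as the transcripts show). To fetch every grouped row, the gem also provides select_rows:
Clickhouse.connection.select_rows :select => "COUNT(*), request_date", :from => "log_entries", :group => "request_date"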
Rails implementation
I’m using Command pattern based command objects in my project (https://en.wikipedia.org/wiki/Command_pattern).
So this example is written on top of that abstraction layer.
Just pass data to the command and ClickHouse performs the insert.
And it will be superfast.
class Click::InsertCommand
  def self.run(data)
    # Make sure a connection is configured before inserting.
    Clickhouse.establish_connection

    # Insert a single row; `data` must match the column order below.
    Clickhouse.connection.insert_rows("log_entries", names: %w(id request_date request_time request_path user_id event_type ip)) do |rows|
      rows << data
    end
  end
end
Let’s move to the controller layer. How can we use our command?
Very simply, like this:
def index
  rental_units = RentalUnit.all

  # Build the row: a pseudo-unique id, date/time, request path, user and client info.
  id = (SecureRandom.random_number(9e5) + 1e5).to_i + current_user.id
  Click::InsertCommand.run([id, Date.today, DateTime.now.strftime('%Y-%m-%d %H:%M:%S'), request.original_fullpath, current_user.id, "api_request", request.remote_ip])

  paginate json: rental_units
end
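One caveat: this insert runs synchronously inside the request cycle. If that latency ever matters, you could push the command into a background job. A hypothetical ActiveJob sketch (Click::LogEntryJob is not part of the original code):
class Click::LogEntryJob < ApplicationJob
  queue_as :default

  def perform(data)
    # Delegate to the same command; keep `data` to plainly serializable
    # values (strings, integers) so the job arguments round-trip safely.
    Click::InsertCommand.run(data)
  end
end
You would then call Click::LogEntryJob.perform_later(...) from the controller instead of running the command inline.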
Views
For example, we want to show data from our ClickHouse instance grouped by request_date and request_path.
Super simple:
irb(main):057:0> Clickhouse.connection.select_row :select => "COUNT(*), request_date, request_path", :from => "log_entries", :group => "request_date, request_path"
D, [2018-11-29T18:54:04.120490 #5647] DEBUG -- :
SQL (15.9ms) SELECT COUNT(*), request_date, request_path
FROM log_entries
GROUP BY request_date, request_path;
4 rows in set. Elapsed: 2.1ms. Processed: 84 rows, 3.71 KB (40.37 thousand rows/s, 1.74 MB/s)
=> [9, #<Date: 2018-11-29 ((2458452j,0s,0n),+0s,2299161j)>, "/v1/rental_units?per_page=10&page=11"]
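Wiring this into an actual Rails view could look like the following hypothetical controller action (StatsController and the alias hits are illustrative, not from the original app). The result set is enumerable, so you can iterate over @stats in ERB as usual:
class StatsController < ApplicationController
  def index
    Clickhouse.establish_connection

    # Each row comes back as [hits, request_date, request_path].
    @stats = Clickhouse.connection.select_rows(
      :select => "COUNT(*) AS hits, request_date, request_path",
      :from   => "log_entries",
      :group  => "request_date, request_path"
    )
  end
end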
Conclusion
ClickHouse is not a “silver bullet” for your application; rather, it’s a solution for very specific tasks.
But if you find an optimized way to use it in the context of your application, ClickHouse can be very fast and save you both money and implementation time.