How Expensive are ClickHouse Operations currently?

Shiv Iyer
Posted on December 24, 2022

Monitoring Expensive ClickHouse Operations

(Troubleshooting ClickHouse – DBA Scripts)



Introduction 

What is the Data Load distribution happening to ClickHouse Server? I ask this question often to ClickHouse infrastructure owners. This matrix is used for trending the load and throughput. This will help me identify the load on ClickHouse Server from peak to off-peak hours. As a Performance Engineer, I spent long hours trending the performance of ClickHouse Infrastructure Operations across latency and throughput for troubleshooting performance more evidence-based than speculative. A detailed understanding of READ – WRITE distribution on ClickHouse Server also helps efficiently compute the usage of available system resources optimally. Both oversized and undersized Infrastructure is a different issue to troubleshoot altogether.

Observing the load happening on ClickHouse Server

  • Total Rows Inserted to all the Tables
  • Total number of INSERT queries
  • Total Data Inserted in Bytes
  • Time spent(WRITE Latency) waiting for WRITE SYSCALL
  • Total number of SELECT queries
  • Total Rows Selected
  • Time spent(READ Latency) waiting for READ SYSCALL
  • Total time spent on query operations – Query Operations Latency Matrix
  • Query operations throughput

Script to measure the load on ClickHouse Server

select *
from (select *
from (select toDateTime(system.metric_log.event_time) as "Event Date Time",
sum(system.metric_log.ProfileEvent_InsertedRows) as "Number of rows INSERTed to all tables.",
sum(system.metric_log.ProfileEvent_DiskWriteElapsedMicroseconds) as "Total time spent waiting for write syscall. This include writes to page cache.",
sum(system.metric_log.ProfileEvent_InsertQuery) as "Total number of INSERT queries.",
sum(system.metric_log.ProfileEvent_InsertedBytes) as "Total data INSERTed in Bytes",
sum(system.metric_log.ProfileEvent_QueryTimeMicroseconds) as "Total time of all queries",
sum(system.metric_log.ProfileEvent_SelectQuery) as "Total number of SELECT queries",
sum(system.metric_log.ProfileEvent_SelectQueryTimeMicroseconds) as "Total time of SELECT queries.",
sum(system.metric_log.ProfileEvent_SelectedRows) as "Total number of rows SELECTed",
sum(system.metric_log.ProfileEvent_DiskReadElapsedMicroseconds) as "Total time spent waiting for read syscall. This include reads from page cache.",
sum(system.metric_log.ProfileEvent_QueryTimeMicroseconds) as "Total time of all queries."
from system.metric_log
group by
system.metric_log.event_time,system.metric_log.ProfileEvent_InsertedRows,
system.metric_log.ProfileEvent_DiskWriteElapsedMicroseconds,
system.metric_log.ProfileEvent_InsertQuery,system.metric_log.ProfileEvent_InsertedBytes,
system.metric_log.ProfileEvent_QueryTimeMicroseconds,
system.metric_log.ProfileEvent_SelectQuery,
system.metric_log.ProfileEvent_SelectQueryTimeMicroseconds,
system.metric_log.ProfileEvent_SelectedRows,
system.metric_log.ProfileEvent_DiskReadElapsedMicroseconds,
system.metric_log.ProfileEvent_QueryTimeMicroseconds
order by
system.metric_log.ProfileEvent_QueryTimeMicroseconds desc
));

Load on ClickHouse Server Metrics Explained

[table id=25 /]

Leave a Reply

Your email address will not be published. Required fields are marked *