Executing statements in Clickhouse¶
Warning
Methods below read all rows returned from the DB into Spark driver memory, and then convert them to a DataFrame.
Do NOT use them to read large amounts of data. Use DBReader or Clickhouse.sql instead.
How to¶
There are 2 ways to execute a statement in Clickhouse:
Use Clickhouse.fetch¶
Use this method to perform a SELECT query which returns a small number of rows, like reading
Clickhouse config values, or reading data from some reference table. The method returns a Spark DataFrame.
The method accepts Clickhouse.FetchOptions.
Warning
Please take into account Clickhouse types.
Syntax support in Clickhouse.fetch¶
This method supports any query syntax supported by Clickhouse, like:
- ✅︎ SELECT ... FROM ...
- ✅︎ WITH alias AS (...) SELECT ...
- ✅︎ SELECT func(arg1, arg2) - call function
- ✅︎ SHOW ...
- ❌ SET ...; SELECT ...; - multiple statements not supported
Examples for Clickhouse.fetch¶
from onetl.connection import Clickhouse
clickhouse = Clickhouse(...)
df = clickhouse.fetch(
"SELECT value FROM some.reference_table WHERE key = 'some_constant'",
options=Clickhouse.FetchOptions(queryTimeout=10),
)
clickhouse.close()
value = df.collect()[0][0] # get value from first row and first column
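As an illustration of the WITH alias AS (...) SELECT ... form from the list above, a hedged sketch. The table and column names here are made up, and actually running the query requires a live Clickhouse connection, so the fetch call itself is shown commented out:

```python
# Hypothetical CTE query; some.reference_table and its columns are
# placeholder names, not part of the documented API.
query = """
WITH totals AS (
    SELECT key, sum(value) AS total
    FROM some.reference_table
    GROUP BY key
)
SELECT key, total
FROM totals
WHERE total > 0
"""

# With a live connection this would return a Spark DataFrame:
# df = clickhouse.fetch(query, options=Clickhouse.FetchOptions(queryTimeout=10))
```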
Use Clickhouse.execute¶
Use this method to execute DDL and DML operations. Each method call runs the operation in a separate transaction, and then commits it.
The method accepts Clickhouse.ExecuteOptions.
Syntax support in Clickhouse.execute¶
This method supports any query syntax supported by Clickhouse, like:
- ✅︎ CREATE TABLE ..., CREATE VIEW ..., and so on
- ✅︎ ALTER ...
- ✅︎ INSERT INTO ... SELECT ..., UPDATE ..., DELETE ..., and so on
- ✅︎ DROP TABLE ..., DROP VIEW ..., TRUNCATE TABLE ..., and so on
- ✅︎ other statements not mentioned here
- ❌ SET ...; SELECT ...; - multiple statements not supported
Examples for Clickhouse.execute¶
from onetl.connection import Clickhouse
clickhouse = Clickhouse(...)
clickhouse.execute("DROP TABLE schema.table")
clickhouse.execute(
"""
CREATE TABLE schema.table (
id UInt8,
key String,
value Float32
)
ENGINE = MergeTree()
ORDER BY id
""",
options=Clickhouse.ExecuteOptions(queryTimeout=10),
)
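Because each Clickhouse.execute call runs in its own transaction and multi-statement strings (SET ...; SELECT ...;) are not supported, a sequence of DDL/DML statements should be issued one call at a time. A hedged sketch; the table names below are placeholders:

```python
# Placeholder statements; schema.table and schema.staging are made-up names.
statements = [
    "ALTER TABLE schema.table ADD COLUMN updated_at DateTime",
    "INSERT INTO schema.table SELECT * FROM schema.staging",
    "TRUNCATE TABLE schema.staging",
]

# One execute() call per statement; with a live connection:
# for statement in statements:
#     clickhouse.execute(statement)
```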
Options¶
ClickhouseFetchOptions¶
Bases: JDBCFetchOptions
Options related to fetching data from databases via JDBC.
.. versionadded:: 0.11.0
Replace Clickhouse.JDBCOptions → Clickhouse.FetchOptions
Examples:
.. note::
You can pass any value supported by underlying JDBC driver class,
even if it is not mentioned in this documentation.
.. code:: python
from onetl.connection import Clickhouse
options = Clickhouse.FetchOptions(
queryTimeout=60_000,
fetchsize=100_000,
customSparkOption="value",
)
fetchsize = None (class attribute, instance attribute)¶
How many rows to fetch per round trip.
Tuning this option can influence performance of reading.
.. warning::
Default value depends on driver. For example, Oracle has
default fetchsize=10.
query_timeout = Field(default=None, alias='queryTimeout') (class attribute, instance attribute)¶
The number of seconds the driver will wait for a statement to execute. Zero means there is no limit.
This option depends on driver implementation, some drivers can check the timeout of each query instead of an entire JDBC batch.
parse(options) (classmethod)¶
If the passed object is already a ReadOptions instance, it is returned unchanged. If a dict was passed, it is converted to ReadOptions.
Otherwise, an exception is raised.
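The parse behavior described above can be sketched with a minimal stand-in class. This is an illustration of the semantics only, not the real onetl implementation; the class name and field names are made up:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SketchFetchOptions:
    """Minimal stand-in for Clickhouse.FetchOptions, for illustration only."""

    fetchsize: Optional[int] = None
    query_timeout: Optional[int] = None

    @classmethod
    def parse(cls, options):
        if isinstance(options, cls):
            return options             # options object: returned unchanged
        if isinstance(options, dict):
            return cls(**options)      # dict: converted to the options class
        raise TypeError(f"Cannot parse {options!r} as {cls.__name__}")


opts = SketchFetchOptions.parse({"fetchsize": 1000})
same = SketchFetchOptions.parse(opts)  # already parsed: same object back
```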
ClickhouseExecuteOptions¶
Bases: JDBCExecuteOptions
Options related to executing statements in databases via JDBC.
.. versionadded:: 0.11.0
Replace Clickhouse.JDBCOptions → Clickhouse.ExecuteOptions
Examples:
.. note::
You can pass any value supported by underlying JDBC driver class,
even if it is not mentioned in this documentation.
.. code:: python
from onetl.connection import Clickhouse
options = Clickhouse.ExecuteOptions(
queryTimeout=60_000,
customSparkOption="value",
)
fetchsize = None (class attribute, instance attribute)¶
How many rows to fetch per round trip.
Tuning this option can influence performance of reading.
.. warning::
Default value depends on driver. For example, Oracle has
default fetchsize=10.
query_timeout = Field(default=None, alias='queryTimeout') (class attribute, instance attribute)¶
The number of seconds the driver will wait for a statement to execute. Zero means there is no limit.
This option depends on driver implementation, some drivers can check the timeout of each query instead of an entire JDBC batch.
parse(options) (classmethod)¶
If the passed object is already a ReadOptions instance, it is returned unchanged. If a dict was passed, it is converted to ReadOptions.
Otherwise, an exception is raised.