
Clickhouse connection

Bases: JDBCConnection

Clickhouse JDBC connection. |support_hooks|

Based on Maven package `com.clickhouse:clickhouse-jdbc:0.7.2 <https://mvnrepository.com/artifact/com.clickhouse/clickhouse-jdbc/0.7.2>`_ (`official Clickhouse JDBC driver <https://github.com/ClickHouse/clickhouse-jdbc>`_).
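The package reference above is a Maven coordinate in ``groupId:artifactId:version`` form. Spark's ``spark.jars.packages`` option expects a comma-separated list of such coordinates. The sketch below only illustrates that format. The helpers ``split_coordinate`` and ``jars_packages_value`` are hypothetical and are not part of onETL.

.. code:: python

    def split_coordinate(coordinate: str) -> tuple[str, str, str]:
        # Hypothetical helper: split "groupId:artifactId:version" into its parts
        group_id, artifact_id, version = coordinate.split(":")
        return group_id, artifact_id, version


    def jars_packages_value(coordinates: list[str]) -> str:
        # Hypothetical helper: build a value for Spark's "spark.jars.packages" option
        return ",".join(coordinates)


    coordinate = "com.clickhouse:clickhouse-jdbc:0.7.2"
    print(split_coordinate(coordinate))
    # ('com.clickhouse', 'clickhouse-jdbc', '0.7.2')
    print(jars_packages_value([coordinate]))
    # com.clickhouse:clickhouse-jdbc:0.7.2

In practice, ``Clickhouse.get_packages()`` returns the list of coordinates to join, as in the connection example in this section.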

.. seealso::

    Before using this connector, please review :ref:`clickhouse-prerequisites`.

.. versionadded:: 0.1.0

Parameters:

  • host (str) –

    Host of the Clickhouse database. For example: test.clickhouse.domain.com or 193.168.1.11

  • port (int, default: `8123` ) –

    Port of the Clickhouse database.

  • user (str) –

    User with proper access to the database. For example: some_user

  • password (str) –

    Password for the database connection.

  • database (str) –

    Database name in Clickhouse (in Clickhouse, a database is equivalent to a schema).

  • spark (:obj:`pyspark.sql.SparkSession`) –

    Spark session.

  • extra (dict, default: `None` ) –

    Extra connection properties, such as Clickhouse JDBC driver options or Clickhouse settings.

    For example: ``{"continueBatchOnError": "false"}``.

    See:

    • `Clickhouse JDBC driver properties documentation <https://clickhouse.com/docs/en/integrations/java#configuration>`_
    • `Clickhouse core settings documentation <https://clickhouse.com/docs/en/operations/settings/settings>`_
    • `Clickhouse query complexity documentation <https://clickhouse.com/docs/en/operations/settings/query-complexity>`_
    • `Clickhouse query level settings <https://clickhouse.com/docs/en/operations/settings/query-level>`_
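The sketch below shows how ``host``, ``port``, ``database`` and ``extra`` can map onto a ClickHouse JDBC URL. It uses the driver's ``jdbc:clickhouse://host:port/database?key=value`` URL layout. The helper ``build_clickhouse_jdbc_url`` is hypothetical, and this section does not say whether onETL puts ``extra`` into the URL or passes it as driver properties, so treat this as an illustration only.

.. code:: python

    from typing import Optional
    from urllib.parse import urlencode


    def build_clickhouse_jdbc_url(
        host: str,
        port: int = 8123,
        database: Optional[str] = None,
        extra: Optional[dict[str, str]] = None,
    ) -> str:
        # Hypothetical helper, not onETL's implementation
        url = f"jdbc:clickhouse://{host}:{port}"
        if database:
            url += f"/{database}"
        if extra:
            # URL-encode extra properties as a query string.
            # User and password are kept out of the URL in this sketch.
            url += "?" + urlencode(extra)
        return url


    url = build_clickhouse_jdbc_url(
        "database.host.or.ip",
        database="default",
        extra={"continueBatchOnError": "false"},
    )
    print(url)
    # jdbc:clickhouse://database.host.or.ip:8123/default?continueBatchOnError=false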

Examples:

Create and check Clickhouse connection:

.. code:: python

    from onetl.connection import Clickhouse
    from pyspark.sql import SparkSession

    # Create Spark session with Clickhouse driver loaded
    maven_packages = Clickhouse.get_packages()
    spark = (
        SparkSession.builder.appName("spark-app-name")
        .config("spark.jars.packages", ",".join(maven_packages))
        .getOrCreate()
    )

    # Create connection
    clickhouse = Clickhouse(
        host="database.host.or.ip",
        user="user",
        password="*****",
        extra={"continueBatchOnError": "false"},
        spark=spark,
    ).check()

get_packages(package_version=None, apache_http_client_version=None) classmethod

Get package names to be downloaded by Spark. |support_hooks|

Allows specifying custom JDBC and Apache HTTP Client versions.

.. versionadded:: 0.9.0

Parameters:

  • package_version (str, default: None ) –

    Version of the ClickHouse JDBC client packages. Defaults to ``0.7.2``.

    Versions 0.8.0-0.9.2 are not supported; see `issue #2625 <https://github.com/ClickHouse/clickhouse-java/issues/2625>`_.

    .. versionadded:: 0.11.0

  • apache_http_client_version (str, default: None ) –

    Version of the Apache HTTP Client package. Defaults to ``5.4.2``.

    Used only if ``package_version`` is in range 0.5.0-0.7.0.

    .. versionadded:: 0.11.0
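The sketch below shows one way a caller could check a requested JDBC version against the unsupported range 0.8.0-0.9.2 stated above before building the Maven coordinate. The helper ``check_jdbc_version`` is hypothetical and is not how ``get_packages`` is implemented. It assumes plain numeric versions (such as ``0.7.2``), treats both ends of the range as inclusive, and does not model the Apache HTTP Client dependency.

.. code:: python

    # Unsupported JDBC versions documented above (assumed inclusive on both ends)
    UNSUPPORTED_MIN = (0, 8, 0)
    UNSUPPORTED_MAX = (0, 9, 2)


    def parse_version(version: str) -> tuple[int, ...]:
        # Compare numeric components only, e.g. "0.7.2" -> (0, 7, 2)
        return tuple(int(part) for part in version.split("."))


    def check_jdbc_version(version: str = "0.7.2") -> str:
        # Hypothetical helper: reject the unsupported range, else return the coordinate
        if UNSUPPORTED_MIN <= parse_version(version) <= UNSUPPORTED_MAX:
            raise ValueError(f"ClickHouse JDBC {version} is not supported, see issue #2625")
        return f"com.clickhouse:clickhouse-jdbc:{version}"


    print(check_jdbc_version())
    # com.clickhouse:clickhouse-jdbc:0.7.2
    print(check_jdbc_version("0.6.0"))
    # com.clickhouse:clickhouse-jdbc:0.6.0

Calling ``check_jdbc_version("0.8.1")`` raises ``ValueError``. Tuple comparison keeps the range check to a single expression.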

Examples:

.. code:: python

    from onetl.connection import Clickhouse

    Clickhouse.get_packages()
    Clickhouse.get_packages(package_version="0.7.2")
    Clickhouse.get_packages(package_version="0.6.0", apache_http_client_version="5.4.2")