
Writing to Iceberg using DBWriter

For writing data to Iceberg, use :obj:`DBWriter <onetl.db.db_writer.db_writer.DBWriter>`.

Examples

from onetl.connection import Iceberg
from onetl.db import DBWriter

iceberg = Iceberg(catalog_name="my_catalog", ...)

df = ...  # data is here

writer = DBWriter(
    connection=iceberg,
    target="my_schema.my_table",  # catalog name is already defined in connection
    options=Iceberg.WriteOptions(
        if_exists="append",
    ),
)

writer.run(df)

Options

Bases: GenericOptions

Iceberg writing options.

if_exists = Field(default=IcebergTableExistBehavior.APPEND, alias=avoid_alias("mode")) class-attribute instance-attribute

Behavior of writing data into an existing table.

Possible values:

  • append (default) Appends data to an existing table, or creates the table if it does not exist.

    Same as Spark's df.writeTo(table).using("iceberg").append().

    .. dropdown:: Behavior in details

    * Table does not exist
        Table is created.
    
    * Table exists and is not partitioned
        Data is appended to the table. Table DDL (including partition spec) is unchanged.
    
    * Table exists and is partitioned
        If a partition is present only in dataframe
            Partition is created.
        If a partition is present in both dataframe and table
            Data is appended to existing partition.
    
        .. warning::
    
            This mode does not check whether table already contains
            rows from dataframe, so duplicated rows can be created.
    
            To implement deduplication, write data to staging table first,
            and then perform some deduplication logic using :obj:`~sql`.
    
    * Table exists and is partitioned, but some partitions are present only in the table, not in the dataframe
        Existing partitions are left intact.
    
  • replace_overlapping_partitions Overwrites data in existing partitions, or creates the table if it does not exist.

    Same as Spark's df.writeTo(table).using("iceberg").overwritePartitions().

    .. DANGER::

    This mode makes sense **only** if the table is partitioned.
    **If it is not, all existing data in the table will be lost!**
    

    .. dropdown:: Behavior in details

    * Table does not exist
        Table is created.
    
    * Table exists and is not partitioned
        Data is **overwritten in the entire table**. Table DDL (including partition spec) is unchanged.
    
    * Table exists and is partitioned
        If a partition is present only in dataframe
            Partition is created.
        If a partition is present in both dataframe and table
            Existing partition **replaced** with data from dataframe.
        If a partition is present only in table, not dataframe
            Existing partition is left intact.
    
  • replace_entire_table Recreates the table (via DROP + CREATE), deleting all existing data. All existing partitions are dropped.

    Same as Spark's df.writeTo(table).createOrReplace().

    .. warning::

    The table is recreated, so **original table options are not preserved**. Be careful.
    
  • ignore Ignores the write operation if the table already exists.

    .. dropdown:: Behavior in details

    * Table does not exist
        Table is created.
    
    * Table exists
        **No further action is taken**.
    
  • error Raises an error if the table already exists.

    .. dropdown:: Behavior in details

    * Table does not exist
        Table is created.
    
    * Table exists
        **An error is raised**.
    
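The decision tables above can be sketched in plain Python, independent of Spark and onETL. Here a dict ``{partition_key: [rows]}`` stands in for an Iceberg table and ``None`` for a missing table; all names are illustrative, not part of the onETL API:

```python
# Illustrative sketch of the if_exists decision tables above.
# A dict {partition_key: [rows]} stands in for a partitioned Iceberg
# table; None means the table does not exist. Not onETL API.

def write(table, df_partitions, if_exists="append"):
    if table is None:
        # every mode creates a missing table
        return {k: list(v) for k, v in df_partitions.items()}
    if if_exists == "error":
        raise ValueError("table already exists")
    if if_exists == "ignore":
        return table  # no further action is taken
    if if_exists == "replace_entire_table":
        # DROP + CREATE: all existing data and partitions are gone
        return {k: list(v) for k, v in df_partitions.items()}
    result = {k: list(v) for k, v in table.items()}
    for key, rows in df_partitions.items():
        if if_exists == "append":
            # rows are appended, duplicates possible; other partitions intact
            result.setdefault(key, []).extend(rows)
        elif if_exists == "replace_overlapping_partitions":
            # overlapping partition is replaced; other partitions intact
            result[key] = list(rows)
    return result

table = {"2024-01-01": ["a"], "2024-01-02": ["b"]}
appended = write(table, {"2024-01-02": ["b"]}, "append")
# appended: {"2024-01-01": ["a"], "2024-01-02": ["b", "b"]} -- no deduplication
replaced = write(table, {"2024-01-02": ["new"]}, "replace_overlapping_partitions")
# replaced: {"2024-01-01": ["a"], "2024-01-02": ["new"]}
```

Note how ``append`` happily duplicates rows, which is why the warning above suggests writing to a staging table first and deduplicating via :obj:`~sql`.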

table_properties = Field(default_factory=dict) class-attribute instance-attribute

TBLPROPERTIES to add to a freshly created table.

Example: ``{"location": "/path"}``

.. warning::

Used **only** when **creating a new table**, or with ``if_exists=replace_entire_table``.
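Putting this together with the options above, a possible configuration looks as follows (a sketch in the style of the example at the top of this page; the table name and property values are placeholders):

```python
from onetl.connection import Iceberg
from onetl.db import DBWriter

iceberg = Iceberg(catalog_name="my_catalog", ...)

writer = DBWriter(
    connection=iceberg,
    target="my_schema.my_table",
    options=Iceberg.WriteOptions(
        if_exists="replace_entire_table",
        # applied here because the table is (re)created;
        # ignored when appending to an existing table
        table_properties={"location": "/path"},
    ),
)

writer.run(df)
```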