Snapshot Strategy

Bases: BaseStrategy

Snapshot strategy for :ref:`db-reader`/:ref:`file-downloader`.

Used for fetching all the rows/files from a source. Does not support HWM.

.. note::

    This is the default strategy.

For :ref:`db-reader`, every snapshot run executes a simple query that fetches all the table data:

.. code:: sql

    SELECT id, data FROM public.mydata;
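
The same behaviour can be illustrated without the library. The sketch below is only an analogy built on Python's standard ``sqlite3`` module; the in-memory table ``mydata`` and the helper ``snapshot_run`` are hypothetical names chosen for this illustration. Each run issues the same unfiltered query, so rows returned by an earlier run are returned again:

.. code:: python

    import sqlite3

    # In-memory database standing in for the real source (an assumption of this sketch)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, data TEXT)")
    conn.executemany("INSERT INTO mydata (id, data) VALUES (?, ?)", [(1, "a"), (2, "b")])


    def snapshot_run(connection):
        """Hypothetical helper: read the whole table, with no WHERE filter."""
        return connection.execute("SELECT id, data FROM mydata ORDER BY id").fetchall()


    first_run = snapshot_run(conn)

    # A new row appears in the source between runs
    conn.execute("INSERT INTO mydata (id, data) VALUES (?, ?)", (3, "c"))

    # The second run returns the old rows again, plus the new one
    second_run = snapshot_run(conn)

Nothing is remembered between the two calls, which is why no HWM is needed.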

For :ref:`file-downloader`, every snapshot run downloads all the files (either everything found in the source, or a user-defined list):

.. code:: bash

    $ hdfs dfs -ls /path

    /path/my/file1
    /path/my/file2

.. code:: python

    DownloadResult(
        ...,
        successful={
            LocalFile("/downloaded/file1"),
            LocalFile("/downloaded/file2"),
        },
    )
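
As a rough analogy (not the library's implementation), the same behaviour can be sketched with the standard library alone. The helper ``snapshot_download`` and the temporary directories are hypothetical stand-ins for a remote source and a local target. Each run copies every file found under the source directory, regardless of what earlier runs already copied:

.. code:: python

    import shutil
    import tempfile
    from pathlib import Path


    def snapshot_download(source_dir: Path, local_dir: Path) -> set:
        """Hypothetical helper: copy every file under source_dir into local_dir."""
        local_dir.mkdir(parents=True, exist_ok=True)
        downloaded = set()
        for remote_file in sorted(source_dir.rglob("*")):
            if remote_file.is_file():
                target = local_dir / remote_file.name
                shutil.copy2(remote_file, target)  # overwrites a copy left by an earlier run
                downloaded.add(target)
        return downloaded


    # Temporary directories standing in for the remote source and the local target
    root = Path(tempfile.mkdtemp())
    source = root / "path" / "my"
    source.mkdir(parents=True)
    (source / "file1").write_text("one")
    (source / "file2").write_text("two")

    first_run = snapshot_download(source, root / "downloaded")
    second_run = snapshot_download(source, root / "downloaded")  # same files again

Both runs return the same set of local paths, because nothing about the first run is remembered by the second.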

.. versionadded:: 0.1.0

Examples:

.. tabs::

    .. code-tab:: py Snapshot run with :ref:`db-reader`

        from onetl.db import DBReader, DBWriter
        from onetl.strategy import SnapshotStrategy

        # "postgres" and "hive" are connection objects created beforehand
        reader = DBReader(
            connection=postgres,
            source="public.mydata",
            columns=["id", "data"],
            # no hwm=... here: SnapshotStrategy does not support HWM
        )

        writer = DBWriter(connection=hive, target="db.newtable")

        with SnapshotStrategy():
            df = reader.run()
            writer.run(df)

        # the current run executes the following query:
        # SELECT id, data FROM public.mydata;

    .. code-tab:: py Snapshot run with :ref:`file-downloader`

        from onetl.file import FileDownloader
        from onetl.strategy import SnapshotStrategy

        # "sftp" is a connection object created beforehand
        downloader = FileDownloader(
            connection=sftp,
            source_path="/remote",
            local_path="/local",
        )

        with SnapshotStrategy():
            # run() returns a DownloadResult, not a DataFrame
            result = downloader.run()

        # the current run downloads all files from 'source_path'