Snapshot Strategy
Bases: BaseStrategy
Snapshot strategy for :ref:`db-reader` / :ref:`file-downloader`.
Used for fetching all the rows/files from a source. Does not support HWM.
.. note::

    This is a default strategy.
For :ref:`db-reader`:
    Every snapshot run executes a simple query that fetches all the table data:

    .. code:: sql

        SELECT id, data FROM public.mydata;
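As a rough illustration of this behaviour (a toy model, not onETL's implementation), the sketch below uses Python's built-in ``sqlite3`` in place of a real connection. The ``snapshot_run`` helper and the ``mydata`` table are hypothetical names introduced only for this example, and the ``ORDER BY`` is added just to make the result order predictable. Each run issues the same unfiltered query and returns every row, including rows already returned by an earlier run.

```python
import sqlite3

# Toy stand-in for a source table; onETL itself is not used here.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mydata (id INTEGER PRIMARY KEY, data TEXT)")
conn.executemany(
    "INSERT INTO mydata (id, data) VALUES (?, ?)",
    [(1, "a"), (2, "b")],
)


def snapshot_run(connection):
    """Hypothetical helper: read the whole table, with no HWM filter."""
    query = "SELECT id, data FROM mydata ORDER BY id"
    return connection.execute(query).fetchall()


first_run = snapshot_run(conn)

# A new row appears in the source between runs.
conn.execute("INSERT INTO mydata (id, data) VALUES (?, ?)", (3, "c"))

# The second run re-reads everything, not just the new row.
second_run = snapshot_run(conn)
```

Because nothing is remembered between runs, the cost of each run grows with the size of the table rather than with the amount of new data.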
For :ref:`file-downloader`:
    Every snapshot run downloads all the files (either everything found in the source, or a user-defined list):

    .. code:: bash

        $ hdfs dfs -ls /path

        /path/my/file1
        /path/my/file2
    .. code:: python

        DownloadResult(
            ...,
            successful={
                LocalFile("/downloaded/file1"),
                LocalFile("/downloaded/file2"),
            },
        )
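The same idea can be sketched for files using only the standard library. This is a simplified model under stated assumptions, not how ``FileDownloader`` is implemented: ``snapshot_download`` is a hypothetical helper, a local temporary directory stands in for the remote source, and files are flattened into the target directory by name. Every run copies every file it finds, whether or not an earlier run already fetched it.

```python
import shutil
import tempfile
from pathlib import Path


def snapshot_download(source_dir, local_dir):
    """Hypothetical helper: copy every file found under source_dir."""
    local_dir.mkdir(parents=True, exist_ok=True)
    downloaded = set()
    for remote_file in sorted(source_dir.rglob("*")):
        if remote_file.is_file():
            target = local_dir / remote_file.name
            shutil.copy2(remote_file, target)  # replaces any earlier copy
            downloaded.add(target)
    return downloaded


# A temporary directory plays the role of the remote source.
root = Path(tempfile.mkdtemp())
source, local = root / "remote", root / "local"
source.mkdir()
(source / "file1").write_text("one")
(source / "file2").write_text("two")

first_run = snapshot_download(source, local)

# A new file shows up on the source between runs.
(source / "file3").write_text("three")

# The second run fetches all three files again, not only file3.
second_run = snapshot_download(source, local)
```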
.. versionadded:: 0.1.0
Examples:
.. tabs::

    .. code-tab:: py Snapshot run with :ref:`db-reader`

        from onetl.db import DBReader, DBWriter
        from onetl.strategy import SnapshotStrategy

        reader = DBReader(
            connection=postgres,
            source="public.mydata",
            columns=["id", "data"],
            hwm=DBReader.AutoDetectHWM(name="some_hwm_name", expression="id"),
        )

        writer = DBWriter(connection=hive, target="db.newtable")

        with SnapshotStrategy():
            df = reader.run()
            writer.run(df)

        # the current run will execute the following query:
        # SELECT id, data FROM public.mydata;

    .. code-tab:: py Snapshot run with :ref:`file-downloader`

        from onetl.file import FileDownloader
        from onetl.strategy import SnapshotStrategy

        downloader = FileDownloader(
            connection=sftp,
            source_path="/remote",
            local_path="/local",
        )

        with SnapshotStrategy():
            # run() returns a DownloadResult, not a DataFrame
            result = downloader.run()

        # the current run will download all files from 'source_path'