S3 connection

Bases: FileConnection

S3 file connection. |support_hooks|

Based on `minio-py client <https://pypi.org/project/minio/>`_.

.. warning::

Since onETL v0.7.0, using the S3 connector requires installing the package with one of the following extras:

.. code:: bash

    pip install "onetl[s3]"

    # or
    pip install "onetl[files]"

See the :ref:`install-files` installation instructions for more details.

.. versionadded:: 0.5.1

Parameters:

  • host (str) –

    Host of S3 source. For example: s3.domain.com

  • port (int) –

    Port of S3 source

  • bucket (str) –

    Bucket name in the S3 file source

  • access_key (str) –

    Access key (aka user ID) of an account in the S3 service

  • secret_key (str) –

    Secret key (aka password) of an account in the S3 service

  • protocol (str, default: `https` ) –

    Connection protocol. Allowed values: https or http

    .. versionchanged:: 0.6.0 Renamed ``secure: bool`` to ``protocol: Literal["https", "http"]``

  • region (str) –

    Region name of bucket in S3 service. Optional for some S3 implementations (MinIO, Ozone), but could be mandatory for others.

  • session_token (str) –

    Session token generated by S3 STS service, if used.

Examples:

Create and check S3 connection:

.. code:: python

    from onetl.connection import S3

    s3 = S3(
        host="s3.domain.com",
        protocol="http",
        bucket="my-bucket",
        access_key="ACCESS_KEY",
        secret_key="SECRET_KEY",
        region="us-east-1",
    ).check()
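
The parameters listed above can also come from deployment configuration instead of being hard-coded. The following is a minimal sketch, not part of onETL: the helper ``s3_kwargs_from_env`` and the ``S3_*`` environment variable names are hypothetical, and treating ``host``, ``bucket``, ``access_key`` and ``secret_key`` as required is an assumption of this example.

```python
from __future__ import annotations

import os
from typing import Any, Mapping

ALLOWED_PROTOCOLS = {"https", "http"}


def s3_kwargs_from_env(env: Mapping[str, str] | None = None) -> dict[str, Any]:
    """Build keyword arguments for ``S3(...)`` from environment variables.

    The ``S3_*`` variable names are hypothetical, chosen for this example only.
    """
    env = os.environ if env is None else env

    # Treated as required in this sketch: a missing variable raises KeyError
    kwargs: dict[str, Any] = {
        "host": env["S3_HOST"],
        "bucket": env["S3_BUCKET"],
        "access_key": env["S3_ACCESS_KEY"],
        "secret_key": env["S3_SECRET_KEY"],
    }

    # Only the two documented protocol values are accepted
    protocol = env.get("S3_PROTOCOL", "https")
    if protocol not in ALLOWED_PROTOCOLS:
        raise ValueError(f"protocol must be 'https' or 'http', got {protocol!r}")
    kwargs["protocol"] = protocol

    # Optional parameters are passed only when they are set
    if "S3_PORT" in env:
        kwargs["port"] = int(env["S3_PORT"])
    for name in ("region", "session_token"):
        value = env.get(f"S3_{name.upper()}")
        if value:
            kwargs[name] = value
    return kwargs
```

With onETL installed, the result could be passed on as ``S3(**s3_kwargs_from_env()).check()``.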

path_exists(path)

Check if specified path exists on remote filesystem. |support_hooks|

.. versionadded:: 0.8.0

Parameters:

  • path (str | :obj:`os.PathLike`) –

    Path to check

Returns:

  • ``True`` if path exists, ``False`` otherwise

Examples:

>>> connection.path_exists("/path/to/file.csv")
True
>>> connection.path_exists("/path/to/dir")
True
>>> connection.path_exists("/path/to/missing")
False
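
Because ``path_exists`` returns a plain boolean, it composes easily with ordinary Python code. As a rough sketch, the hypothetical helper ``split_by_existence`` below (not part of onETL) partitions candidate paths into existing and missing ones. It only assumes that ``connection`` has a ``path_exists`` method as documented above.

```python
from __future__ import annotations

import os
from typing import Iterable, Protocol


class SupportsPathExists(Protocol):
    """Anything with a ``path_exists`` method, e.g. a file connection."""

    def path_exists(self, path: str | os.PathLike) -> bool: ...


def split_by_existence(
    connection: SupportsPathExists,
    paths: Iterable[str | os.PathLike],
) -> tuple[list, list]:
    """Return ``(existing, missing)`` lists, preserving input order."""
    existing, missing = [], []
    for path in paths:
        # One remote call per path
        if connection.path_exists(path):
            existing.append(path)
        else:
            missing.append(path)
    return existing, missing
```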

resolve_dir(path)

Returns directory at the specified path, with stats. |support_hooks|

.. versionadded:: 0.8.0

Parameters:

  • path (str | :obj:`os.PathLike`) –

    Path to resolve

Returns:

  • Directory path with stats

Raises:

  • :obj:`onetl.exception.DirectoryNotFoundError`

    Path does not exist

  • NotADirectoryError

    Path is not a directory

Examples:

>>> import os
>>> dir_path = connection.resolve_dir("/path/to/dir")
>>> os.fspath(dir_path)
'/path/to/dir'
>>> dir_path.stat().st_uid  # owner id
12345

resolve_file(path)

Returns file at the specified path, with stats. |support_hooks|

.. versionadded:: 0.8.0

Parameters:

  • path (str | :obj:`os.PathLike`) –

    Path to resolve

Returns:

  • File path with stats

Raises:

  • FileNotFoundError

    Path does not exist

  • :obj:`onetl.exception.NotAFileError`

    Path is not a file

Examples:

>>> import os
>>> file_path = connection.resolve_file("/path/to/dir/file.csv")
>>> os.fspath(file_path)
'/path/to/dir/file.csv'
>>> file_path.stat().st_uid  # owner id
12345
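
When a missing file is an expected situation rather than an error, the documented ``FileNotFoundError`` can be handled explicitly. The hypothetical helper ``file_size_or_none`` below is a sketch, not an onETL API. It assumes the object returned by ``stat()`` also exposes ``st_size``, which the examples above do not show.

```python
from __future__ import annotations

import os


def file_size_or_none(connection, path: str | os.PathLike) -> int | None:
    """Return the file size in bytes, or ``None`` if the path does not exist."""
    try:
        file_path = connection.resolve_file(path)
    except FileNotFoundError:
        # Raised for a missing path; NotAFileError is deliberately not caught,
        # so resolving a directory still fails loudly
        return None
    # Assumption: the stat object has ``st_size`` in addition to ``st_uid``
    return file_path.stat().st_size
```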

create_dir(path)

Creates directory tree on remote filesystem. |support_hooks|

.. versionadded:: 0.8.0

Parameters:

  • path (str | :obj:`os.PathLike`) –

    Directory path

Returns:

  • Created directory with stats

Raises:

  • NotADirectoryError

    Path exists, but is not a directory

Examples:

>>> import os
>>> dir_path = connection.create_dir("/path/to/dir")
>>> os.fspath(dir_path)
'/path/to/dir'

remove_dir(path, *, recursive=False)

Remove directory or directory tree. |support_hooks|

If directory does not exist, no exception is raised.

.. versionadded:: 0.8.0

Parameters:

  • path (str | :obj:`os.PathLike`) –

    Directory path to remove

  • recursive (bool, default: `False` ) –

    If True, remove directory tree recursively (including files and subdirectories).

    If False, remove only directory itself. Directory should be empty.

Returns:

  • ``True`` if directory was removed, ``False`` if directory does not exist in the first place.

Raises:

  • NotADirectoryError

    Path is not a directory

  • :obj:`onetl.exception.DirectoryNotEmptyError`

    Directory is not empty, and recursive is False

Examples:

>>> connection.remove_dir("/path/to/dir")
Traceback (most recent call last):
    ...
onetl.exception.DirectoryNotEmptyError: Cannot delete non-empty directory '/path/to/dir'
>>> connection.remove_dir("/path/to/dir", recursive=True)
True
>>> connection.path_exists("/path/to/dir")
False
>>> connection.path_exists("/path/to/dir/file.csv")
False
>>> connection.remove_dir("/path/to/dir")  # already deleted, no error
False
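
Since ``remove_dir`` returns ``False`` instead of raising when the directory is absent, it can be combined with ``create_dir`` to reset a staging directory without a prior existence check. The helper ``recreate_dir`` below is a hypothetical sketch, not part of onETL. It relies only on the two method signatures documented above.

```python
from __future__ import annotations

import os


def recreate_dir(connection, path: str | os.PathLike):
    """Remove a directory tree if present, then create it empty.

    Returns ``(existed, dir_path)``: whether the directory was there before,
    and the value returned by ``create_dir``.
    """
    # No exception if the directory is missing; the return value says whether
    # anything was actually removed
    existed = connection.remove_dir(path, recursive=True)
    dir_path = connection.create_dir(path)
    return existed, dir_path
```

Note that ``recursive=True`` deletes all nested files and subdirectories, so a helper like this is only appropriate for paths that hold disposable data.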