
AIS Python SDK

AIS Python SDK provides a (growing) set of client-side APIs to access and utilize AIS clusters, buckets, and objects.

The project is essentially a Python port of the AIS Go APIs, with the additional goal of being convenient for Python developers.

The SDK also includes the authn sub-package for managing authentication, users, roles, clusters, and tokens. For more details, refer to the AuthN sub-package README.

The SDK currently requires Python 3.8 or later.


Installation

Install as a Package

The latest AIS release can be easily installed either with Anaconda or pip:

$ conda install aistore
$ pip install aistore

Install From Source

If you'd like to work with the current upstream (and don't mind the risk), install the latest master directly from GitHub:

$ git clone https://github.com/NVIDIA/aistore.git

$ cd aistore/python/

# upgrade pip to latest version
$ python -m pip install --upgrade pip

# install dependencies
$ pip install -r aistore/common_requirements

$ pip install -e .

Quick Start

In order to interact with your running AIS instance, you will need to create a client object:

from aistore.sdk import Client

client = Client("http://localhost:8080")

Note: the address http://localhost:8080 (here and elsewhere) is a placeholder for your own AIStore endpoint (AIS_ENDPOINT).

The newly created client object can be used to interact with your AIS cluster, buckets, and objects. See the examples and the reference docs for more details.
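Because the endpoint is only a placeholder, one option is to read it from the environment instead of hard-coding it. The sketch below assumes the endpoint is exported as AIS_ENDPOINT (the name used in the note above) and falls back to the local placeholder. `resolve_endpoint` is a hypothetical helper for illustration, not part of the SDK.

```python
import os

DEFAULT_ENDPOINT = "http://localhost:8080"  # placeholder endpoint from this README


def resolve_endpoint(env=None):
    """Return the endpoint from AIS_ENDPOINT, or the local placeholder.

    Hypothetical helper: `env` defaults to os.environ and is injectable
    so the function can be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    endpoint = env.get("AIS_ENDPOINT", "").strip()
    # Drop a trailing slash so both URL spellings yield the same string.
    return (endpoint or DEFAULT_ENDPOINT).rstrip("/")
```

The result can then be passed straight to the constructor, for example `client = Client(resolve_endpoint())`.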

External Cloud Storage Buckets

AIS supports a number of different backend providers or, simply, backends.

For exact definitions and related capabilities, please see terminology.

Many bucket/object operations support remote cloud buckets (third-party backend-based cloud buckets), including a few of the operations shown above. To interact with remote cloud buckets, you need to specify the provider of choice when instantiating your bucket object as follows:

# Head AWS bucket
client.bucket("my-aws-bucket", provider="aws").head()
# Evict GCP bucket
client.bucket("my-gcp-bucket", provider="gcp").evict()
# Get object from Azure bucket
client.bucket("my-azure-bucket", provider="azure").object("filename.ext").get_reader()
# List objects in AWS bucket
client.bucket("my-aws-bucket", provider="aws").list_objects()

Note that some operations do not support external cloud storage buckets. The SDK reference documentation lists which bucket and object operations work with remote cloud buckets, and also covers general class and method usage.


Environment Variables

The AIS Python SDK supports several environment variables that allow you to configure client behavior without explicitly passing parameters to the Client constructor. This is particularly useful for containerized environments or when you want to centralize configuration.

SDK Client Environment Variables

| Environment Variable | Description | Default Value |
|---|---|---|
| AIS_AUTHN_TOKEN | Authentication token for accessing the AIS cluster | None |
| AIS_SKIP_VERIFY | Skip SSL certificate verification when set to true, 1, or yes | false |
| AIS_CLIENT_CA | Path to CA certificate file for SSL verification | None |
| AIS_CRT | Path to client certificate file for mTLS authentication | None |
| AIS_CRT_KEY | Path to client certificate key file for mTLS authentication | None |
| AIS_CONNECT_TIMEOUT | Connection timeout in seconds (set to 0 to disable) | 3 |
| AIS_READ_TIMEOUT | Read timeout in seconds (set to 0 to disable) | 20 |
| AIS_MAX_CONN_POOL | Maximum number of connections per host in the connection pool | 10 |
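Two of the value conventions in this table are easy to get wrong: the boolean flag accepts true, 1, or yes, and a timeout of 0 means "disabled" rather than "zero seconds". The following is a minimal sketch of how such values could be interpreted. It is not the SDK's own parsing code. `env_flag` and `env_timeout` are hypothetical helpers, and both the case-insensitive matching and the mapping of 0 to `None` (the "no timeout" convention in common Python HTTP libraries) are assumptions.

```python
import os

TRUTHY = {"true", "1", "yes"}  # values the table lists for AIS_SKIP_VERIFY


def env_flag(name, default=False, env=None):
    """Interpret an environment variable as a boolean flag."""
    env = os.environ if env is None else env
    raw = env.get(name)
    if raw is None:
        return default
    # Assumption: surrounding whitespace and letter case are ignored.
    return raw.strip().lower() in TRUTHY


def env_timeout(name, default, env=None):
    """Return a timeout in seconds, or None when the value is 0 (disabled)."""
    env = os.environ if env is None else env
    raw = env.get(name)
    seconds = float(raw) if raw is not None else float(default)
    # Assumption: "disabled" is represented as None.
    return None if seconds == 0 else seconds
```

For example, `env_timeout("AIS_READ_TIMEOUT", 20)` yields the table default of 20 seconds when the variable is unset.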

ETL Webserver Environment Variables

ETL webservers support all SDK client environment variables above (same semantics). In addition, the following are specific to ETL webservers:

| Environment Variable | Description | Default Value |
|---|---|---|
| AIS_DIRECT_PUT_CHUNK_SIZE | Chunk size in bytes for streaming direct-put bodies (FastAPIServer only) | 1048576 (1 MiB) |
| AIS_DIRECT_PUT_RETRIES | Retries on transient connection errors during direct-put (all ETL servers) | 3 |
| MAX_CONN | Maximum total outbound connections in the httpx pool (FastAPIServer only) | 256 |
| MAX_KEEPALIVE_CONN | Maximum keepalive connections in the httpx pool (FastAPIServer only) | 128 |
| KEEPALIVE_EXPIRY | Keepalive connection expiry in seconds (FastAPIServer only) | 30 |
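To make the chunk-size setting concrete, the sketch below shows what streaming a body in fixed-size chunks looks like in plain Python. It only illustrates the idea and is not the ETL webserver's implementation. `direct_put_chunk_size` and `iter_chunks` are hypothetical names, and rejecting non-positive sizes is an assumption of this sketch.

```python
import os

DEFAULT_CHUNK_SIZE = 1048576  # 1 MiB, the default listed in the table


def direct_put_chunk_size(env=None):
    """Read AIS_DIRECT_PUT_CHUNK_SIZE, falling back to the 1 MiB default."""
    env = os.environ if env is None else env
    size = int(env.get("AIS_DIRECT_PUT_CHUNK_SIZE", DEFAULT_CHUNK_SIZE))
    if size <= 0:  # assumption: non-positive sizes are treated as invalid
        raise ValueError("chunk size must be a positive number of bytes")
    return size


def iter_chunks(body, chunk_size):
    """Yield successive slices of at most chunk_size bytes from body."""
    view = memoryview(body)  # slicing a memoryview avoids intermediate copies
    for start in range(0, len(view), chunk_size):
        yield bytes(view[start:start + chunk_size])
```

With the default, a 2.5 MiB body would be sent as two full 1 MiB chunks followed by one 0.5 MiB chunk.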

Configuration Precedence

When the Client is initialized, configuration values are resolved in the following order (highest to lowest precedence):

  1. Explicit parameters passed to the Client() constructor
  2. Environment variables (listed above)
  3. Default values (built-in defaults)

This means that if you provide a parameter directly to the Client() constructor, it will always take precedence over environment variables and defaults.
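The precedence order can be sketched in a few lines. This is an illustration of the rule, not the SDK's internal code: `resolve` is a hypothetical helper, and treating `None` as "parameter not passed" is an assumption of the sketch.

```python
import os


def resolve(explicit, env_name, default, cast=str, env=None):
    """Resolve one setting: explicit value, then environment, then default."""
    env = os.environ if env is None else env
    if explicit is not None:   # 1. explicit constructor parameter wins
        return explicit
    if env_name in env:        # 2. otherwise use the environment variable
        return cast(env[env_name])
    return default             # 3. otherwise fall back to the built-in default
```

With AIS_MAX_CONN_POOL set to 20, `resolve(50, "AIS_MAX_CONN_POOL", 10, cast=int)` returns the explicit value 50, while `resolve(None, "AIS_MAX_CONN_POOL", 10, cast=int)` returns 20.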

Examples

Using Environment Variables

import os
from aistore.sdk import Client

# Set environment variables
os.environ["AIS_AUTHN_TOKEN"] = "your-auth-token"
os.environ["AIS_CONNECT_TIMEOUT"] = "5"
os.environ["AIS_READ_TIMEOUT"] = "30"
os.environ["AIS_MAX_CONN_POOL"] = "20"
os.environ["AIS_CLIENT_CA"] = "/path/to/client_ca.crt"

# Client will use the environment variables
client = Client("https://localhost:8080")

HTTPS

The SDK supports HTTPS connectivity if the AIS cluster is configured to use HTTPS. To start using HTTPS:

  1. Set up HTTPS on your cluster: Guide for K8s cluster
  2. If using a self-signed certificate with your own CA, copy the CA certificate to your local machine. If using our built-in cert-manager config to generate your certificates, you can use our playbook
  3. Options to configure the SDK for HTTPS connectivity:
    • Skip verification (for testing, insecure):
      • client = Client("https://localhost:8080", skip_verify=True)
    • Point the SDK to use your certificate using one of the below methods:
      • Pass an argument to the path of the certificate when creating the client:
        • client = Client("https://localhost:8080", ca_cert="/path/to/cert")
      • Use the environment variable
        • Set AIS_CLIENT_CA to the path of your certificate before initializing the client
    • If your AIS cluster is using a certificate signed by a trusted CA, the client will default to using verification without needing to provide a CA cert.
  4. Options to configure the SDK to work with mTLS:
    • Pass a tuple argument containing path to client certificate and key pair
      • client = Client("https://localhost:8080", client_cert=("client.crt", "client.key"))
    • Pass a path to a PEM file that contains both client certificate and key
      • client = Client("https://localhost:8080", client_cert="client.pem")
    • Use the environment variable
      • Set AIS_CRT and AIS_CRT_KEY to the paths of the client certificate and key, respectively, before initializing the client
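The options above map onto three constructor keywords: skip_verify, ca_cert, and client_cert. As a rough sketch, the helper below collects them from the TLS-related environment variables into a dictionary, which can be handy for logging or checking the effective settings before creating a client. `tls_kwargs` is a hypothetical helper. Treating AIS_CRT on its own as a combined PEM file is an assumption of this sketch, not documented SDK behavior.

```python
import os


def tls_kwargs(env=None):
    """Build TLS-related keyword arguments for Client() from the environment."""
    env = os.environ if env is None else env
    kwargs = {}
    if env.get("AIS_SKIP_VERIFY", "").strip().lower() in {"true", "1", "yes"}:
        kwargs["skip_verify"] = True  # insecure: for testing only
    elif env.get("AIS_CLIENT_CA"):
        kwargs["ca_cert"] = env["AIS_CLIENT_CA"]  # verify against this CA
    crt, key = env.get("AIS_CRT"), env.get("AIS_CRT_KEY")
    if crt and key:
        kwargs["client_cert"] = (crt, key)  # separate certificate and key files
    elif crt:
        kwargs["client_cert"] = crt  # assumption: one PEM holding both
    return kwargs
```

A client could then be created with `Client("https://localhost:8080", **tls_kwargs())`. An empty dictionary corresponds to the default case of a certificate signed by a trusted CA.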

ETLs

AIStore also supports ETL (Extract-Transform-Load). With AIS ETL, transformations run locally inside the cluster, close to the data, which is a major contributor to the linear scalability of AIS.

Note: AIS-ETL requires Kubernetes. For more information on deploying AIStore with local Kubernetes, refer here.

To learn more about working with AIS ETL, check out examples.


API Documentation

| Module | Summary |
|---|---|
| client.py | Contains Client class, which has methods for making HTTP requests to an AIStore server. Includes factory constructors for Bucket, Cluster, and Job classes. |
| cluster.py | Contains Cluster class that represents a cluster bound to a client and contains all cluster-related operations, including checking the cluster's health and retrieving vital cluster information. |
| bucket.py | Contains Bucket class that represents a bucket in an AIS cluster and contains all bucket-related operations, including (but not limited to) creating, deleting, evicting, renaming, copying. |
| object.py | Contains class Object that represents an object belonging to a bucket in an AIS cluster, and contains all object-related operations, including (but not limited to) retrieving, adding and deleting objects. |
| object_group.py | Contains class ObjectGroup, representing a collection of objects belonging to a bucket in an AIS cluster. Includes all multi-object operations such as deleting, evicting, prefetching, copying, and transforming objects. |
| job.py | Contains class Job and all job-related operations. |
| dsort/core.py | Contains class Dsort and all dsort-related operations. |
| etl.py | Contains class Etl and all ETL-related operations. |

For more information on SDK usage, refer to the SDK reference documentation or see the examples here.

References