AIS Python SDK provides a (growing) set of client-side APIs to access and utilize AIS clusters, buckets, and objects.
The project is, essentially, a Python port of the AIS Go APIs, with additional objectives that prioritize utmost convenience for Python developers.
The SDK also includes the authn sub-package for managing authentication, users, roles, clusters, and tokens. For more details, refer to the AuthN sub-package README.
The SDK currently supports Python 3.8 and later.
The latest AIS release can be easily installed either with Anaconda or pip:
$ conda install aistore
$ pip install aistore
If you'd like to work with the current upstream (and don't mind the risk), install the latest master directly from GitHub:
$ git clone https://github.com/NVIDIA/aistore.git
$ cd aistore/python/
# upgrade pip to latest version
$ python -m pip install --upgrade pip
# install dependencies
$ pip install -r aistore/common_requirements
$ pip install -e .
In order to interact with your running AIS instance, you will need to create a client object:
from aistore.sdk import Client
client = Client("http://localhost:8080")
Note: The http://localhost:8080 address (above and elsewhere) must be understood as a placeholder for an arbitrary AIStore endpoint (AIS_ENDPOINT).
The newly created client object can be used to interact with your AIS cluster, buckets, and objects.
See the examples and the reference docs for more details.
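
Since the address is only a placeholder, one option is to read the endpoint from the environment instead of hard-coding it. The sketch below assumes `AIS_ENDPOINT` is exported as an environment variable (the text above only hints at this), and `endpoint_from_env` is a hypothetical helper, not part of the SDK.

```python
import os
from urllib.parse import urlparse

# Placeholder address used throughout this document.
DEFAULT_ENDPOINT = "http://localhost:8080"


def endpoint_from_env(env=None):
    """Return the endpoint from AIS_ENDPOINT, or the placeholder default.

    Hypothetical helper: the variable name comes from the note above,
    the validation rules are an assumption.
    """
    env = os.environ if env is None else env
    endpoint = env.get("AIS_ENDPOINT", DEFAULT_ENDPOINT).rstrip("/")
    parsed = urlparse(endpoint)
    # Reject values that are clearly not an HTTP(S) URL with a host part.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        raise ValueError(f"invalid AIS endpoint: {endpoint!r}")
    return endpoint


# With the SDK installed, the result would be passed on to the client:
#   from aistore.sdk import Client
#   client = Client(endpoint_from_env())
```

Validating the value up front turns a mistyped endpoint into an immediate error rather than a failed request later on.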
External Cloud Storage Buckets
AIS supports a number of different backend providers or, simply, backends.
For exact definitions and related capabilities, please see terminology.
Many bucket/object operations support remote cloud buckets (third-party backend-based cloud buckets), including a few of the operations shown above. To interact with remote cloud buckets, you need to specify the provider of choice when instantiating your bucket object as follows:
# Head AWS bucket
client.bucket("my-aws-bucket", provider="aws").head()
# Evict GCP bucket
client.bucket("my-gcp-bucket", provider="gcp").evict()
# Get object from Azure bucket
client.bucket("my-azure-bucket", provider="azure").object("filename.ext").get_reader()
# List objects in AWS bucket
client.bucket("my-aws-bucket", provider="aws").list_objects()
Please note that certain operations do not support external cloud storage buckets. Please refer to the SDK reference documentation for more information on which bucket/object operations support remote cloud buckets, as well as general information on class and method usage.
The AIS Python SDK supports several environment variables that allow you to configure client behavior without explicitly passing parameters to the Client constructor. This is particularly useful for containerized environments or when you want to centralize configuration.
| Environment Variable | Description | Default Value |
|---|---|---|
| `AIS_AUTHN_TOKEN` | Authentication token for accessing the AIS cluster | None |
| `AIS_SKIP_VERIFY` | Skip SSL certificate verification when set to `true`, `1`, or `yes` | `false` |
| `AIS_CLIENT_CA` | Path to CA certificate file for SSL verification | None |
| `AIS_CRT` | Path to client certificate file for mTLS authentication | None |
| `AIS_CRT_KEY` | Path to client certificate key file for mTLS authentication | None |
| `AIS_CONNECT_TIMEOUT` | Connection timeout in seconds (set to `0` to disable) | `3` |
| `AIS_READ_TIMEOUT` | Read timeout in seconds (set to `0` to disable) | `20` |
| `AIS_MAX_CONN_POOL` | Maximum number of connections per host in the connection pool | `10` |
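
To make the value conventions in the table concrete, here is a minimal sketch of how such variables could be parsed. It is not the SDK's implementation: `env_flag` and `env_timeout` are hypothetical helpers, and both the case-insensitive matching and the mapping of `0` to `None` (no timeout) are assumptions.

```python
import os

# Values the table lists as enabling AIS_SKIP_VERIFY.
TRUTHY = {"true", "1", "yes"}


def env_flag(name, env=None):
    """Return True if the variable holds one of the truthy values.

    Assumption: matching ignores case and surrounding whitespace.
    """
    env = os.environ if env is None else env
    return env.get(name, "").strip().lower() in TRUTHY


def env_timeout(name, default, env=None):
    """Read a timeout in seconds; 0 means 'disabled' and yields None."""
    env = os.environ if env is None else env
    raw = env.get(name)
    value = float(raw) if raw is not None else float(default)
    return None if value == 0 else value


# Example with an explicit mapping instead of the real environment.
settings = {"AIS_SKIP_VERIFY": "yes", "AIS_READ_TIMEOUT": "0"}
skip_verify = env_flag("AIS_SKIP_VERIFY", settings)
connect_timeout = env_timeout("AIS_CONNECT_TIMEOUT", 3, settings)
read_timeout = env_timeout("AIS_READ_TIMEOUT", 20, settings)
```

In this example `skip_verify` is enabled, the connect timeout falls back to the default from the table, and the read timeout is disabled.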
ETL webservers support all SDK client environment variables above (same semantics). In addition, the following are specific to ETL webservers:
| Environment Variable | Description | Default Value |
|---|---|---|
| `AIS_DIRECT_PUT_CHUNK_SIZE` | Chunk size in bytes for streaming direct-put bodies (FastAPIServer only) | `1048576` (1 MiB) |
| `AIS_DIRECT_PUT_RETRIES` | Retries on transient connection errors during direct-put (all ETL servers) | `3` |
| `MAX_CONN` | Maximum total outbound connections in the httpx pool (FastAPIServer only) | `256` |
| `MAX_KEEPALIVE_CONN` | Maximum keepalive connections in the httpx pool (FastAPIServer only) | `128` |
| `KEEPALIVE_EXPIRY` | Keepalive connection expiry in seconds (FastAPIServer only) | `30` |
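
The chunk-size and retry settings can be illustrated with a small standalone sketch. It does not reproduce the ETL webserver code: `iter_chunks` and `send_with_retries` are hypothetical helpers, and treating "retries" as additional attempts after the first one is an assumption.

```python
import os


def iter_chunks(data: bytes, chunk_size: int):
    """Yield consecutive slices of at most chunk_size bytes."""
    for start in range(0, len(data), chunk_size):
        yield data[start:start + chunk_size]


def send_with_retries(send, body, retries):
    """Call send(body); on ConnectionError, retry up to `retries` more times."""
    for attempt in range(retries + 1):
        try:
            return send(body)
        except ConnectionError:
            # Out of retries: surface the last error to the caller.
            if attempt == retries:
                raise


# Read the settings with the defaults listed in the table above.
chunk_size = int(os.environ.get("AIS_DIRECT_PUT_CHUNK_SIZE", 1048576))
retries = int(os.environ.get("AIS_DIRECT_PUT_RETRIES", 3))
```

A caller would pass its own `send` callable, for example a function that writes `iter_chunks(payload, chunk_size)` to the destination target.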
When the Client is initialized, configuration values are resolved in the following order (highest to lowest precedence):
- Explicit parameters passed to the Client() constructor
- Environment variables (listed above)
- Default values (built-in defaults)
This means that if you provide a parameter directly to the Client() constructor, it will always take precedence over environment variables and defaults.
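
The precedence order can be sketched in a few lines. This is an illustration under stated assumptions, not the SDK's code: `resolve` is a hypothetical helper, and an explicit value of `None` is taken to mean "not provided".

```python
import os


def resolve(explicit, env_name, default, cast=str, env=None):
    """Pick a setting: explicit argument, then environment, then default."""
    env = os.environ if env is None else env
    if explicit is not None:
        return explicit            # 1. constructor argument wins
    if env_name in env:
        return cast(env[env_name])  # 2. environment variable
    return default                 # 3. built-in default


# Example with an explicit mapping instead of the real environment.
env = {"AIS_MAX_CONN_POOL": "20"}
from_argument = resolve(50, "AIS_MAX_CONN_POOL", 10, int, env)
from_environment = resolve(None, "AIS_MAX_CONN_POOL", 10, int, env)
from_default = resolve(None, "AIS_MAX_CONN_POOL", 10, int, {})
```

Here the three calls resolve to the explicit argument, the environment value, and the built-in default, in that order.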
import os
from aistore.sdk import Client
# Set environment variables
os.environ["AIS_AUTHN_TOKEN"] = "your-auth-token"
os.environ["AIS_CONNECT_TIMEOUT"] = "5"
os.environ["AIS_READ_TIMEOUT"] = "30"
os.environ["AIS_MAX_CONN_POOL"] = "20"
os.environ["AIS_CLIENT_CA"] = "/path/to/client_ca.crt"
# Client will use the environment variables
client = Client("https://localhost:8080")
The SDK supports HTTPS connectivity if the AIS cluster is configured to use HTTPS. To start using HTTPS:
- Set up HTTPS on your cluster: Guide for K8s cluster
- If using a self-signed certificate with your own CA, copy the CA certificate to your local machine. If using our built-in cert-manager config to generate your certificates, you can use our playbook.
- Options to configure the SDK for HTTPS connectivity:
  - Skip verification (for testing, insecure):
    - `client = Client("https://localhost:8080", skip_verify=True)`
  - Point the SDK to your certificate using one of the methods below:
    - Pass the path of the certificate as an argument when creating the client:
      - `client = Client("https://localhost:8080", ca_cert="/path/to/cert")`
    - Use the environment variable:
      - Set `AIS_CLIENT_CA` to the path of your certificate before initializing the client
  - If your AIS cluster is using a certificate signed by a trusted CA, the client will default to using verification without needing to provide a CA cert.
- Options to configure the SDK to work with mTLS:
  - Pass a tuple argument containing the paths to the client certificate and key pair:
    - `client = Client("https://localhost:8080", client_cert=('client.crt', 'client.key'))`
  - Pass a path to a PEM file that contains both the client certificate and key:
    - `client = Client("https://localhost:8080", client_cert='client.pem')`
  - Use the environment variables:
    - Set `AIS_CRT` and `AIS_CRT_KEY` to the paths of the client certificate and key, respectively, before initializing the client
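
The HTTPS and mTLS options above could be combined roughly as follows. This is a sketch, not the SDK's internals: `tls_options` is a hypothetical helper, and the returned `verify`/`cert` pair is modeled on the session options of the `requests` library, which is an assumption about how the values are consumed.

```python
import os


def tls_options(env=None):
    """Derive verification and client-certificate settings from the environment."""
    env = os.environ if env is None else env

    if env.get("AIS_SKIP_VERIFY", "").strip().lower() in ("true", "1", "yes"):
        verify = False  # insecure, for testing only
    else:
        # A CA path if one is configured, otherwise the default trust store.
        verify = env.get("AIS_CLIENT_CA") or True

    crt, key = env.get("AIS_CRT"), env.get("AIS_CRT_KEY")
    # mTLS is only enabled when both the certificate and the key are set.
    cert = (crt, key) if crt and key else None

    return {"verify": verify, "cert": cert}


# Example with an explicit mapping instead of the real environment.
options = tls_options({
    "AIS_CLIENT_CA": "/path/to/client_ca.crt",
    "AIS_CRT": "client.crt",
    "AIS_CRT_KEY": "client.key",
})
```

With no variables set, the sketch falls back to verification against the default trust store and no client certificate, matching the trusted-CA case described above.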
AIStore also supports ETLs, short for Extract-Transform-Load. ETLs with AIS are beneficial given that the transformations occur locally, which largely contributes to the linear scalability of AIS.
Note: AIS-ETL requires Kubernetes. For more information on deploying AIStore with local Kubernetes, refer here.
To learn more about working with AIS ETL, check out examples.
| Module | Summary |
|---|---|
| client.py | Contains Client class, which has methods for making HTTP requests to an AIStore server. Includes factory constructors for Bucket, Cluster, and Job classes. |
| cluster.py | Contains Cluster class that represents a cluster bound to a client and contains all cluster-related operations, including checking the cluster's health and retrieving vital cluster information. |
| bucket.py | Contains Bucket class that represents a bucket in an AIS cluster and contains all bucket-related operations, including (but not limited to) creating, deleting, evicting, renaming, copying. |
| object.py | Contains class Object that represents an object belonging to a bucket in an AIS cluster, and contains all object-related operations, including (but not limited to) retrieving, adding and deleting objects. |
| object_group.py | Contains class ObjectGroup, representing a collection of objects belonging to a bucket in an AIS cluster. Includes all multi-object operations such as deleting, evicting, prefetching, copying, and transforming objects. |
| job.py | Contains class Job and all job-related operations. |
| dsort/core.py | Contains class Dsort and all dsort-related operations. |
| etl.py | Contains class Etl and all ETL-related operations. |
For more information on SDK usage, refer to the SDK reference documentation or see the examples here.