databricks_utils.aws

Description

Utility classes for interfacing with AWS from Databricks notebooks.

class databricks_utils.aws.S3Bucket(bucketname, aws_access_key, aws_secret_key, dbutils=None)

Bases: object

Class that wraps an S3 bucket and mounts it on the Databricks filesystem (DBFS).

Parameters:
  • bucketname – name of the S3 bucket
  • aws_access_key – AWS access key
  • aws_secret_key – AWS secret key
  • dbutils – Databricks dbutils (not needed if S3Bucket.attach_dbutils has been called)
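
For example, a minimal construction sketch (the bucket name and credentials are placeholders; dbutils is provided automatically in a Databricks notebook):

    from databricks_utils.aws import S3Bucket

    # Placeholders: substitute your own bucket name and credentials.
    bucket = S3Bucket("my-bucket", "AKIA...", "...", dbutils=dbutils)
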
allow_spark(spark_context)

Update the Spark context's Hadoop configuration with the AWS access credentials so that Databricks Spark can access the S3 bucket.

Parameters: spark_context – Databricks Spark context
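
An illustrative call, assuming bucket was constructed as above and sc is the SparkContext provided by the notebook runtime:

    # Grant Spark workers access to the bucket via the Hadoop config.
    bucket.allow_spark(sc)
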
classmethod attach_dbutils(dbutils)

Attach Databricks dbutils to S3Bucket. You MUST attach dbutils (here or via the constructor) before S3Bucket can be used.

Parameters: dbutils – Databricks dbutils (https://docs.databricks.com/user-guide/dev-tools/dbutils.html#dbutils)
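
A sketch of the intended call pattern (credentials are placeholders):

    from databricks_utils.aws import S3Bucket

    # dbutils is injected into every Databricks notebook.
    S3Bucket.attach_dbutils(dbutils)

    # Subsequent instances no longer need dbutils passed explicitly.
    bucket = S3Bucket("my-bucket", "AKIA...", "...")
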
local(path)

Return the absolute path to the corresponding resource in DBFS.

Parameters: path – relative path to a resource in the S3 bucket.
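
For example (the resource path is hypothetical):

    # Absolute DBFS path of a file inside the mounted bucket.
    path = bucket.local("data/example.csv")
    print(path)
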
ls(path='', display=None)

List the files and folders in the S3 bucket mounted in DBFS.

Parameters:
  • path – path relative to the S3 bucket
  • display – a callable to render the HTML output, e.g. displayHTML
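
Illustrative calls, assuming the bucket is already mounted (the folder name is hypothetical; displayHTML is provided by the notebook runtime):

    # Plain listing of the bucket root.
    bucket.ls()

    # Render a subfolder listing as HTML in the notebook.
    bucket.ls("data/", display=displayHTML)
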
mount(mount_pt, dbutils=None)

Mount the S3 bucket in DBFS. The environment variables AWS_ACCESS_KEY and AWS_SECRET_KEY must be set.

Parameters:
  • mount_pt – where to mount the S3 bucket in DBFS
  • dbutils – Databricks dbutils module (not needed if S3Bucket.attach_dbutils has been called)
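
A sketch, with placeholder credentials and a hypothetical mount point:

    import os

    # The docstring requires these environment variables to be set;
    # the values here are placeholders.
    os.environ["AWS_ACCESS_KEY"] = "AKIA..."
    os.environ["AWS_SECRET_KEY"] = "..."

    bucket.mount("/mnt/my-bucket")
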
s3(path)

Return the path to the corresponding resource in the S3 bucket, in a form interpretable by Databricks Spark workers.

Parameters: path – relative path to a resource in the S3 bucket.
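
For example (the resource is hypothetical, and the exact URI scheme returned depends on the implementation):

    # Worker-readable path, suitable for handing to Spark.
    uri = bucket.s3("data/example.csv")
    df = spark.read.csv(uri, header=True)
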
umount(dbutils)

Unmount the S3 bucket from DBFS.

Parameters: dbutils – Databricks dbutils
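
For example:

    # Unmount when finished with the bucket.
    bucket.umount(dbutils)
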

Classes

S3Bucket(bucketname, aws_access_key, …[, …]) – Class that wraps an S3 bucket and mounts it on DBFS.