Cloud read

One of the two central methods of ecodev-cloud is load_cloud_data, which loads data from the cloud provider specified in the cloud_provider environment variable.

Its interface reads:

def load_cloud_data(file_path: Path,
                    cloud: Cloud = CLOUD,
                    location: str | None = None
                    ) -> Any:

where:

  • file_path: a pathlib Path specifying where the data is located in the S3 bucket / blob container. (As an aside, we advise replacing os-based path handling with pathlib wherever you can: it is simpler and friendlier for working with files and folders.) Remember to read the forge key page to learn how to properly... well, forge a key 😊.
  • cloud (optional): the cloud provider used to connect to remote storage. Read the installation guide to learn how to connect to your S3-like/blob cloud provider, and also consult cloud details. By default, the environment variable cloud_provider is used.
  • location (optional): the S3 bucket/blob container to connect to. By default, the s3_bucket_name environment variable is used if cloud=Cloud.AWS, and the container environment variable is used if cloud=Cloud.Azure.
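
To make the default handling of location concrete, here is a minimal sketch of how such a fallback could work. It is not the library's actual code: the Cloud enum below is a hypothetical stand-in, resolve_location is an illustrative helper name, and the sketch assumes that container is the name of an environment variable, just like s3_bucket_name.

```python
from __future__ import annotations

import os
from enum import Enum
from pathlib import Path


class Cloud(str, Enum):
    """Hypothetical stand-in for the library's Cloud enum."""
    AWS = "aws"
    Azure = "azure"


# Assumed mapping: provider -> environment variable holding its default location.
DEFAULT_LOCATION_ENV = {Cloud.AWS: "s3_bucket_name", Cloud.Azure: "container"}


def resolve_location(cloud: Cloud, location: str | None = None) -> str:
    """Return the explicit location if given, else the provider default from the environment."""
    if location is not None:
        return location
    return os.environ[DEFAULT_LOCATION_ENV[cloud]]


# Build the key with pathlib instead of os.path string handling.
# The folder and file names here are made up for illustration.
key = Path("climate") / "2024" / "temperatures.csv"
```

With this shape, passing location explicitly always wins, and omitting it falls back to the variable that matches the chosen provider. A missing variable raises a KeyError, which surfaces configuration problems early.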

As of 2024/05, this method can read the following file types (it relies on the file extension of file_path to select the appropriate loader):

  • csv
  • xlsx
  • npy
  • npy.npz (compressed numpy)
  • json
  • netcdf (a very useful format when dealing with climate data)
  • tex
  • txt
  • tif
  • gpkg
  • shp (subtlety: the shapefile must be stored zipped so that it can be retrieved easily. See the load_zipped_shp method in the source code to learn more)
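
The extension-based selection described above can be sketched as a small dispatch table. This is an illustration using only the standard library, not ecodev-cloud's implementation: LOADERS, load_from_bytes, and the three private loaders are hypothetical names, and only json, csv, and txt are covered.

```python
from __future__ import annotations

import csv
import io
import json
from pathlib import Path
from typing import Any, Callable


def _load_json(raw: bytes) -> Any:
    return json.loads(raw.decode("utf-8"))


def _load_csv(raw: bytes) -> list[dict[str, str]]:
    # Each row becomes a dict keyed by the header line.
    return list(csv.DictReader(io.StringIO(raw.decode("utf-8"))))


def _load_txt(raw: bytes) -> str:
    return raw.decode("utf-8")


# Hypothetical dispatch table: file extension -> loader for the downloaded bytes.
LOADERS: dict[str, Callable[[bytes], Any]] = {
    ".json": _load_json,
    ".csv": _load_csv,
    ".txt": _load_txt,
}


def load_from_bytes(file_path: Path, raw: bytes) -> Any:
    """Pick a loader from the extension of file_path and apply it to raw."""
    suffix = file_path.suffix.lower()
    try:
        loader = LOADERS[suffix]
    except KeyError:
        raise ValueError(f"Unsupported file type: {suffix!r}") from None
    return loader(raw)
```

In this design, supporting a new format means adding one entry to the table, and an unknown extension fails loudly rather than being read with the wrong loader.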