
Forge key

Prefix stripping

To access a resource stored on S3-like or blob-like storage, you need to specify its location.

Using S3 object storage terminology, we call key the full path to the desired resource.

Coming from the Docker universe, the ecodev team had the habit of mounting a data volume in the root /app folder.

In certain corner cases, even if you are familiar with S3/blob storage, you might still want to access data as fast as possible (hence using something like an Elastic File System).

So, to stay consistent between disk accesses and S3/blob accesses, we took the habit of always prefixing our pathlib Path with app. Since we do not want this app prefix to appear on the S3/blob side, we remove it with the forge_key method:

def forge_key(file_path: Path) -> str:
    """
    Form a valid cloud key out of the passed file_path
     (basically stripping the leading ecodev_cloud/ parent)
    """
    return str(file_path.relative_to(Path(*file_path.parts[:2])))
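As a quick illustration, here is a minimal self-contained sketch of the stripping behaviour. The helper is re-declared locally so the snippet runs without ecodev-cloud installed, and the file path is a made-up example, not one taken from a real project.

```python
from pathlib import Path


def forge_key(file_path: Path) -> str:
    """
    Local copy of the helper, for illustration only: drop the first two parts
    of the path (for an absolute path, the root '/' and the top-level folder).
    """
    return str(file_path.relative_to(Path(*file_path.parts[:2])))


# Hypothetical file living on the volume mounted under /app
disk_path = Path('/app/shared_data/climate_model_data/model.nc')

# The very same Path yields the key to use on the S3/blob side
key = forge_key(disk_path)
print(key)  # on a POSIX system: shared_data/climate_model_data/model.nc
```

The same Path object can therefore be passed around in the code base, whether the file ends up being read from disk or from remote storage.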

It proved very convenient for us to do so, but we understand that someone starting their S3/blob journey from scratch would rather not have this funky prefix stripping 😅.

If this is your case, do not hesitate to create an issue, and we will think about a way to handle other scenarios (most probably with a new env variable).

In the meantime, keep in mind that the highest folder in your path will be stripped when interacting with the remote storage.


To try to vindicate our funky prefix stripping choice, here is a real-life example of how we create a Folders pydantic class that aggregates all storage locations (be they Docker volumes, EFS-like, S3/blob storage...) in one place. This is very convenient, and the class can be instantiated differently for tests!

from pydantic import BaseModel
from pathlib import Path

# Root directory of the Docker container
ROOT_DIRECTORY = Path('/app')
ROOT_SHARED_DATA_DIR = Path('/app/shared_data')
# Directory where all climate_model_data should be put
CLIMATE_MODEL_DATA = 'climate_model_data'
# Directory where all indicators are stored
# Directory where all client data are to be found
# Directory where all geographical data should be computed/stored
# Directory where all logs are stored for subsequent analysis

class Folders(BaseModel):
    """
    Simple class storing all important folders. In production, these are the folders mounted on the
     container. In the end-to-end test test_generate_client_output, different values are used so
     as not to erase all production information :)
    """
    data: Path
    geo: Path
    indicator: Path
    client: Path
    logs: Path

# All important folders in production mode

Here we see the benefit of our forge_key prefix stripping and of ecodev-cloud: data on disk and data in remote storage are handled in exactly the same way.
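To make this concrete, here is a hedged sketch of how one set of folders can serve both disk and remote accesses, and how a different root can be used for tests. Several things are assumptions made for illustration: a standard-library dataclass stands in for the pydantic class so the snippet has no dependencies, make_folders is a hypothetical helper, the test root and the subfolder names other than climate_model_data are invented, and forge_key is re-declared locally.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class Folders:
    """Stdlib stand-in for the pydantic Folders class (same five fields)."""
    data: Path
    geo: Path
    indicator: Path
    client: Path
    logs: Path


def make_folders(root: Path) -> Folders:
    """Hypothetical helper: place every folder under one root directory."""
    return Folders(
        data=root / 'climate_model_data',
        geo=root / 'geo',              # subfolder names other than
        indicator=root / 'indicator',  # climate_model_data are made up
        client=root / 'client',
        logs=root / 'logs',
    )


def forge_key(file_path: Path) -> str:
    """Local copy of the prefix stripping: drop the root and the top-level folder."""
    return str(file_path.relative_to(Path(*file_path.parts[:2])))


production = make_folders(Path('/app/shared_data'))
testing = make_folders(Path('/app/test_data'))  # hypothetical test root

prod_file = production.data / 'model.nc'  # path used for disk access
prod_key = forge_key(prod_file)           # key used for S3/blob access
test_key = forge_key(testing.data / 'model.nc')
print(prod_key)
print(test_key)
```

Because the test root differs from the production one, the derived keys differ as well, so a test run does not overwrite production objects.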