DB Upsertion¶
Helper CRUD methods to interact with db
Version¶
In order to keep previous versions of a particular (table, column, row) triplet, we use a Version
table.
class Version(SQLModel, table=True): # type: ignore
"""
Table handling versioning.
Attributes are:
- table: the table to version
- column: the column of table to version
- row_id: the the row id of the column of table to version
- col_type: the column type
- value: the value to store (previous row/column/table version)
"""
__tablename__ = 'version'
id: Optional[int] = Field(default=None, primary_key=True)
created_at: datetime = Field(default_factory=datetime.utcnow)
table: str = Field(index=True)
column: str = Field(index=True)
row_id: int = Field(index=True)
col_type: ColType
value: str | None = Field(index=True)
__table_args__ = (
Index(
'version_filter_id',
'table',
'row_id',
'column'
),
)
To be noted ,the col_type
attribute encodes what column type we are able to handle (6 as of 2025/03/18)
@unique
class ColType(str, Enum):
"""
Enum listing all col types allowed to be stored in version class
"""
STR = 'str'
BOOL = 'bool'
INT = 'int'
FLOAT = 'float'
DATE = 'datetime'
ENUM = 'enum'
COL_TYPES = str | int | bool | float | datetime | Enum | None
To retrieve previous Version
, use the following two methods
get_versions
¶
Retrieve all previous versions of a given (table: str, column: str, row_id: int)
triple
get_row_versions
¶
Retrieve all previous versions all of versioned columns of a given (table: str, row_id: int)
couple
upsert_data
and upsert_df_data
¶
Core methods of these CRUD helpers. upsert_df_data
works on a pandas dataframe, and upsert_data
on a list
of dict | SQLModel
. upsert_df_data
converts its input and calls upsert_data
under the hood.
Both do the upsertion of the passed data and the (optional in the case of upsert_data
) passed db_schema: SQLModelMetaclass
.
NB: db_schema
is optional for upsert_data
, as it can be retrieved by
reflection in the case of a list
of SQLModel
objects.
Upsertion means the following:
- all new
dict
data are inserted into the db. New data are identified by theirsfield
and theupsert_selector
method (more on that below) - all
dict
already present in db are upserted, only editingfield
(more on that below), and only storing inVersion
the previous version if the value is indeed modified.
Example:
class UpFoo(SQLModel, table=True): # type: ignore
"""
Test class to test db upsertion
"""
__tablename__ = 'up_foo'
id: Optional[int] = Field(default=None, primary_key=True)
bar1: str = sfield()
bar2: bool = sfield()
bar3: str = field()
bar4: bool = field()
bar5: float = field()
bar6: int = field()
bar7: datetime = field()
bar8: Permission = field()
bar9: Optional[str] = field(default=None)
foo = UpFoo(bar1='bar', bar2=True, bar3='bar', bar4=False, bar5=42.42, bar6=42,
bar7=datetime(2025, 3, 17), bar8=Permission.ADMIN)
foo2 = UpFoo(bar1='bar', bar2=True, bar3='babar', bar4=False, bar5=42.42, bar6=42,
bar7=datetime(2025, 3, 17), bar8=Permission.ADMIN)
upsert_data
on first [foo]
and then [foo2]
will only create one row in up_foo
,
and one in version
(storing the old bar
value of column bar4
). You can go on the test suite to learn more.
upsert_deletor
¶
Delete one row of the db corresponding to the passed values: SQLModel
.
db_schema
is retrieved by reflection.
field
and sfield
¶
simple wrapper methods around SQLModel Field
.
Used to specify whether an attribute is required for uniquely selecting a row (this is sfield
) or need to be versioned (this is field
).
The interface is in both cases identical to Field
, with this additional behaviour. A column no used to select and not to be versioned can use the
regular Field
(as most presumably every id
primary key column).
upsert_selector
¶
Select one row of the db corresponding to the passed values: SQLModel
.
db_schema
is retrieved by reflection.
db_to_value
¶
Version
store all types as str (or None
) type. To convert back into the original column col_type
, call this method.
def db_to_value(db_value: str | None, col_type: type | EnumType) -> COL_TYPES:
"""
Convert back a str version value stored to a real value (types hancled listed in ColType)
NB: assumption that if the type is not known, this is an enum type.
"""
if db_value is None:
return None
if col_type in [int, str, float]:
return col_type(db_value)
if col_type == bool:
return True if db_value == 'True' else False
if col_type == datetime:
return datetime.strptime(db_value, '%Y-%m-%d %H:%M:%S.%f')
return col_type[db_value] # type: ignore[index]