Pandas helpers¶
Simple helper methods related to pandas
pd_equals
¶
If you ever pull your hair out with None
values converted to np.nan
when stored on disk by pandas to_csv
, causing issues when comparing two dataframes,
then pd_equals
is for you. It is meant to compare a stored csv from a one in memory, and does so by writing and reading the latter to have the same funky conversion for both.
Of course the real solution would be to use the converters
option from read_csv
(look the official documentation. ), but it can be quite tedious and frankly overkill for tests.
# test pd_equals method (subtlety of None np.Nan that imposes to write on/read from disk)
df = pd.DataFrame.from_dict({'a': [1], 'b': None})
self.assertTrue(pd_equals(df, TEST_FILE) is None)
jsonify_series
¶
As for pd_equals
, converting a pandas serie to something that is json acceptable can be useful.
Simple example
# test jsonify_series (subtlety of None/np.Nan)
df = pd.DataFrame.from_dict({'a': [1, 2], 'b': [np.nan, 2]})
self.assertDictEqual(jsonify_series(df['b']), {0: None, 1: 2.0})
get_excelfile
¶
Takes an input file, inserted through a front interface such as FastApi
or a dcc.Upload
component in Dash and returns a pd.ExcelFile
object
Simple example
# test get_excelfile
xl = get_excelfile(load_json_file(TEST_FILE.parent / 'str_parsing.json')['data'])
safe_drop_columns
¶
Checks if a list
of columns are in a pd.DataFrame
and drops only the subset which has been
found, preventing any errors
Simple example
# test safe_drop_columns
df = pd.DataFrame({'a': [1, 2, 3], 'b': [None, None, None]}
pd.testing.assert_frame_equal(safe_drop_columns(df, ['b']), df.drop(columns='b')
pd.testing.assert_frame_equal(safe_drop_columns(df, ['c']), df)
is_null
¶
If you ever struggled with null values, that can take multiple forms None
, np.nan
, NaN
,...
this function is made for you as it will try different representation of a null value and return
a bool
Simple example
# test is_null
self.assertTrue(is_null(None))
self.assertTrue(is_null(np.nan))
self.assertFalse(is_null(15))
self.assertFalse(is_null('test'))
get_value
¶
If you want to get a value of a pd.Series
based on a column name when iterating over a
pd.DataFrame
, but you don't know if the column is actually in the pd.DataFrame
beforehand,
you can use get_value
to get the value if the column exist (and apply a transformation function or
transform it into an enum) else None
. An equivalent to dict.get()
for pandas
.
Simple example
# test get_value
df = pd.DataFrame({'a': [1, 2, 3], 'b': [None, None, None]})
self.assertEqual(get_value('a', intify, df.iloc[0]), 1)
self.assertEqual(get_value('c', intify, df.iloc[0]), None)