Pandas helpers

Simple helper methods related to pandas

pd_equals

If you have ever pulled your hair out because None values come back as np.nan after a round trip through pandas to_csv and read_csv, breaking the comparison between two dataframes, then pd_equals is for you. It is meant to compare a CSV stored on disk with a dataframe in memory, and does so by writing the latter to disk and reading it back, so that both sides go through the same funky conversion.

Of course, the real solution would be to use the converters option of read_csv (see the official documentation), but it can be quite tedious and is frankly overkill for tests.

# test pd_equals method (None becomes np.nan on disk, hence the write to/read from disk)
df = pd.DataFrame.from_dict({'a': [1], 'b': None})
self.assertTrue(pd_equals(df, TEST_FILE) is None)
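The round trip described above can be sketched roughly as follows. This is only an illustration of the idea, not the actual pd_equals implementation: the function name assert_equal_after_round_trip is hypothetical, and the sketch assumes the stored file was written with index=False and that a matching comparison returns None (as the test above suggests).

```python
import os
import tempfile

import pandas as pd
from pandas.testing import assert_frame_equal


def assert_equal_after_round_trip(df, csv_path):
    """Hypothetical stand-in for pd_equals, not the library's actual code.

    Writes `df` to a temporary CSV, reads it back, and compares the result
    with the CSV stored at `csv_path`. Returns None when the frames match
    and raises AssertionError otherwise.
    """
    with tempfile.TemporaryDirectory() as tmp_dir:
        tmp_path = os.path.join(tmp_dir, "in_memory.csv")
        # Push the in-memory frame through the same to_csv/read_csv
        # conversion the stored file already went through.
        df.to_csv(tmp_path, index=False)
        round_tripped = pd.read_csv(tmp_path)
    stored = pd.read_csv(csv_path)
    assert_frame_equal(round_tripped, stored)


# Usage: store a frame, then compare the in-memory original against the file.
df = pd.DataFrame({"a": [1], "b": [None]})
with tempfile.TemporaryDirectory() as work_dir:
    stored_path = os.path.join(work_dir, "stored.csv")
    df.to_csv(stored_path, index=False)
    reloaded = pd.read_csv(stored_path)
    # A direct comparison fails: column "b" holds None (object dtype) in
    # memory but comes back from disk as NaN (float64 dtype).
    direct_match = df.equals(reloaded)
    # The round-trip comparison puts both sides through the same conversion.
    round_trip_result = assert_equal_after_round_trip(df, stored_path)
```

Here direct_match shows why a naive comparison is not enough, while the round-trip comparison passes because both frames have been through read_csv.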

jsonify_series

In the same spirit as pd_equals, jsonify_series converts a pandas Series into something JSON accepts, turning np.nan into None along the way.

Simple example

# test jsonify_series (subtlety of None/np.nan)
df = pd.DataFrame.from_dict({'a': [1, 2], 'b': [np.nan, 2]})
self.assertDictEqual(jsonify_series(df['b']), {0: None, 1: 2.0})
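For readers curious about what such a conversion involves, here is a minimal sketch. It is not the actual jsonify_series code: the name series_to_json_dict is hypothetical, and the sketch assumes the goal is simply a dict keyed by the Series index with missing values replaced by None. The motivation is that Python's json.dumps serializes float NaN as the bare token NaN by default, which is not valid JSON.

```python
import json

import numpy as np
import pandas as pd


def series_to_json_dict(series):
    """Hypothetical sketch, not the library's jsonify_series.

    Returns a dict keyed by the Series index, with missing values
    (None, np.nan) replaced by None so that json.dumps emits null.
    """
    result = {}
    for index, value in series.items():
        if pd.isna(value):
            result[index] = None
        elif isinstance(value, np.generic):
            # Unwrap NumPy scalars into plain Python numbers.
            result[index] = value.item()
        else:
            result[index] = value
    return result


df = pd.DataFrame({"a": [1, 2], "b": [np.nan, 2]})
converted = series_to_json_dict(df["b"])
# json.dumps turns the integer keys into strings and None into null.
encoded = json.dumps(converted)
```

Note that JSON object keys are always strings, so the integer index labels end up as "0" and "1" once encoded.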