Assume the following scenario: you have run a parameter study and collected the results in a pandas dataframe. You want to save this dataset along with some additional information, e.g. the commit ID of the code that was used to obtain the results. In this case the commit ID is part of the metadata you want to save along with your data. A very simple solution for this task is to use the CSV file format in combination with comments.
When you read a dataframe from a CSV file, you can specify the optional argument comment=char, where char is a single character such as '#'. If a line starts with this character, pandas treats it as a comment line and ignores it. Note that the character also ends parsing when it appears in the middle of a line, so it should not occur in the data itself. This gives you the option to write additional information in the same file as your data, while the file format remains plain text rather than something more complex. The steps and code snippets for writing and reading are given below.
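To illustrate the effect of the comment argument, here is a minimal sketch that parses an in-memory string instead of a file. The file content, the commit ID and the column names x and y are made up for this example.

```python
import io

import pandas as pd

# Hypothetical file content: one metadata line followed by ordinary CSV data.
csv_text = "# commit : abc123\nx,y\n1,2\n3,4\n"

# The line starting with '#' is skipped, so the header is taken from "x,y".
df = pd.read_csv(io.StringIO(csv_text), comment="#")
print(df)
```

The resulting dataframe contains only the columns x and y with two rows; the commit line never reaches the parser.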
I opted for the metadata appearing before the actual data since the actual data potentially consists of a lot more lines. In this case the first step is to write the metadata to a file:
    def prepend_metadata(self, file_name):
        """
        Prepend metadata of the dataframe to the output file.
        Uses '#' as comment indicating character.
        Currently, only the number of columns containing the
        multiindex is written as metadata.
        """
        n_index_columns = str(len(self.dataframe.index.names))
        metadata = "# n_index_columns : " + n_index_columns + '\n'
        # Mode 'w' creates the file or overwrites an existing one.
        with open(file_name, 'w') as dataframe_file:
            dataframe_file.write(metadata)
This is actually a member function of a class which has a dataframe. The result of this function is a file called file_name which contains a single line, for example "# n_index_columns : 4" followed by a newline if the index has four levels.
The actual data then needs to be appended to the file file_name. This is achieved by passing the mode option to the to_csv(...) member function, e.g. to_csv(file_name, mode='a'). mode='a' instructs the CSV writer to operate in append mode, so your metadata is not overwritten.
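Both writing steps can be combined as in the following sketch. It uses a standalone function instead of a class member; the function name write_with_metadata, the example data and the temporary file path are assumptions made for illustration.

```python
import os
import tempfile

import pandas as pd


def write_with_metadata(dataframe, file_name):
    """Write one metadata line, then append the dataframe as CSV."""
    n_index_columns = len(dataframe.index.names)
    # Step 1: create the file and write the commented metadata line.
    with open(file_name, "w") as f:
        f.write(f"# n_index_columns : {n_index_columns}\n")
    # Step 2: append the data so the metadata line is kept.
    dataframe.to_csv(file_name, mode="a")


# Hypothetical parameter study with a two-level multiindex.
index = pd.MultiIndex.from_product(
    [[0.1, 0.2], ["coarse", "fine"]], names=["viscosity", "mesh"]
)
df = pd.DataFrame({"error": [1.0, 0.5, 2.0, 1.5]}, index=index)

file_name = os.path.join(tempfile.mkdtemp(), "study.csv")
write_with_metadata(df, file_name)

with open(file_name) as f:
    lines = f.read().splitlines()
print(lines[0])
print(lines[1])
```

The first line of the file is the metadata comment, the second is the CSV header written by to_csv, and the data rows follow.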
Reading the dataframe back is done by giving the comment option to read_csv:

    my_dataframe = pandas.read_csv(file_name, comment='#')
Of course, this reads only the dataframe itself. If you want to read the metadata, you need your own functions for reading them.
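One possible way to read the metadata is sketched below. It assumes that all metadata lines sit at the top of the file and follow the "# key : value" layout used above; the function name read_metadata and the example file are hypothetical.

```python
import os
import tempfile


def read_metadata(file_name, comment="#"):
    """Collect 'key : value' pairs from the leading comment lines."""
    metadata = {}
    with open(file_name) as f:
        for line in f:
            # Stop at the first line that is not a comment.
            if not line.startswith(comment):
                break
            key, _, value = line[len(comment):].partition(":")
            metadata[key.strip()] = value.strip()
    return metadata


# Small example file written only for this demonstration.
file_name = os.path.join(tempfile.mkdtemp(), "example.csv")
with open(file_name, "w") as f:
    f.write("# n_index_columns : 4\na,b,c,d,value\n1,2,3,4,0.5\n")

metadata = read_metadata(file_name)
print(metadata)
```

All values are returned as strings, so a numeric entry such as n_index_columns still has to be converted with int() before use.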
If you write dataframes with multiindices, this is a convenient way to store the number of columns which are part of the multiindex.
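The stored number can then be used to restore the multiindex when reading, as in this sketch. The file content and the column names are invented for the example, and parsing the first line by splitting at ':' assumes the metadata layout shown earlier.

```python
import io

import pandas as pd

# Hypothetical file content as produced by the writing steps above.
csv_text = (
    "# n_index_columns : 2\n"
    "viscosity,mesh,error\n"
    "0.1,coarse,1.0\n"
    "0.1,fine,0.5\n"
    "0.2,coarse,2.0\n"
    "0.2,fine,1.5\n"
)

# Extract the number of index columns from the metadata line.
first_line = csv_text.splitlines()[0]
n_index_columns = int(first_line.split(":")[1])

# Use the first n_index_columns columns as the (multi)index.
df = pd.read_csv(
    io.StringIO(csv_text),
    comment="#",
    index_col=list(range(n_index_columns)),
)
print(df.index.names)
```

Without index_col, the index levels would come back as ordinary data columns and the multiindex would have to be rebuilt by hand.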