Data API
libpyvinyl provides several abstract classes to create data interfaces.
DataCollection
DataCollection is a thin interface layer between the Calculator and the DataClass. It aggregates the input data into a single variable, and likewise the output data.
A DataCollection can be initialized with several DataClass instances like this:
collection = DataCollection(data_1, data_2, ..., data_n)
or with the add_data() method:
collection = DataCollection()
collection.add_data(data_1, data_2, ..., data_n)
A Data object can be accessed by its key:
data_1 = collection["data_1_key"]
A list of the data dictionaries of the Data objects in a DataCollection can be obtained with:
collection.get_data()
You can also create a list of the Data objects in the DataCollection:
collection.to_list()
To get an overview of the DataCollection, just print it:
print(collection)
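The key-based aggregation described above can be pictured with a toy stand-in. This is a minimal sketch, not libpyvinyl's implementation: ToyData and ToyCollection are hypothetical names, only the method names mirror the ones listed above, and storing the objects in a dictionary keyed by their key attribute is our assumption.

```python
from dataclasses import dataclass, field


@dataclass
class ToyData:
    """Hypothetical minimal Data object: a key plus a data dictionary."""

    key: str
    data: dict = field(default_factory=dict)

    def get_data(self) -> dict:
        return self.data


class ToyCollection:
    """Toy sketch of a DataCollection-like container keyed by Data.key."""

    def __init__(self, *data_objects):
        self._objects = {}
        self.add_data(*data_objects)

    def add_data(self, *data_objects):
        for obj in data_objects:
            # In this toy, a later object with the same key replaces the earlier one.
            self._objects[obj.key] = obj

    def __getitem__(self, key):
        return self._objects[key]

    def __len__(self):
        return len(self._objects)

    def get_data(self):
        # One data dictionary per stored object, in insertion order.
        return [obj.get_data() for obj in self._objects.values()]

    def to_list(self):
        return list(self._objects.values())

    def __str__(self):
        lines = [f"ToyCollection with {len(self)} object(s):"]
        lines += [f"  {key}: {type(obj).__name__}" for key, obj in self._objects.items()]
        return "\n".join(lines)


a = ToyData("input_number", {"number": 1.0})
b = ToyData("output_number", {"number": 2.0})
collection = ToyCollection(a)
collection.add_data(b)
print(collection["output_number"].get_data())  # {'number': 2.0}
print(collection)
```

A dictionary keyed by the Data key gives direct lookup by name; the real DataCollection may differ in details such as how duplicate keys are treated.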
BaseData
A specialized Data class can be created for a kind of data with similar attributes based on the abstract BaseData class. The abstract class provides useful helper functions and a template for the Data interface.
A file-mapping DataClass does not read the file until the end user calls get_data(), which calls the read() method of its file_format_class and returns the data as a Python dictionary. Whether a DataClass maps a Python dictionary or a file (and, for a file, which file_format_class is used) is defined by one of these functions:
To create/set a DataClass as a Python dictionary mapping:
from_dict(): Create a class instance mapping from a Python dictionary.
set_dict(): Set the class as a Python dictionary mapping.
To create/set a DataClass as a file mapping:
set_file(): Set the class as a file mapping.
from_file(): Create a class instance mapping from a file.
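To illustrate the lazy file mapping, here is a toy Data class that only records the file name and format class in set_file()/from_file() and defers reading to get_data(). It is a sketch under our own assumptions, not libpyvinyl code: ToyData and ToyJSONFormat are hypothetical, JSON is used only for brevity, and this toy re-reads the file on every get_data() call, which the real BaseData may or may not do.

```python
import json
import os
import tempfile


class ToyJSONFormat:
    """Hypothetical format class; only read() is needed for this sketch."""

    reads = 0  # how many times read() has been called

    @classmethod
    def read(cls, filename: str) -> dict:
        cls.reads += 1
        with open(filename) as f:
            return json.load(f)


class ToyData:
    """Toy Data class mapping either a dict or a file (not libpyvinyl code)."""

    def __init__(self, key: str):
        self.key = key
        self._dict = None
        self._filename = None
        self._format_class = None

    def set_dict(self, data_dict: dict):
        self._dict, self._filename, self._format_class = data_dict, None, None

    def set_file(self, filename: str, format_class):
        # Only remember where the data lives; nothing is read yet.
        self._dict, self._filename, self._format_class = None, filename, format_class

    @classmethod
    def from_dict(cls, data_dict: dict, key: str):
        obj = cls(key)
        obj.set_dict(data_dict)
        return obj

    @classmethod
    def from_file(cls, filename: str, format_class, key: str):
        obj = cls(key)
        obj.set_file(filename, format_class)
        return obj

    def get_data(self) -> dict:
        if self._filename is not None:
            # The file is read only now, through the format class.
            return self._format_class.read(self._filename)
        return self._dict


path = os.path.join(tempfile.mkdtemp(), "number.json")
with open(path, "w") as f:
    json.dump({"number": 42.0}, f)

lazy = ToyData.from_file(path, ToyJSONFormat, "my_number")
reads_before = ToyJSONFormat.reads  # 0: creating the mapping did not open the file
data = lazy.get_data()              # {'number': 42.0}
```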
To write the Data class to a file in a certain file format:
data_file = data.write(filename="test_file", format_class=FormatClass)
The data is then written to test_file using the FormatClass you specify.
To list the formats supported by the DataClass:
list_formats(): This method prints the return value of supported_formats(), which needs to be defined for the derived class.
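The division of labour between the Data class's write() and the format class can be sketched as follows: the Data object delegates to the format class, which writes the file and returns a new object mapping it; list_formats() just prints supported_formats(). ToyNumberData and ToyTXTFormat are hypothetical stand-ins, not libpyvinyl classes, and the layout of the register dictionary is an assumption.

```python
import os
import tempfile


class ToyTXTFormat:
    """Hypothetical format class: one number per text file."""

    @classmethod
    def format_register(cls) -> dict:
        return {"key": "TXT", "description": "TXT format for a single number", "ext": ".txt"}

    @classmethod
    def read(cls, filename: str) -> dict:
        with open(filename) as f:
            return {"number": float(f.read())}

    @classmethod
    def write(cls, obj, filename: str, key=None):
        with open(filename, "w") as f:
            f.write(repr(obj.get_data()["number"]))  # repr() keeps full precision
        if key is None:
            key = obj.key + "_to_TXTFormat"
        # Return a new object that maps the freshly written file.
        return type(obj)(key, filename=filename, format_class=cls)


class ToyNumberData:
    """Toy Data class backed by a dict or a file (not libpyvinyl code)."""

    def __init__(self, key, data_dict=None, filename=None, format_class=None):
        self.key = key
        self._dict = data_dict
        self._filename = filename
        self._format_class = format_class

    def get_data(self) -> dict:
        if self._filename is not None:
            return self._format_class.read(self._filename)
        return self._dict

    @classmethod
    def supported_formats(cls) -> dict:
        register = ToyTXTFormat.format_register()
        return {register["key"]: register}

    @classmethod
    def list_formats(cls):
        # Print whatever supported_formats() returns.
        print(cls.supported_formats())

    def write(self, filename: str, format_class, key=None):
        # The Data class does no I/O itself; the format class does.
        return format_class.write(self, filename, key)


data = ToyNumberData("pi", data_dict={"number": 3.14159})
path = os.path.join(tempfile.mkdtemp(), "test_file.txt")
data_file = data.write(filename=path, format_class=ToyTXTFormat)
ToyNumberData.list_formats()
```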
Develop a derived DataClass
A DataClass derived from the BaseData class needs only two pieces of information:
expected_data: a dictionary whose keys define the data needed.
supported_formats(): returns a dictionary describing the supported formats. The information is extracted from each format class with the _add_ioformat() method.
An example:
from libpyvinyl.BaseData import BaseData
# Assumed: TXTFormat and H5Format are modules of your own package that define
# the format classes TXTFormat.TXTFormat and H5Format.H5Format.
class NumberData(BaseData):
    def __init__(
        self,
        key,
        data_dict=None,
        filename=None,
        file_format_class=None,
        file_format_kwargs=None,
    ):
        expected_data = {}
        ### DataClass developer's job start
        expected_data["number"] = None
        ### DataClass developer's job end
        super().__init__(
            key,
            expected_data,
            data_dict,
            filename,
            file_format_class,
            file_format_kwargs,
        )
    @classmethod
    def supported_formats(cls):
        format_dict = {}
        ### DataClass developer's job start
        cls._add_ioformat(format_dict, TXTFormat.TXTFormat)
        cls._add_ioformat(format_dict, H5Format.H5Format)
        ### DataClass developer's job end
        return format_dict
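How supported_formats() might assemble its dictionary is sketched below. The helper names create_format_register() and add_ioformat(), the register layout (key, description, ext, read_kwargs, write_kwargs, format_class) and the duplicate-key check are all assumptions for illustration; libpyvinyl's own _add_ioformat() may store different fields.

```python
def create_format_register(key, description, file_extension, read_kwargs=None, write_kwargs=None):
    """Hypothetical register: the metadata a format class reports about itself."""
    return {
        "key": key,
        "description": description,
        "ext": file_extension,
        "read_kwargs": read_kwargs if read_kwargs is not None else [""],
        "write_kwargs": write_kwargs if write_kwargs is not None else [""],
    }


class TXTLikeFormat:
    @classmethod
    def format_register(cls):
        return create_format_register("TXT", "TXT format for NumberData", ".txt")


class H5LikeFormat:
    @classmethod
    def format_register(cls):
        return create_format_register("H5", "HDF5 format for NumberData", ".h5")


def add_ioformat(format_dict, format_class):
    """Store a format's register under its key, together with the class itself."""
    register = format_class.format_register()
    key = register["key"]
    if key in format_dict:
        raise KeyError(f"format key {key!r} is already registered")
    format_dict[key] = dict(register, format_class=format_class)


def supported_formats():
    format_dict = {}
    add_ioformat(format_dict, TXTLikeFormat)
    add_ioformat(format_dict, H5LikeFormat)
    return format_dict


formats = supported_formats()
print(sorted(formats))  # ['H5', 'TXT']

try:
    add_ioformat(formats, TXTLikeFormat)
    duplicate_rejected = False
except KeyError:
    duplicate_rejected = True
```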
BaseFormat
The Format class is the interface between a concrete file and the Python object.
For each derived FormatClass, we have to provide the content of:
format_register(): provides the metadata of this format.
read(): how to read the file into a Python dictionary, whose keys must include the keys of the expected_data of the DataClass connected to this format.
write(): how to write the data of the DataClass into a file in this format.
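One plain-Python way to enforce these three required pieces is the standard abc module, sketched below with a hypothetical SketchBaseFormat; whether libpyvinyl's BaseFormat is declared exactly like this is not stated here. Because the methods are classmethods and format classes are often used without being instantiated, a missing override is best detected on the class itself with inspect.isabstract(), since abc only raises TypeError at instantiation.

```python
import inspect
from abc import ABC, abstractmethod


class SketchBaseFormat(ABC):
    """Hypothetical abstract base mirroring the three required pieces."""

    @classmethod
    @abstractmethod
    def format_register(cls) -> dict:
        """Return the metadata (key, description, extension, ...) of the format."""

    @classmethod
    @abstractmethod
    def read(cls, filename: str) -> dict:
        """Read a file into a dict containing the DataClass's expected_data keys."""

    @classmethod
    @abstractmethod
    def write(cls, obj, filename: str, key=None):
        """Write obj.get_data() to filename in this format."""


class CompleteFormat(SketchBaseFormat):
    @classmethod
    def format_register(cls):
        return {"key": "NUM", "description": "one number per file", "ext": ".num"}

    @classmethod
    def read(cls, filename):
        with open(filename) as f:
            return {"number": float(f.read())}

    @classmethod
    def write(cls, obj, filename, key=None):
        with open(filename, "w") as f:
            f.write(repr(obj.get_data()["number"]))


class IncompleteFormat(SketchBaseFormat):
    @classmethod
    def format_register(cls):
        return {"key": "BROKEN"}
    # read() and write() are missing, so the class stays abstract.


print(inspect.isabstract(CompleteFormat))    # False
print(inspect.isabstract(IncompleteFormat))  # True
```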
Optionally, a direct conversion method can be defined to avoid reading all the data into memory. See:
BaseFormat.direct_convert_formats()
BaseFormat.convert()
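The idea behind a direct conversion can be sketched as a dispatch: if the target format appears in the source format's direct_convert_formats(), stream the file across line by line; otherwise read everything into a dictionary and write it out again. convert_file(), direct_convert(), write_dict() and all three format classes are hypothetical names for this sketch; BaseFormat.convert() in libpyvinyl may have a different signature.

```python
import os
import tempfile


class PrefixedFormat:
    """Hypothetical target format: one 'value=<number>' entry per line."""

    @staticmethod
    def direct_convert_formats():
        return []

    @classmethod
    def write_dict(cls, data_dict: dict, filename: str):
        with open(filename, "w") as f:
            for number in data_dict["numbers"]:
                f.write(f"value={number!r}\n")


class LinesFormat:
    """Hypothetical source format: one number per line."""

    @staticmethod
    def direct_convert_formats():
        # Formats this class can convert to without loading the whole file.
        return [PrefixedFormat]

    @classmethod
    def read(cls, filename: str) -> dict:
        # Loads every number into memory at once.
        with open(filename) as f:
            return {"numbers": [float(line) for line in f]}

    @classmethod
    def direct_convert(cls, filename: str, new_filename: str):
        # Streams line by line: only one line is held in memory at a time.
        with open(filename) as src, open(new_filename, "w") as dst:
            for line in src:
                dst.write("value=" + line)


class NoDirectLinesFormat(LinesFormat):
    """Same source format, but pretending no direct conversion is available."""

    @staticmethod
    def direct_convert_formats():
        return []


def convert_file(filename, src_format, dst_format, new_filename):
    """Prefer a direct conversion; otherwise fall back to read + write."""
    if dst_format in src_format.direct_convert_formats():
        src_format.direct_convert(filename, new_filename)
        return "direct"
    dst_format.write_dict(src_format.read(filename), new_filename)
    return "read+write"


tmp = tempfile.mkdtemp()
source_path = os.path.join(tmp, "numbers.txt")
with open(source_path, "w") as f:
    f.write("1.5\n2.0\n")

out_direct = os.path.join(tmp, "direct.txt")
out_fallback = os.path.join(tmp, "fallback.txt")
route_1 = convert_file(source_path, LinesFormat, PrefixedFormat, out_direct)            # "direct"
route_2 = convert_file(source_path, NoDirectLinesFormat, PrefixedFormat, out_fallback)  # "read+write"
```

Both routes produce the same file here; the direct route simply never holds more than one line in memory.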
read() and write()
The read() method needs to return the Python dictionary required by its corresponding DataClass. Example:
import numpy as np
class NumberData(BaseData):
    def __init__(self, key, data_dict=None, filename=None, file_format_class=None, file_format_kwargs=None):
        expected_data = {}
        ### DataClass developer's job start
        expected_data["number"] = None
        ...
class TXTFormat(BaseFormat):
    ...
    @classmethod
    def read(cls, filename: str) -> dict:
        """Read the data from the file with the `filename` to a dictionary. The dictionary will
        be used by its corresponding data class."""
        number = float(np.loadtxt(filename))
        data_dict = {"number": number}
        return data_dict
    ...
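The contract that read() returns at least the expected_data keys can be checked mechanically. The sketch below uses a standard-library stand-in for TXTFormat.read() (no NumPy) and a hypothetical helper check_expected_keys(); neither is part of libpyvinyl. Extra keys are allowed, missing ones raise.

```python
import os
import tempfile

# expected_data as declared by NumberData above: the DataClass needs "number".
expected_data = {"number": None}


def read_number_txt(filename: str) -> dict:
    """Standard-library stand-in for TXTFormat.read(): one number per file."""
    with open(filename) as f:
        return {"number": float(f.read())}


def check_expected_keys(data_dict: dict, expected_data: dict) -> None:
    """Raise KeyError if data_dict lacks any key declared in expected_data."""
    missing = set(expected_data) - set(data_dict)
    if missing:
        raise KeyError(f"read() result is missing keys: {sorted(missing)}")


path = os.path.join(tempfile.mkdtemp(), "number.txt")
with open(path, "w") as f:
    f.write("2.500\n")

data_dict = read_number_txt(path)              # {'number': 2.5}
check_expected_keys(data_dict, expected_data)  # passes silently

try:
    check_expected_keys({"value": 2.5}, expected_data)
    wrong_key_caught = False
except KeyError:
    wrong_key_caught = True
```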
The write() method should call object.get_data(), where object is an instance of the FormatClass's corresponding DataClass, and write the data to the intended file. It is recommended to return a DataClass object mapping to the newly written file.
class TXTFormat(BaseFormat):
    ...
    @classmethod
    def write(cls, object: NumberData, filename: str, key: str = None):
        """Save the data with the `filename`."""
        data_dict = object.get_data()
        arr = np.array([data_dict["number"]])
        np.savetxt(filename, arr, fmt="%.3f")
        if key is None:
            original_key = object.key
            key = original_key + "_to_TXTFormat"
        return object.from_file(filename, cls, key)
    ...
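Note that fmt="%.3f" in this write() keeps only three decimal places, so a write/read round trip is lossy. np.savetxt takes fmt as an old-style %-format string, so the effect can be shown with plain Python string formatting. Using "%.17g" instead (17 significant digits are enough to round-trip any IEEE 754 double) avoids the loss at the cost of longer files; this is a design note on the example, not a libpyvinyl requirement.

```python
original = 3.14159

# What fmt="%.3f" writes for this number, and what reading it back yields.
text_3f = "%.3f" % original
restored_3f = float(text_3f)

# 17 significant digits always round-trip an IEEE 754 double.
text_17g = "%.17g" % original
restored_17g = float(text_17g)

print(text_3f, restored_3f == original)  # 3.142 False
print(text_17g, restored_17g == original)
```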
Example of a FormatClass:
import numpy as np
from libpyvinyl.BaseFormat import BaseFormat
# Assumed: NumberData (the DataClass defined above) is importable from your package.
class TXTFormat(BaseFormat):
    def __init__(self) -> None:
        super().__init__()
    @classmethod
    def format_register(cls):
        key = "TXT"
        description = "TXT format for NumberData"
        file_extension = ".txt"
        read_kwargs = [""]
        write_kwargs = [""]
        return cls._create_format_register(
            key, description, file_extension, read_kwargs, write_kwargs
        )
    @staticmethod
    def direct_convert_formats():
        # Assume the format can be converted directly to the formats supported by these classes:
        # AFormat, BFormat
        # Redefine this `direct_convert_formats` for a concrete format class
        return []
    @classmethod
    def read(cls, filename: str) -> dict:
        """Read the data from the file with the `filename` to a dictionary. The dictionary will
        be used by its corresponding data class."""
        number = float(np.loadtxt(filename))
        data_dict = {"number": number}
        return data_dict
    @classmethod
    def write(cls, object: NumberData, filename: str, key: str = None):
        """Save the data with the `filename`."""
        data_dict = object.get_data()
        arr = np.array([data_dict["number"]])
        np.savetxt(filename, arr, fmt="%.3f")
        if key is None:
            original_key = object.key
            key = original_key + "_to_TXTFormat"
        return object.from_file(filename, cls, key)