Replace the pyMBE Pandas dataframe for a canonical pyMBE database by pm-blanco · Pull Request #147 · pyMBE-dev/pyMBE

pm-blanco · 2026-02-10T08:10:59Z

Fixes #146, #123 and #18:

Added

Introduced a canonical pyMBE database backend replacing the previous monolithic Pandas DataFrame storage approach. This lays the foundation for more robust, extensible, and normalized data handling across pyMBE.
Added support to define reaction templates in the pyMBE database.
Utility functions to export information about templates, instances and reactions in the pyMBE database as pandas DataFrames: pmb.get_templates_df, pmb.get_instances_df and pmb.get_reactions_df.
Utility functions to load and save the new database via the pyMBE API, pmb.save_database and pmb.load_database.
Added functions to define particle states: pmb.define_particle_states and pmb.define_monoprototic_particle_states.
Added utility functions in lib/handy_functions to define particle and residue templates for amino acids in peptides and proteins: define_protein_AA_particles, define_protein_AA_residues and define_peptide_AA_residues.

Changed

Refactored core modules to use the new database schema based on templates and instances for particles, residues, molecules, hydrogels, proteins and peptides.
Particle states are now independent templates, disentangled from particle templates.
pKa values are now stored as part of chemical reactions and are no longer an attribute of particle templates.
Amino acid residue templates are no longer defined internally in define_peptide and define_protein. Those definitions are now exposed to the user.
Molecule templates must now be defined before they can be used as templates for hydrogel chains in hydrogels.
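
As a rough illustration of the schema change described above (states separated from particle templates, pKa attached to reactions), a minimal sketch might look as follows. The class and function names here (ParticleStateSketch, AcidBaseReactionSketch, mean_charge) are hypothetical and are not the pyMBE API; the charge estimate uses the ideal Henderson-Hasselbalch relation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticleStateSketch:
    """Hypothetical: one charge state, stored independently of the particle template."""
    name: str           # state label, e.g. "AH" or "A"
    particle_name: str  # particle template this state belongs to
    z: int              # charge number of the state


@dataclass(frozen=True)
class AcidBaseReactionSketch:
    """Hypothetical: a reaction linking two states; the pK lives here, not on the particle."""
    reactant: str  # name of the protonated state
    product: str   # name of the deprotonated state
    pK: float


def mean_charge(reaction, states, pH):
    """Ideal (Henderson-Hasselbalch) average charge of a monoprotic particle."""
    # Fraction of particles in the deprotonated (product) state.
    alpha = 1.0 / (1.0 + 10.0 ** (reaction.pK - pH))
    z_reactant = states[reaction.reactant].z
    z_product = states[reaction.product].z
    return (1.0 - alpha) * z_reactant + alpha * z_product


# A monoprotic acid: neutral when protonated, charge -1 when deprotonated.
states = {
    "AH": ParticleStateSketch(name="AH", particle_name="A", z=0),
    "A": ParticleStateSketch(name="A", particle_name="A", z=-1),
}
reaction = AcidBaseReactionSketch(reactant="AH", product="A", pK=4.0)

charge_at_pK = mean_charge(reaction, states, pH=4.0)
```

Keeping the pK on the reaction means the same particle template can take part in several reactions without duplicating particle-level attributes.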

Fixed

Utility methods get_particle_id_map, calculate_HH, calculate_net_charge and center_object_in_simulation_box now support all template types in pyMBE, including hydrogels. Some of these methods have been renamed so that the API directly reflects this change in behavior.

Removed

Method add_bonds_to_espresso has been removed from the API. pyMBE now adds bonds to ESPResSo internally when molecule instances are created in ESPResSo.
Tutorial lattice_builder.ipynb has been removed because its content is redundant with sample script build_hydrogel.py.

Documentation

Cleaned and standardized documentation.

Tests

Modernized molecule tests and LJ tests.

1234somesh · 2026-03-04T11:18:33Z

pyMBE/storage/templates/bond.py

+    Notes:
+        - Values of the parameters are stored as PintQuantity objects for unit-aware calculations.
+    """
+    pmb_type: Literal["bond"] = "bond"


Why do instances and templates have different levels of protection for the pmb_type argument, i.e. pmb_type: Literal["bond"] = "bond" vs pmb_type: str = "bond"? I think the same level of protection could be used for pmb_type in all instances and templates.
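
To make the difference concrete, here is a minimal stdlib-only sketch. It assumes the real classes are pydantic models (the Field usage in the diff suggests so), in which case pydantic would enforce the Literal annotation itself; the small LiteralChecked mixin below only imitates that check, and the class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal, get_args, get_origin, get_type_hints


class LiteralChecked:
    """Hypothetical mixin: rejects values outside a field's Literal annotation."""

    def __post_init__(self) -> None:
        for field_name, hint in get_type_hints(type(self)).items():
            # Only Literal[...] annotations restrict the allowed values.
            if get_origin(hint) is Literal:
                value = getattr(self, field_name)
                if value not in get_args(hint):
                    raise ValueError(
                        f"{field_name}={value!r} is not one of {get_args(hint)}"
                    )


@dataclass
class BondTemplateSketch(LiteralChecked):
    # Literal: only the exact string "bond" is accepted.
    pmb_type: Literal["bond"] = "bond"


@dataclass
class BondInstanceSketch(LiteralChecked):
    # Plain str: any string passes, including a wrong pmb_type.
    pmb_type: str = "bond"


accepted = BondInstanceSketch(pmb_type="residue")  # silently accepted

try:
    BondTemplateSketch(pmb_type="residue")
    template_rejected = False
except ValueError:
    template_rejected = True
```

With the Literal annotation, a mismatched pmb_type fails at construction time rather than surfacing later as an inconsistent database entry.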

1234somesh · 2026-03-04T11:27:22Z

pyMBE/storage/templates/bond.py

+    pmb_type: Literal["bond"] = "bond"
+    name: str = Field(default="default")
+    bond_type: str                      # "HARMONIC", "FENE"
+    particle_name1: str | None = None


Maybe it's a good idea to unify Optional[str] and str | None, which are currently used interchangeably (cf. ParticleTemplate).

pinedaps · 2026-03-06T14:07:31Z

pyMBE/storage/instances/molecule.py

+            Unique non-negative integer identifying this molecule instance within the database.
+
+        assembly_id (int | None):
+            Identifier of the super-parent assembly (e.g. hydrogel) to which this residue belongs. ``None`` indicates that the residue is not assigned to any assembly.


The docstring for assembly_id should refer to a molecule and not to a residue.

pinedaps · 2026-03-06T14:10:26Z

pyMBE/pyMBE.py

+        for _, tpl in self.db._templates["particle"].items():
+            radius = (tpl.sigma.to_quantity(self.units) + tpl.offset.to_quantity(self.units))/2.0
+            if dimensionless:
+                radius = radius.magnitude


It is ambiguous which magnitude this line returns. It will return the magnitude of the radius in the last unit Pint has stored in buffer memory, which makes the behaviour of this method non-deterministic. I suggest changing this line to radius = radius.m_as('reduced_length'), renaming the variable dimensionless to magnitude_reduced_length, and fixing the docs accordingly.

1234somesh · 2026-03-06T15:20:02Z

pyMBE/storage/manager.py

+            if not used_ids:
+                return 0
+        else:
+            if pmb_type not in self._instances or len(self._instances[pmb_type]) == 0:


The second check, len(self._instances[pmb_type]) == 0, might be redundant here (deleting an instance already takes care of empty keys in ._instances)
return 0
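
The reasoning can be sketched with a hypothetical store (not the actual manager code, and the max-plus-one id rule is an assumption): if deletion always prunes keys that become empty, a key that is present is never empty, so the membership test alone is enough.

```python
class InstanceStoreSketch:
    """Hypothetical stand-in for the instance bookkeeping in the manager."""

    def __init__(self) -> None:
        # Maps pmb_type -> {instance_id: payload}.
        self._instances: dict[str, dict[int, object]] = {}

    def add(self, pmb_type: str, instance_id: int, payload: object) -> None:
        self._instances.setdefault(pmb_type, {})[instance_id] = payload

    def delete(self, pmb_type: str, instance_id: int) -> None:
        del self._instances[pmb_type][instance_id]
        # Prune the key once its last instance is gone.
        if not self._instances[pmb_type]:
            del self._instances[pmb_type]

    def next_free_id(self, pmb_type: str) -> int:
        # Because delete() prunes empty keys, a present key always holds
        # at least one instance; no separate length check is needed.
        if pmb_type not in self._instances:
            return 0
        return max(self._instances[pmb_type]) + 1


store = InstanceStoreSketch()
first_id = store.next_free_id("molecule")
store.add("molecule", first_id, payload="chain")
second_id = store.next_free_id("molecule")
store.delete("molecule", first_id)
id_after_delete = store.next_free_id("molecule")
```

The simplification only holds if every code path that removes instances also prunes empty keys; otherwise the length check remains a useful guard.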

pinedaps · 2026-03-06T15:34:56Z