Replace the pyMBE Pandas dataframe for a canonical pyMBE database #147

Open
pm-blanco wants to merge 55 commits into pyMBE-dev:main from pm-blanco:change_storage

pm-blanco (Collaborator) commented Feb 10, 2026

Fixes #146, #123 and #18:

Added

  • Introduced a canonical pyMBE database backend replacing the previous monolithic Pandas DataFrame storage approach. This lays the foundation for more robust, extensible, and normalized data handling across pyMBE.
  • Added support to define reaction templates in the pyMBE database.
  • Added utility functions to export information about templates, instances and reactions in the pyMBE database as pandas DataFrames: pmb.get_templates_df, pmb.get_instances_df and pmb.get_reactions_df.
  • Added utility functions to save and load the new database via the pyMBE API: pmb.save_database and pmb.load_database.
  • Added functions to define particle states: pmb.define_particle_states and pmb.define_monoprototic_particle_states.
  • Added utility functions in lib/handy_functions to define particle and residue templates for the amino acids in peptides and proteins: define_protein_AA_particles, define_protein_AA_residues and define_peptide_AA_residues.

Changed

  • Refactored core modules to use the new database schema based on templates and instances for particles, residues, molecules, hydrogels, proteins and peptides.
  • Particle states are now independent templates, disentangled from particle templates.
  • pKa values are now stored as part of chemical reactions and are no longer an attribute of particle templates.
  • Amino acid residue templates are no longer defined internally in define_peptide and define_protein. Those definitions are now exposed to the user.
  • Molecule templates must now be defined before they can be used as templates for hydrogel chains.
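As a rough illustration of the layout described in the list above, the following minimal sketch keeps particle templates, state templates and reactions in separate records, with the pK owned by the reaction. All class names, field names and the value 4.5 are hypothetical stand-ins chosen for this example; they are not the pyMBE schema.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ParticleTemplateSketch:
    """Hypothetical particle template: carries no charge and no pKa."""
    name: str


@dataclass(frozen=True)
class StateTemplateSketch:
    """Hypothetical state template, stored apart from the particle template."""
    particle_name: str
    name: str
    charge: int


@dataclass(frozen=True)
class ReactionSketch:
    """Hypothetical reaction record; the pK lives here, not on the particle."""
    # Pairs of (state name, stoichiometric coefficient); negative means reactant.
    participants: tuple
    pK: float


@dataclass
class DatabaseSketch:
    """Toy container with one table per record type."""
    particles: dict = field(default_factory=dict)
    states: dict = field(default_factory=dict)
    reactions: list = field(default_factory=list)


def pk_values_for_state(db, state_name):
    """Collect the pK of every reaction in which the given state takes part."""
    return [
        reaction.pK
        for reaction in db.reactions
        if any(name == state_name for name, _ in reaction.participants)
    ]


db = DatabaseSketch()
db.particles["A"] = ParticleTemplateSketch(name="A")
db.states[("A", "AH")] = StateTemplateSketch("A", "AH", charge=0)
db.states[("A", "A-")] = StateTemplateSketch("A", "A-", charge=-1)
# Arbitrary illustrative pK value for AH <-> A- + H+.
db.reactions.append(
    ReactionSketch(participants=(("AH", -1), ("A-", 1), ("H+", 1)), pK=4.5)
)

print(pk_values_for_state(db, "AH"))
```

With this separation, changing an acidity constant touches only the reaction record, and a particle can gain or lose states without its template being edited.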

Fixed

  • Utility methods get_particle_id_map, calculate_HH, calculate_net_charge and center_object_in_simulation_box now support all template types in pyMBE, including hydrogels. Some of these methods have been renamed so that the API directly reflects this change in behavior.

Removed

  • Method add_bonds_to_espresso has been removed from the API. pyMBE now adds bonds to ESPResSo internally when molecule instances are created in ESPResSo.
  • Tutorial lattice_builder.ipynb has been removed because its content is redundant with sample script build_hydrogel.py.

Documentation

  • Cleaned and standardized documentation.

Tests

  • Modernized molecule tests and LJ tests.

pm-blanco self-assigned this Feb 10, 2026
pm-blanco added the bug, documentation, enhancement and code quality labels Feb 10, 2026
pm-blanco added this to the pyMBEv2.0.0 milestone Feb 10, 2026
pm-blanco requested a review from 1234somesh February 10, 2026 13:10
Notes:
- Values of the parameters are stored as PintQuantity objects for unit-aware calculations.
"""
pmb_type: Literal["bond"] = "bond"
Contributor

Why do instances and templates have different levels of protection for the pmb_type argument, i.e. pmb_type: Literal["bond"] = "bond" vs pmb_type: str = "bond"? I think the same level of protection could be used for pmb_type in all instances and templates.
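To make the difference concrete, here is a minimal stdlib-only sketch. It assumes the models are checked by a validating layer that honours Literal annotations; the hand-written __post_init__ check below only imitates that behaviour, and both class names are hypothetical.

```python
from dataclasses import dataclass
from typing import Literal, get_args, get_type_hints


@dataclass
class StrictlyTyped:
    """pmb_type may only ever be the string "bond"."""
    pmb_type: Literal["bond"] = "bond"

    def __post_init__(self):
        # Imitate a validating layer: reject values outside the Literal.
        allowed = get_args(get_type_hints(type(self))["pmb_type"])
        if self.pmb_type not in allowed:
            raise ValueError(f"pmb_type must be one of {allowed}")


@dataclass
class LooselyTyped:
    """pmb_type defaults to "bond" but accepts any string."""
    pmb_type: str = "bond"


# The loose variant silently accepts a mislabelled record.
print(LooselyTyped(pmb_type="particle"))

# The strict variant refuses it.
try:
    StrictlyTyped(pmb_type="particle")
except ValueError as error:
    print(error)
```

Using the Literal form everywhere would make a mislabelled record fail at construction time rather than later, when it is looked up by type.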

pmb_type: Literal["bond"] = "bond"
name: str = Field(default="default")
bond_type: str # "HARMONIC", "FENE"
particle_name1: str | None = None
Contributor

It may be a good idea to unify Optional[str] and str | None, which are used interchangeably here and in ParticleTemplate.

Unique non-negative integer identifying this molecule instance within the database.

assembly_id (int | None):
Identifier of the super-parent assembly (e.g. hydrogel) to which this residue belongs. ``None`` indicates that the residue is not assigned to any assembly.
Contributor

The docstring for assembly_id should refer to a molecule and not to a residue.

for _, tpl in self.db._templates["particle"].items():
radius = (tpl.sigma.to_quantity(self.units) + tpl.offset.to_quantity(self.units))/2.0
if dimensionless:
radius = radius.magnitude
Contributor

It is ambiguous which magnitude this line returns: .magnitude gives the value in whatever unit the Pint quantity currently carries, which makes the behaviour of this method hard to predict. I suggest changing this line to radius = radius.m_as('reduced_length'), renaming the variable dimensionless to magnitude_reduced_length, and fixing the docs accordingly.
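The concern can be illustrated without any dependency. The sketch below uses a toy length class that only mimics the two Pint accessors mentioned in the comment (magnitude and m_as); ToyLength, its conversion table and the numbers are assumptions made for this example and are not Pint or pyMBE code.

```python
from dataclasses import dataclass

# Conversion factors to nanometres, for this toy example only.
_TO_NM = {"nm": 1.0, "angstrom": 0.1}


@dataclass(frozen=True)
class ToyLength:
    """Stand-in for a unit-aware quantity."""
    value: float
    unit: str

    @property
    def magnitude(self) -> float:
        # Returns the bare number in whatever unit the object carries.
        return self.value

    def m_as(self, unit: str) -> float:
        # Converts to the requested unit first, then returns the number.
        return self.value * _TO_NM[self.unit] / _TO_NM[unit]


# The same physical radius, expressed in two different units.
radius_a = ToyLength(0.35, "nm")
radius_b = ToyLength(3.5, "angstrom")

# The bare magnitudes differ although the lengths are equal.
print(radius_a.magnitude, radius_b.magnitude)

# Asking for an explicit unit gives matching numbers (up to rounding).
print(radius_a.m_as("nm"), radius_b.m_as("nm"))
```

Naming the target unit at the call site, as the suggested m_as('reduced_length') does, removes the dependence on how the quantity happened to be constructed.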

if not used_ids:
return 0
else:
if pmb_type not in self._instances or len(self._instances[pmb_type]) == 0:
Contributor

The second check, len(self._instances[pmb_type]) == 0, might be redundant here: deleting an instance already takes care of empty keys in ._instances.
return 0
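A minimal sketch of the invariant behind this remark: if deletion always drops a key once its last instance is gone, a membership test alone is enough. The class, its method names and the max-plus-one rule for the next identifier are hypothetical and only illustrate the argument.

```python
class InstanceStoreSketch:
    """Toy registry mapping a pmb_type to the set of instance ids in use."""

    def __init__(self):
        self._instances = {}

    def add(self, pmb_type, instance_id):
        self._instances.setdefault(pmb_type, set()).add(instance_id)

    def delete(self, pmb_type, instance_id):
        bucket = self._instances[pmb_type]
        bucket.discard(instance_id)
        # Invariant: never leave an empty bucket behind.
        if not bucket:
            del self._instances[pmb_type]

    def next_id(self, pmb_type):
        # Thanks to the invariant, no separate emptiness check is needed.
        if pmb_type not in self._instances:
            return 0
        return max(self._instances[pmb_type]) + 1


store = InstanceStoreSketch()
store.add("particle", 0)
store.add("particle", 1)
print(store.next_id("particle"))

store.delete("particle", 0)
store.delete("particle", 1)
print(store.next_id("particle"))
```

The simplification is only safe if every code path that removes instances maintains the invariant; keeping the extra length check is the defensive alternative.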

The state of the particle (e.g., protonation state, charge state).
coefficient (int):
Stoichiometric coefficient of the participant:
- ``coefficient < 0`` → reactant
Contributor

"coefficient ∈ ℤ⁻ → reactant" or "coefficient ∈ {−1, −2, −3, …} → reactant"

coefficient (int):
Stoichiometric coefficient of the participant:
- ``coefficient < 0`` → reactant
- ``coefficient > 0`` → product
Contributor

"coefficient ∈ ℤ⁺ → product" or "coefficient ∈ {1, 2, 3, …} → product"
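The sign convention discussed in these two comments (negative integers for reactants, positive integers for products, zero excluded) can be sketched as follows. ParticipantSketch and its role property are hypothetical names used only for illustration; the error message is the one quoted in the diff.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ParticipantSketch:
    """Toy reaction participant with a signed stoichiometric coefficient."""
    state_name: str
    coefficient: int

    def __post_init__(self):
        # Zero is neither a reactant nor a product, so it is rejected.
        if self.coefficient == 0:
            raise ValueError("Stoichiometric coefficient cannot be zero.")

    @property
    def role(self) -> str:
        # Negative integers mark reactants, positive integers mark products.
        return "reactant" if self.coefficient < 0 else "product"


for participant in (ParticipantSketch("AH", -1), ParticipantSketch("A-", 1)):
    print(participant.state_name, participant.role)
```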

Reaction equilibrium parameter (e.g., pKa, log K). The meaning
depends on ``reaction_type``.

reaction_type ('str'):
Contributor

Place this entry before "pK".

Must include at least two participants.

pK ('float'):
Reaction equilibrium parameter (e.g., pKa, log K). The meaning
Contributor

This should read "-log K" rather than "log K".
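The sign matters because pK is defined as the negative base-10 logarithm of the equilibrium constant, pK = -log10(K). A small sketch with hypothetical helper names:

```python
import math


def pk_from_k(equilibrium_constant: float) -> float:
    """Return pK = -log10(K) for a positive equilibrium constant."""
    if equilibrium_constant <= 0:
        raise ValueError("The equilibrium constant must be positive.")
    return -math.log10(equilibrium_constant)


def k_from_pk(pk: float) -> float:
    """Invert the definition: K = 10 ** (-pK)."""
    return 10.0 ** (-pk)


# A weak acid with K = 1e-4 has a pK close to 4, not -4.
print(pk_from_k(1e-4))
```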

"""
participants: List[ReactionParticipant]
pK: float
reaction_type: str
Contributor

Place this field before pK.

simulation_method: Optional[str] = None
name: Optional[str] = None

@validator("participants")
pinedaps (Contributor) commented Mar 6, 2026

Replace the variable "v" with "participants" in all the @validator("participants") methods.

particle_name (str):
The name of the particle template participating in the reaction.
state_name (str):
The state of the particle (e.g., protonation state, charge state).
pinedaps (Contributor) commented Mar 6, 2026

The name of the particle state

If the coefficient is zero.
"""
if coefficient == 0:
raise ValueError("Stoichiometric coefficient cannot be zero.")
Contributor

If this line exists, is the validator "def no_zero_coeff(cls, v)" still necessary?

name: Optional[str] = None

@validator("participants")
def at_least_two_participants(cls, v):
pinedaps (Contributor) commented Mar 6, 2026

Unify the cls/self nomenclature from this point onward.

# Explicitly regenerate name after mutation
self.name = self._generate_name_from_participants()

def _generate_name_from_participants(self):
Contributor

This method seems to be duplicated, see @root_validator def generate_name(cls, values):
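One way to address this kind of duplication is to route both construction and mutation through a single naming helper. The sketch below is a hypothetical illustration: the name format, the class and the add_participant method are assumptions, not the code under review.

```python
def reaction_name(participants):
    """Build a display name from (state_name, coefficient) pairs."""
    reactants = [name for name, coefficient in participants if coefficient < 0]
    products = [name for name, coefficient in participants if coefficient > 0]
    return " + ".join(reactants) + " <-> " + " + ".join(products)


class NamedReactionSketch:
    """Toy reaction whose name is always derived by the same helper."""

    def __init__(self, participants):
        self.participants = list(participants)
        self.name = reaction_name(self.participants)

    def add_participant(self, state_name, coefficient):
        self.participants.append((state_name, coefficient))
        # Regenerate the name after mutation, reusing the single helper.
        self.name = reaction_name(self.participants)


reaction = NamedReactionSketch([("AH", -1), ("A-", 1)])
reaction.add_participant("H+", 1)
print(reaction.name)
```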

Labels

bug (Something isn't working)
code quality
documentation (Improvements or additions to documentation)
enhancement (New feature or request)

Projects

None yet

Development

Successfully merging this pull request may close these issues.

The pyMBE dataframe is monolithic
Utility functions in pyMBE should support hydrogels
Save reactions in the pmb_df

3 participants