API Documentation¶

Density fitting¶

A set of utility functions for density fitting. Some credit to: https://gitlab.com/jmargraf/kdf

class dfa_recommender.df_class.DensityFitting(wfnpath: str, xyzfile: str, basis: str, charge: int = 0, spin: int = 1, wfnpath2: str = 'NA')[source]¶

Bases: object

Density fitting class to project the electron density onto auxiliary basis sets.

calc_powerspec() → array[source]¶

Calculates the power spectrum, which yields a rotationally invariant representation of the density fitting coefficients.

Returns: powerspec – power spectrum derived from the density fitting coefficients.
Return type: np.ndarray
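For intuition, one simple way to obtain a rotation-invariant quantity is to sum the squared coefficients over m within each shell. The sketch below assumes this simplest form (one value per shell). The helper name `shell_power_spectrum` and the flat coefficient layout are illustrative assumptions, not taken from dfa_recommender; calc_powerspec may combine shells differently.

```python
import numpy as np

def shell_power_spectrum(coeffs, shells):
    """Sum c_{lm}^2 over m for each shell (hypothetical helper).

    coeffs: 1D array of fitting coefficients for one atom.
    shells: angular momentum l of each shell, in the same order as coeffs.
    """
    coeffs = np.asarray(coeffs, dtype=float)
    spectrum = []
    start = 0
    for l in shells:
        size = 2 * l + 1  # number of m components in a shell with angular momentum l
        spectrum.append(float(np.sum(coeffs[start:start + size] ** 2)))
        start += size
    if start != coeffs.size:
        raise ValueError("coefficient vector does not match the shell layout")
    return np.array(spectrum)

# One s shell and one p shell: 1 + 3 = 4 coefficients.
spec = shell_power_spectrum([2.0, 1.0, 0.0, 0.0], shells=[0, 1])

# An orthogonal mixing of the p components (as a rotation would produce)
# leaves the per-shell sums unchanged.
mixed = shell_power_spectrum([2.0, 0.0, 1.0, 0.0], shells=[0, 1])
```

Both calls give the same two numbers, which is the invariance property the method description refers to.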

calc_utilities() → None[source]¶: Calculate the shell numbers and number of basis functions in each shell.

compensate_charges()[source]¶: Compensate charges in the density fitting. NOTE: this currently only works for alpha.

construct_aux() → None[source]¶: Load files, transform them into utility variables, and check data types.

convert_CP2e3nn() → None[source]¶: Match the order of m between the psi4 and e3nn conventions within the same l. For example:

l=1: psi4 m: [0, 1, -1]; e3nn m: [-1, 0, 1]
l=2: psi4 m: [0, 1, -1, 2, -2]; e3nn m: [-2, -1, 0, 1, 2]
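The reordering described for convert_CP2e3nn can be sketched as a permutation within one shell. The function names below are hypothetical, and the sketch only permutes coefficients; any sign or normalization differences between the two conventions are not addressed.

```python
def psi4_m_order(l):
    """m values in the psi4 order quoted above: 0, +1, -1, +2, -2, ..."""
    order = [0]
    for m in range(1, l + 1):
        order.extend([m, -m])
    return order

def psi4_to_e3nn(block, l):
    """Reorder one shell of 2l+1 coefficients from psi4 order to m = -l..l."""
    if len(block) != 2 * l + 1:
        raise ValueError("expected 2l+1 coefficients")
    by_m = dict(zip(psi4_m_order(l), block))
    return [by_m[m] for m in range(-l, l + 1)]

# Label each coefficient by its m value to make the permutation visible.
reordered = psi4_to_e3nn(["m=0", "m=+1", "m=-1"], l=1)
print(reordered)
```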

get_dab() → None[source]¶: Build the dab_P tensor, i.e., the tensor before contraction to the auxiliary coefficients (np.ndarray).

get_df_coeffs(D: Matrix) → None[source]¶: Calculate the raw density fitting coefficients.

pad_df_coeffs() → None[source]¶: Convert self.C_P (a 1D array) to self.C_P_pad (N_atoms x M), where M is the largest number of coefficients on any atom. For example, H2O with the def2-universal-jkfit basis has 113 coefficients:

H -> [0, 0, 1, 1, 2, 2] -> 18 coeffs
O -> [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 3, 3, 4] -> 77 coeffs

Each shell with angular momentum l contributes 2l + 1 coefficients, so the total is 77 + 2 x 18 = 113. Then self.C_P_pad is a (3, 77) np.array with zero-padding at the corresponding irreps.
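The H2O numbers can be reproduced with a short NumPy sketch. The helper names (`n_coeffs`, `pad_coeffs`), the `SHELLS` table, and the exact column placement are assumptions made for illustration: each atom's coefficients are assumed to be stored sorted by l, and each block of equal l is left-aligned inside the widest block of that l. The library's own layout may differ.

```python
import numpy as np

# Angular momenta of the auxiliary shells per element, copied from the H2O example.
SHELLS = {
    "H": [0, 0, 1, 1, 2, 2],
    "O": [0] * 10 + [1] * 8 + [2] * 4 + [3] * 2 + [4],
}

def n_coeffs(shells):
    """Each shell with angular momentum l holds 2l + 1 coefficients."""
    return sum(2 * l + 1 for l in shells)

def pad_coeffs(symbols, flat):
    """Split a flat coefficient array per atom and zero-pad, aligning equal-l blocks."""
    lmax = max(l for s in symbols for l in SHELLS[s])
    # Largest number of shells of each l over all atoms.
    width = [max(SHELLS[s].count(l) for s in symbols) for l in range(lmax + 1)]
    # Starting column of every l block in the padded layout.
    sizes = [w * (2 * l + 1) for l, w in enumerate(width)]
    starts = np.concatenate([[0], np.cumsum(sizes)])
    padded = np.zeros((len(symbols), int(starts[-1])))
    pos = 0
    for i, s in enumerate(symbols):
        for l in range(lmax + 1):
            size = SHELLS[s].count(l) * (2 * l + 1)
            padded[i, starts[l]:starts[l] + size] = flat[pos:pos + size]
            pos += size
    return padded

symbols = ["O", "H", "H"]
flat = np.arange(1, 114, dtype=float)  # 113 dummy coefficients, all nonzero
padded = pad_coeffs(symbols, flat)
print(padded.shape)
```

With this layout, the hydrogen rows are zero wherever oxygen has shells that hydrogen lacks.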

property wfnpath: None¶

property wfnpath2: None¶

property xyzfile: None¶

dfa_recommender.df_class.get_molecule(xyzfile: str, charge: int, spin: int, sym: str = 'c1') → Tuple[Molecule, list][source]¶

Assemble a molecule object from xyzfile, charge and spin.

Parameters:

xyzfile (str,) – path to the xyz file of the input molecule.
charge (int,) – charge of the input molecule.
spin (int,) – spin multiplicity (2*S + 1) for the input molecule
sym (str, Optional, default: c1) – point group symmetry of the input molecule

Returns:

mol (psi4.geometry object) – psi4.geometry object for the input molecule
symbols (list) – list of atom symbols

dfa_recommender.df_utils.get_spectra(densfit: DensityFitting, fock: bool = False, H: bool = False, t: str = 'alpha') → array[source]¶

Compute the final power spectrum for the DensityFitting object

Parameters:

densfit (DensityFitting object,) – created from .xyz, .wfn, and basis set
fock (bool, Optional, default: False) – Fock fitting or not
H (bool, Optional, default: False) – Hamiltonian (potential + kinetic) fitting or not
t (str, Optional, default: alpha) – alpha or beta spin orbitals

Returns:

powerspec (np.ndarray) – power spectrum derived from the density fitting coefficients

dfa_recommender.df_utils.get_subtracted_spectra(densfit: DensityFitting, fock=False, t='alpha') → array[source]¶

Compute the final power spectrum (w.r.t. a second .wfn file) for the DensityFitting object

Parameters:

densfit (DensityFitting object,) – created from .xyz, .wfn, and basis set
fock (bool, Optional, default: False) – Fock fitting or not
H (bool, Optional, default: False) – Hamiltonian (potential + kinetic) fitting or not
t (str, Optional, default: alpha) – alpha or beta spin orbitals

Returns:

powerspec (np.ndarray) – power spectrum derived from the density fitting coefficients

Behler-Parrinello type gated networks with density fitting features¶

Gated network for energy prediction.

class dfa_recommender.net.ElementalGate(elements, n_out, onehot=True, trainable=False)[source]¶

Bases: Module

Element-based masking. Produces an Nbatch x Natoms x Nelem mask depending on the nuclear charges passed as an argument. The purpose is to create element-wise activations based on the block-wise weights in self.gate. If onehot is set, the mask is a one-hot mask; otherwise a random embedding is used. If the trainable flag is set to True, the gate values can be adapted during training. It is recommended to create a mapping dictionary for your elements, for example: mapping = {"X": 0, "H": 1, "C": 2, "N": 3, "O": 4, "F": 5}

forward(inputs: Tensor) → Tensor[source]¶

Compute output.

Parameters:

inputs (torch.Tensor,) – model input as atomic numbers

Returns:

outputs (torch.Tensor,) – model output, which is unity at the position of the element and zero otherwise.
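The one-hot case can be illustrated with NumPy instead of torch. The function name `onehot_element_mask` is hypothetical, and the sketch assumes the inputs are the integer codes from a mapping dictionary like the one shown in the class description.

```python
import numpy as np

def onehot_element_mask(codes, elements):
    """Return an (Nbatch, Natoms, Nelem) mask: 1 where the atom matches the element."""
    codes = np.asarray(codes)
    elements = np.asarray(elements)
    # Broadcasting compares every atom code with every known element code.
    return (codes[..., None] == elements).astype(float)

mapping = {"X": 0, "H": 1, "C": 2, "N": 3, "O": 4, "F": 5}
# One water molecule padded with a dummy atom "X" to four atom slots.
codes = np.array([[mapping[s] for s in ["O", "H", "H", "X"]]])
mask = onehot_element_mask(codes, sorted(mapping.values()))
print(mask.shape)
```

Each atom row of the mask contains exactly one nonzero entry, at the column of its element.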

training: bool¶

class dfa_recommender.net.GatedNetwork(nin: int, n_out: int, elements: list, n_hidden: int = 50, n_layers: int = 3, trainable: bool = False, onehot: bool = True, droprate: float = 0.2)[source]¶

Bases: Module

Behler-Parrinello type gated network that combines the building blocks of this module.

forward(inputs: Tensor, update_batch_stats: bool = True) → Tensor[source]¶

Compute output.

Parameters:

inputs (torch.Tensor,) – model inputs; [batch_size, max(natoms), :-1] are the molecule features, and [batch_size, max(natoms), -1] encodes the element type.
update_batch_stats (bool, Optional, default: True) – used only in batch normalization

Returns:

outputs (torch.Tensor,) – model outputs.
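The input layout and the gating idea can be sketched in NumPy. Everything below is a simplified stand-in rather than the GatedNetwork implementation: single weight vectors replace the per-element MLPs, batch normalization and dropout are omitted, and summing atomic contributions per molecule is the usual Behler-Parrinello choice, assumed here.

```python
import numpy as np

rng = np.random.default_rng(0)
n_feat, n_elem = 5, 3  # illustrative feature size and number of element types

# Stand-ins for the per-element networks: one weight vector per element type.
weights = rng.normal(size=(n_elem, n_feat))

def gated_energy(inputs):
    """inputs: (batch, max_natoms, n_feat + 1); the last column is the element code."""
    feats = inputs[..., :-1]
    codes = inputs[..., -1].astype(int)
    # Every element network sees every atom: (batch, natoms, n_elem).
    tiled = feats @ weights.T
    # The one-hot gate keeps only the output of the network matching each atom.
    gate = np.eye(n_elem)[codes]
    atomic = (tiled * gate).sum(axis=-1)
    # Sum the atomic contributions of each molecule.
    return atomic.sum(axis=-1)

feats = rng.normal(size=(2, 4, n_feat))
codes = np.array([[2, 1, 1, 0], [1, 1, 0, 0]], dtype=float)
inputs = np.concatenate([feats, codes[..., None]], axis=-1)
energy = gated_energy(inputs)
print(energy.shape)
```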

training: bool¶

class dfa_recommender.net.MLP(n_in: int, n_out: int, n_hidden: int = 50, n_layers: int = 3, droprate: float = 0.2)[source]¶

Bases: Module

Multilayer fully connected neural network. Each element type has its own MLP; atoms of the same element share the same MLP (i.e., weight sharing).

forward(inputs: Tensor) → Tensor[source]¶

Compute output.

Parameters:

inputs (torch.Tensor,) – model input.

Returns:

outputs (torch.Tensor,) – model output.

training: bool¶

class dfa_recommender.net.MySoftplus(beta: int = 1, threshold: int = 20)[source]¶

Bases: Module

Shifted softplus such that MySoftplus(0) = 0.
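A scalar sketch of a shifted softplus follows. It assumes the shift is softplus(0) = ln(2) / beta and that beta and threshold play the same role as in torch.nn.Softplus (linear behavior above the threshold); the actual MySoftplus code may differ in detail.

```python
import math

def shifted_softplus(x, beta=1.0, threshold=20.0):
    """softplus(x) - softplus(0), so that the function passes through the origin."""
    shift = math.log(2.0) / beta
    if beta * x > threshold:
        # For large arguments softplus(x) is approximately x; this avoids overflow in exp.
        return x - shift
    return math.log1p(math.exp(beta * x)) / beta - shift

values = [shifted_softplus(x) for x in (-5.0, 0.0, 5.0, 100.0)]
```

The function is zero at the origin, approaches -ln(2) / beta for very negative inputs, and grows linearly for large positive inputs.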

beta: int¶

extra_repr() → str[source]¶

Set the extra representation of the module

To print customized extra information, you should re-implement this method in your own modules. Both single-line and multi-line strings are acceptable.

forward(input: Tensor) → Tensor[source]¶

Defines the computation performed at every call.

Should be overridden by all subclasses.

Note

Although the recipe for forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this since the former takes care of running the registered hooks while the latter silently ignores them.

threshold: int¶

class dfa_recommender.net.TiledMultiLayerNN(n_in: int, n_out: int, n_tiles: int, n_hidden: int = 50, n_layers: int = 3, droprate: float = 0.2)[source]¶

Bases: Module

Tiled multilayer networks: a list of MLPs. These MLPs are applied to the input, and their outputs are concatenated. The purpose is to create element-wise predictions. Note that n_tiles should equal the number of element types in your data set.

forward(inputs: Tensor) → Tensor[source]¶

Compute output.

Parameters:

inputs (torch.Tensor,) – model input.

Returns:

outputs (list,) – model output as list of torch.Tensor

training: bool¶

dfa_recommender.net.call_bn(bn: BatchNorm1d, x: Tensor, update_batch_stats: bool = True) → None[source]¶: Call for batch normalization

class dfa_recommender.net.finalMLP(elements, n_out, droprate=0.2)[source]¶

Bases: Module

The final fully connected neural network that maps the outputs from ElementalGate to the final outputs.

forward(inputs: Tensor, update_batch_stats: bool = True) → Tensor[source]¶

Compute output.

Parameters:

inputs (torch.Tensor,) – model inputs
update_batch_stats (bool, Optional, default: True) – used only in batch normalization

Returns:

outputs (torch.Tensor,) – model outputs.

training: bool¶

Virtual adversarial training¶

Virtual adversarial training

class dfa_recommender.vat.VAT(device, eps, xi, alpha, k=1, use_entmin=False)[source]¶

Bases: object

Implementation of virtual adversarial training. See https://arxiv.org/abs/1704.03976 for more details.

dfa_recommender.vat.df_l2_normalize(d, l_x, cut=True)[source]¶

Normalize d with a zero masking.

Parameters:

d (torch.Tensor) – random perturbation in the input space
l_x (torch.Tensor) – a tensor based on which the mask is created
cut (bool, default: True) – whether to apply the mask

Returns:

dn – normalized random perturbation in the input space

Return type:

torch.Tensor
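A NumPy sketch of masked L2 normalization is given below. It assumes that the mask is zero wherever l_x is exactly zero (for example, padded entries) and that the norm is taken per sample over all remaining dimensions; the helper name and the small constant `tiny` are illustrative, not the library's.

```python
import numpy as np

def masked_l2_normalize(d, l_x, cut=True, tiny=1e-8):
    """Zero out masked entries of d, then scale each sample to unit L2 norm."""
    d = np.asarray(d, dtype=float)
    if cut:
        # Keep the perturbation only where the reference tensor is nonzero.
        d = d * (np.asarray(l_x) != 0)
    flat = d.reshape(d.shape[0], -1)
    # Reshape the per-sample norms so they broadcast against d.
    norm = np.linalg.norm(flat, axis=1).reshape((-1,) + (1,) * (d.ndim - 1))
    return d / (norm + tiny)

d = np.ones((2, 3, 4))
l_x = np.ones((2, 3, 4))
l_x[:, 2, :] = 0.0  # pretend the third atom of each sample is padding
dn = masked_l2_normalize(d, l_x)
```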

class dfa_recommender.vat.regVAT(device, eps, xi, alpha, k=1, cut=True)[source]¶

Bases: object

Implementation of virtual adversarial training for a regression task. The only difference compared to VAT is that MSE replaces the KL divergence as the measure between the predictions at the original and perturbed points.
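The idea can be sketched with a toy model in NumPy. This is a simplified illustration of the VAT recipe with an MSE distance, not the regVAT code: the model is a hypothetical tanh regressor, the gradient with respect to the perturbation is obtained by finite differences instead of autograd, and the roles of eps, xi, and k follow the usual VAT description (perturbation radius, probe step, number of power iterations).

```python
import numpy as np

def model(x, w):
    """Toy regressor standing in for the neural network."""
    return np.tanh(x @ w)

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def numerical_grad(fn, d, h=1e-5):
    """Central finite-difference gradient of a scalar function fn at d."""
    g = np.zeros_like(d)
    for idx in np.ndindex(d.shape):
        step = np.zeros_like(d)
        step[idx] = h
        g[idx] = (fn(d + step) - fn(d - step)) / (2 * h)
    return g

def unit(v, tiny=1e-12):
    """Scale each row to unit L2 norm."""
    return v / (np.linalg.norm(v, axis=1, keepdims=True) + tiny)

def reg_vat_loss(x, w, eps=0.5, xi=1e-2, k=1, seed=0):
    rng = np.random.default_rng(seed)
    y = model(x, w)                      # clean prediction, treated as a constant target
    d = unit(rng.normal(size=x.shape))   # random starting direction
    for _ in range(k):
        # Direction in which the prediction changes fastest around x.
        grad = numerical_grad(lambda dd: mse(y, model(x + xi * dd, w)), d)
        d = unit(grad)
    r_adv = eps * d                      # adversarial perturbation of radius eps
    return mse(y, model(x + r_adv, w)), r_adv

rng = np.random.default_rng(1)
x = 0.3 * rng.normal(size=(4, 3))
w = np.array([0.5, -0.3, 0.8])
loss, r_adv = reg_vat_loss(x, w)
```

The returned loss would be added to the supervised loss as a smoothness regularizer.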

PyTorch utility functions¶

class dfa_recommender.dataset.SubsetDataset(dataset, indices)[source]¶

Bases: Dataset

Subset a torch.utils.data.Dataset object

dfa_recommender.evaluate.evaluate_regressor(regressor, loader, device, y_scaler)[source]¶

Evaluate the model performance on a single regression task

Parameters:

regressor (torch.nn.Module) – trained regression model
loader (torch.utils.data.DataLoader) – your torch dataloader
device (torch.device) – the device on which this evaluation is performed
y_scaler (sklearn.preprocessing.StandardScaler) – the scaler used to normalize the labels of the training data

Returns:

mae (float) – MAE
scaled_mae (float) – scaled MAE
rval (float,) – Pearson's correlation coefficient
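The three returned metrics can be illustrated with NumPy. The sketch assumes that "scaled MAE" means the MAE in standardized label units and that "MAE" is computed after undoing the standardization (y = y_scaled * std + mean, as a StandardScaler would); the function name and arguments are hypothetical.

```python
import numpy as np

def regression_metrics(y_pred_scaled, y_true_scaled, mean, std):
    """Return (mae, scaled_mae, rval) for standardized predictions and labels."""
    y_pred_scaled = np.asarray(y_pred_scaled, dtype=float)
    y_true_scaled = np.asarray(y_true_scaled, dtype=float)
    scaled_mae = float(np.mean(np.abs(y_pred_scaled - y_true_scaled)))
    # Undo the standardization to report errors in the original units.
    y_pred = y_pred_scaled * std + mean
    y_true = y_true_scaled * std + mean
    mae = float(np.mean(np.abs(y_pred - y_true)))
    # Pearson correlation between predictions and labels.
    rval = float(np.corrcoef(y_pred, y_true)[0, 1])
    return mae, scaled_mae, rval

mae, scaled_mae, rval = regression_metrics(
    y_pred_scaled=[-0.5, 0.0, 0.5],
    y_true_scaled=[-1.0, 0.0, 1.0],
    mean=10.0,
    std=2.0,
)
```

Under these assumptions the MAE in original units is simply the scaled MAE multiplied by the label standard deviation.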

Utility functions for preparing datasets, model training and evaluation.

dfa_recommender.ml_utils.numpy_to_dataset(X, y, regression=False)[source]¶

Assemble numpy arrays into a torch tensor data set

Parameters:

X (np.array) – features
y (np.array) – targets
regression (bool, default: False) – whether this is a regression task

Returns:

data – assembled data set

Return type:

torch.utils.data.TensorDataset

class dfa_recommender.sampler.InfiniteSampler(num_samples)[source]¶

Bases: Sampler

Sample datasets

loop()[source]¶
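A pure-Python sketch of an endless index stream is shown below. It is not the InfiniteSampler code: the generator name is hypothetical, and reshuffling after every full pass is an assumption about the intended behavior.

```python
import random
from itertools import islice

def infinite_indices(num_samples, seed=0):
    """Yield dataset indices forever, reshuffling after every full pass."""
    rng = random.Random(seed)
    while True:
        order = list(range(num_samples))
        rng.shuffle(order)
        yield from order

# Take the first ten indices from a data set with four samples.
first_ten = list(islice(infinite_indices(4), 10))
```

A sampler like this lets a training loop draw batches by step count rather than by epoch, which is convenient when labeled and unlabeled loaders of different sizes are iterated together.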