Saturday, November 22, 2014

Computing a solvent accessible surface area using Jmol

Someone asked me how to compute a surface area of a molecule.  Here's how to compute a solvent accessible surface area using Jmol.

1. Load your molecule into Jmol (File > Open)

2. Open the Script Editor and Output Console (File > Script Editor, etc)

3. Type the following three lines into the Script Editor window and hit run

set solventproberadius 1.2
isosurface solvent
isosurface area set 0

1.2 Å is the probe radius for water solvent.  If you are using another solvent input another value.  It will correspond roughly to the radius of the solvent molecule.

4. The are will be printed out in the Console.  The units are not specified but I'm sure it's Å2 (square Ångstrom)

Questions about the book?

I have a sneaking suspicion that at least one of you have questions about somethings that I wrote in the book.  I have therefore created this page where you can ask away and everyone can see the answers.

Wednesday, November 19, 2014

Introducing the Computational Chemistry Daily

Yesterday I started the Computational Chemistry Daily which aggregates posts related to computational chemistry on social media such as Twitter, Google+ and Facebook on a daily basis.  It identifies the posts by certain words or hashtags such as #compchem and everyone can contribute. Hope you find it useful.

This work is licensed under a Creative Commons Attribution 4.0

Sunday, November 9, 2014

Calculation of Solvation Free Energies of Charged Solutes Using Mixed Cluster/Continuum Models

This post takes it title from this paper by Bryantsev et al., which provides an excellent discussion of the issue.  The point of this post is to suggest four modifications to the one of the approaches discussed in the paper:

1. The use of standard Helmholz free energies instead of standard Gibbs free energies

2. Correction for indistinguishability of water molecules

3. Parameterization of a water oxygen solvent radius

4. The general use of the cluster model

Continuum solvation models do not account for strong and specific interactions of solvent molecules with the solute (notably ions) so one must include some explicit solvent molecules in the continuum calculations.  The question is how many explicit solvent molecules to use.

Pliego and co-workers suggested that the optimum number of water molecules is the one for which $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$, computed using Scheme 1, is most negative (refer to the paper for a definition of terms).

Taken from 10.1021/jp802665d. (c) American Chemical Society

However, Bryantsev et al. argue that $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ should be computed according to Scheme 2 and that computed in this way, $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ will converge smoothly to the best value.  Thus, the optimum number of explicit solvent molecules is that for which $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ changes very little when you add another one.

In principle, Scheme 1 and 2 should yield the same $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ but Bryantsev et al. argue that Scheme 2 provides better cancellation of error compared to Scheme 1. The authors demonstrate this by point showing that the computed free energy of water cluster formation in bulk water, computed using Scheme 3, deviates significantly from 0 with increasing cluster size.

Taken from 10.1021/jp802665d. (c) American Chemical Society

In Figure 1 I show a plot of the residual error ($RE$, large dots), 

$RE=-n\Delta G^*_{\text{solv}}(\text{H}_2\text{O}) + \Delta G^\circ_{\text{g,bind}}-(n-1)\Delta G^{\circ-*}$
$ + \Delta G^*_{\text{solv}}((\text{H}_2\text{O})_n)-RT\ln (n[\text{H}_2\text{O}]^{n-1})$

computed using two different continuum solvation models (SM6 red and COSMO blue) for increasing number of $n$.

Figure 1. Plot of RE without (large dots) and with (small dots) the Helmholz free energy and correction for indistinguishability of water molecules

Next I'll suggest some changes that significantly reduce $RE$. 

1. The use of standard Helmholz free energies instead of standard Gibbs free energies
The change in Helmholz free energy ($\Delta A^\circ$) is related to the change in Gibbs free energy ($\Delta G^\circ$) by

$\Delta G^\circ=\Delta A^\circ+p^\circ \Delta V$

In the gas phase $p^\circ \Delta V = \Delta n RT$ where $\Delta n$ is the difference between the number of product and reactant molecules.  In solution $\Delta V$ is hard to estimate, but the best approximation is $\Delta V \approx 0$ and $\Delta G^\circ \approx \Delta A^\circ$.  So, I suggest using 

$\Delta A^\circ_{\text{g,bind}}=\Delta G^\circ_{\text{g,bind}}-(n-1)RT$

2. Correction for indistinguishability of water molecules
Let's consider formation of the water dimer in solution, i.e. Scheme 3 for $n=2$.  As I have written about earlier, there are two ways of making the water dimer: one in which molecule $A$ is the H-donor and one where molecule $B$ is the H-donor (I assume the gas phase Hessian calculation for the water molecule is done in $C_{2v}$ symmetry).  In general, a (H$_2$O)$_n$ cluster can be made $n!$ ways so I suggest using 

$\Delta A^\circ_{\text{g,bind}}=\Delta G^\circ_{\text{g,bind}}-(n-1)RT-RT\ln(n!)$ 

With these corrections $RE$ vs $n$ is significantly reduced as shown in Figure 1 (small dots). For example, for $n=18$ and COSMO $RE$ has been reduced from 41 to 10 kcal/mol.  The remaining error comes from errors in the solvation energies, hydrogen bond strengths, anharmonicity, and conformational entropy.  The $n=18$ cluster has around 30 hydrogen bonds to the 10 kcal/mol error could easily be explained by a 0.3 kcal/mol error in the computed hydrogen bond strength. 

3. Parameterization of a water oxygen solvent radius
The solvation energies are clearly another source of error.  For example, the COSMO value for $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ is 0.35 kcal/mol lower than the experimental value, corresponding to an error of 6.3 kcal/mol for 18 water molecules.  Part of that error is probably cancelled by similar errors in $\Delta G^*_{\text{solv}}((\text{H}_2\text{O})_n)$ but not all.  

In fact, the higher errors for SM6 must be due to its underestimating $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ by 2.51 kcal/mol (corresponding to ca 45 kcal/mol in the gas phase for $n=18$!).  Thus if SM6 has to be used it is important to re-parameterize it so that $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ is reproduced reasonably well.  The easiest way is probably to change the radius of the oxygen water molecule sphere.

Notice that results are not improved simply by using the experimental value of $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ instead of the computed one.  For example, for $n=18$ the difference in $RE$ for SM6 and COSMO (where the error in the solvation energy is quite small) is "only" ca 20 kcal/mol so introducing a 45 kcal/mol correction will not improve things for the right reasons. 

Changes in Scheme 1 and 2
So, I suggest the following corrections to Scheme 1 and 2, respectively

$\Delta A^\circ_{\text{g,bind}}(I)=\Delta G^\circ_{\text{g,bind}}(I)-nRT-RT\ln(n!)$ 

$\Delta A^\circ_{\text{g,bind}}(II)=\Delta G^\circ_{\text{g,bind}}(II)-RT$ 

Notice that there is a $RT\ln(n!)$ term for both $(\text{H}_2\text{O})_n$ and $[\text{A}(\text{H}_2\text{O})_n]^{m\pm}$ leading to cancellation of this term for Scheme 2.

Clearly the corrections are much more important for Scheme 1 and Scheme 2 is still the more accurate approach due to better cancellation of errors.

4. The general use of the cluster model
While Bryantsev et al. advocate the use of the cluster cycle to find the optimum cluster size for the ions they still seem to advocate the use of a monomer cycle for pKa predictions (Scheme 4).

Taken from 10.1021/jp802665d. (c) American Chemical Society

I think better cancellation would be achieved by using two water clusters of size $n$ and $m$ instead of $n+m$ water monomers.  Thus

$\Delta G^\circ_{\text{g,deprot}}-(n+m-1)\Delta G^{\circ-*} \rightarrow \Delta A^\circ_{\text{g,deprot}} - \Delta G^{\circ-*}$

$\Delta A^\circ_{\text{g,deprot}}=\Delta G^\circ_{\text{g,deprot}}-RT$ 

$(n+m)\Delta G^*_{\text{solv}}(\text{H}_2\text{O}) \rightarrow \Delta G^*_{\text{solv}}((\text{H}_2\text{O})_n)+ \Delta G^*_{\text{solv}}((\text{H}_2\text{O})_m)$

$(n+m)RT\ln([\text{H}_2\text{O}]) \rightarrow 2RT\ln([\text{H}_2\text{O}]/(nm))$

This work is licensed under a Creative Commons Attribution 4.0

Saturday, October 4, 2014

Tight binding DFT

Tight binding DFT (DFTB) is a semi-empirical method with speed and accuracy similar to NDDO-based semiempirical methods such as AM1, PM3, and PM6.  Currently there are three types of DFTB methods called DFT1, DFTB2, and DFTB3. DFTB1 and DFTB2 are sometimes called non-SCC DFTB (non-selfconsistent charge) and SCC-DFTB, respectively.  DFTB3 is generally considered the most accurate for molecules and there are several parameter sets for DFTB2 and DFTB3 for different elements.  Compared to PM6, DFTB has so far been parameterized for relatively few elements.

The closed shell DFTB1 energy is computed from the following equation
$$E^{\text{DFTB1}}=\sum_i^{N/2} \sum_\mu^K \sum_\nu^K 2 C_{\mu i} C_{\nu i} H^0_{\mu\nu}+\sum_A \sum_{B>A} E^{\text{rep}}_{AB}$$
where $C_{\mu i}$ is the molecular orbital coefficient for MO $i$ and basis function $\mu$.
$$H^0_{\mu\nu}= \begin{cases}
\varepsilon^{\text{free atom}}_{\mu\mu} & \text{if } \mu=\nu \\
0 & \text{if } A=B, \mu\ne\nu\\
\langle \chi_\mu | \hat{T}+V_{\text{eff}}[\rho_0^A+\rho_0^B] | \chi_\nu \rangle & \text{if } A\ne B
Here,  $\varepsilon^{\text{free atom}}_{\mu\mu}$ is an orbital energy of a free atom, $\chi$ is a valence Slater-type orbital (STO) or numerical orbital, $\hat{T}$ is the electronic kinetic energy operator, $V_{\text{eff}}$ is the Kohn-Sham potential (electron-nuclear attraction, electron-electron repulsion, and exchange correlation), and $\rho_0^A$ is the electron density of neutral atom $A$.

DFT calculations on free atoms using some functional yield $\left\{\varepsilon^{\text{free atom}}_{\mu\mu} \right\}$, $\left\{\chi\right\}$, and $\rho_0$, which are then used to compute $H^0_{\mu\nu}$ for A-B atom pairs at various separations $R_{AB}$ and stored. When performing DFTB calculations $H^0_{\mu\nu}$ is simply computed for each atom pair A-B by interpolation using this precomputed data set.

Similarly, the overlap matrix $\left\{ \langle \chi_\mu | \chi_\nu \rangle  \right\}$ need to orthonormalize the MOs are computed for various distances and stored for future use.

$E^{\text{rep}}_{AB}$ is an empirical repulsive pairwise atom-atom potential with parameters adjusted to minimize the difference in atomization energies, geometries, and vibrational frequencies computed using DFTB and DFT or electronic structure calculations for set of molecules.

So, a DFTB1 calculation is performed by constructing $\mathbf{H}^0$, diagonalizing it to yield $\mathbf{C}$, and then computing $E^{\text{DFTB1}}$.

$$E^{\text{DFTB2}}=E^{\text{DFTB1}}+\sum_A \sum_{B>A} \gamma_{AB}(R_{AB})\Delta q_A\Delta q_B$$
where $\Delta q_A$ is the Mulliken charge on atom $A$ and $\gamma_{AB}$ is a function of $R_{AB}$ that tends to $1/R_{AB}$ at long distances.

The Mulliken charges depend on $\mathbf{C}$ so a selfconsistent calculation is required:
1. Compute DFTB1 MO coefficients, $\mathbf{C}$
2. Use $\mathbf{C}$ to compute $\left\{ \Delta q \right\}$
3. Construct and diagonalize $H_{\mu \nu}$ to get new MO coefficients, $\mathbf{C}$
$$H_{\mu \nu}=H_{\mu \nu}^0 + \frac{1}{2} S_{\mu\nu} \sum_C (\gamma_{AC}+\gamma_{BC})\Delta q_C, \mu \in A, \nu \in B$$
4. Repeat steps 2 and 3 until selfconsistency.

$$E^{\text{DFTB3}}=E^{\text{DFTB2}}+\sum_A \sum_{B>A} \sum_{C>B>A} \Gamma_{AB}\Delta q_A^2\Delta q_B$$
$\Gamma_{AB}$ is computed using interpolation using precomputed data. A SCF calculation is required.

Parameter sets and availability
DFTB is available in a variety of software packages.  I don't believe DFTB3 is currently in Gaussian and DFTB is also available in CHARMM and CP2K.  DFTB will soon be available in GAMESS.

Note that each user/lab must download the parameter file separately here.  There are several parameter sets.  The most popular sets for molecules are the MIO (materials and biological systems) for DFTB2 and 3OB (DFT3 organic and biological applications).

Dispersion and hydrogen bond corrections
Just like DFT and PM6, the DFTB can be corrected for dispersion and hydrogen bond effect.

This work is licensed under a Creative Commons Attribution 4.0