This post takes it title from

this paper by Bryantsev

*et al*., which provides an excellent discussion of the issue. The point of this post is to suggest four modifications to the one of the approaches discussed in the paper:

1. The use of standard Helmholz free energies instead of standard Gibbs free energies

2. Correction for indistinguishability of water molecules

3. Parameterization of a water oxygen solvent radius

4. The general use of the cluster model

**Background**
Continuum solvation models do not account for strong and specific interactions of solvent molecules with the solute (notably ions) so one must include some explicit solvent molecules in the continuum calculations. The question is how many explicit solvent molecules to use.

Pliego and co-workers suggested that the optimum number of water molecules is the one for which $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$, computed using Scheme 1, is most negative (refer to the paper for a definition of terms).

However, Bryantsev

*et al*. argue that $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ should be computed according to Scheme 2 and that computed in this way, $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ will converge smoothly to the best value. Thus, the optimum number of explicit solvent molecules is that for which $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ changes very little when you add another one.

In principle, Scheme 1 and 2 should yield the same $\Delta G^*_{\text{solv}}(\text{A}^{m\pm})$ but Bryantsev et al. argue that Scheme 2 provides better cancellation of error compared to Scheme 1. The authors demonstrate this by point showing that the computed free energy of water cluster formation in bulk water, computed using Scheme 3, deviates significantly from 0 with increasing cluster size.

In Figure 1 I show a plot of the residual error ($RE$, large dots),

$RE=-n\Delta G^*_{\text{solv}}(\text{H}_2\text{O}) + \Delta G^\circ_{\text{g,bind}}-(n-1)\Delta G^{\circ-*}$

$ + \Delta G^*_{\text{solv}}((\text{H}_2\text{O})_n)-RT\ln (n[\text{H}_2\text{O}]^{n-1})$

computed using two different continuum solvation models (SM6 red and COSMO blue) for increasing number of $n$.

**Figure 1.** Plot of RE without (large dots) and with (small dots) the Helmholz free energy and correction for indistinguishability of water molecules

Next I'll suggest some changes that significantly reduce $RE$.

**1. The use of standard Helmholz free energies instead of standard Gibbs free energies**

The change in Helmholz free energy ($\Delta A^\circ$) is related to the change in Gibbs free energy ($\Delta G^\circ$) by

$\Delta G^\circ=\Delta A^\circ+p^\circ \Delta V$

In the gas phase $p^\circ \Delta V = \Delta n RT$ where $\Delta n$ is the difference between the number of product and reactant molecules. In solution $\Delta V$ is hard to estimate, but the best approximation is $\Delta V \approx 0$ and $\Delta G^\circ \approx \Delta A^\circ$. So, I suggest using

$\Delta A^\circ_{\text{g,bind}}=\Delta G^\circ_{\text{g,bind}}-(n-1)RT$

**2. Correction for indistinguishability of water molecules**

Let's consider formation of the water dimer in solution, i.e. Scheme 3 for $n=2$.

As I have written about earlier, there are two ways of making the water dimer: one in which molecule $A$ is the H-donor and one where molecule $B$ is the H-donor (I assume the gas phase Hessian calculation for the water molecule is done in $C_{2v}$ symmetry). In general, a (H$_2$O)$_n$ cluster can be made $n!$ ways so I suggest using

$\Delta A^\circ_{\text{g,bind}}=\Delta G^\circ_{\text{g,bind}}-(n-1)RT-RT\ln(n!)$

With these corrections $RE$ vs $n$ is significantly reduced as shown in Figure 1 (small dots). For example, for $n=18$ and COSMO $RE$ has been reduced from 41 to 10 kcal/mol. The remaining error comes from errors in the solvation energies, hydrogen bond strengths, anharmonicity, and conformational entropy. The $n=18$ cluster has around 30 hydrogen bonds to the 10 kcal/mol error could easily be explained by a 0.3 kcal/mol error in the computed hydrogen bond strength.

**3. Parameterization of a water oxygen solvent radius**

The solvation energies are clearly another source of error. For example, the COSMO value for $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ is 0.35 kcal/mol lower than the experimental value, corresponding to an error of 6.3 kcal/mol for 18 water molecules. Part of that error is probably cancelled by similar errors in $\Delta G^*_{\text{solv}}((\text{H}_2\text{O})_n)$ but not all.

In fact, the higher errors for SM6 must be due to its underestimating $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ by 2.51 kcal/mol (corresponding to ca 45 kcal/mol in the gas phase for $n=18$!). Thus if SM6 has to be used it is important to re-parameterize it so that $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ is reproduced reasonably well. The easiest way is probably to change the radius of the oxygen water molecule sphere.

Notice that results are not improved simply by using the experimental value of $\Delta G^*_{\text{solv}}(\text{H}_2\text{O})$ instead of the computed one. For example, for $n=18$ the difference in $RE$ for SM6 and COSMO (where the error in the solvation energy is quite small) is "only" ca 20 kcal/mol so introducing a 45 kcal/mol correction will not improve things for the right reasons.

**Changes in Scheme 1 and 2**
So, I suggest the following corrections to Scheme 1 and 2, respectively

$\Delta A^\circ_{\text{g,bind}}(I)=\Delta G^\circ_{\text{g,bind}}(I)-nRT-RT\ln(n!)$

$\Delta A^\circ_{\text{g,bind}}(II)=\Delta G^\circ_{\text{g,bind}}(II)-RT$

Notice that there is a $RT\ln(n!)$ term for both $(\text{H}_2\text{O})_n$ and $[\text{A}(\text{H}_2\text{O})_n]^{m\pm}$ leading to cancellation of this term for Scheme 2.

Clearly the corrections are much more important for Scheme 1 and

**Scheme 2 is still the more accurate approach due to better cancellation of errors**.

**4. The general use of the cluster model**
While Bryantsev

*et al*. advocate the use of the cluster cycle to find the optimum cluster size for the ions they still seem to advocate the use of a monomer cycle for pKa predictions (Scheme 4).

I think better cancellation would be achieved by using two water clusters of size $n$ and $m$ instead of $n+m$ water monomers. Thus

$\Delta G^\circ_{\text{g,deprot}}-(n+m-1)\Delta G^{\circ-*} \rightarrow \Delta A^\circ_{\text{g,deprot}} - \Delta G^{\circ-*}$

$\Delta A^\circ_{\text{g,deprot}}=\Delta G^\circ_{\text{g,deprot}}-RT$

$(n+m)\Delta G^*_{\text{solv}}(\text{H}_2\text{O}) \rightarrow \Delta G^*_{\text{solv}}((\text{H}_2\text{O})_n)+ \Delta G^*_{\text{solv}}((\text{H}_2\text{O})_m)$

$(n+m)RT\ln([\text{H}_2\text{O}]) \rightarrow 2RT\ln([\text{H}_2\text{O}]/(nm))$

This work is licensed under a

Creative Commons Attribution 4.0