[web] [lect]

Probability distributions

Thermodynamic systems typically contain a large number of particles. State variables
describe the behaviour of a system as a whole without considering the properties of individual particles or the interactions between them. There are both practical and fundamental reasons that we cannot measure the state of a system by determining each particle's state individually and averaging (for intensive variables) or adding them all up (for extensive ones): there are too many particles, and quantum mechanics tells us that many observables cannot be measured simultaneously with precision. It is therefore appropriate to use statistics
to analyse the behaviour of many-particle systems and determine their state variables.

It is useful to consider a game of dice in order to understand the use of statistics when dealing with large systems. The average total score from any throw of any number of dice is always 3.5 times the number of dice, but the probability distribution
of scores changes drastically as the number of dice thrown increases. For a single dice, all six possible values have the same probability. If two dice are thrown, the scores range from 2 to 12, but these extreme values are less likely than the average value of 7. The reason is that there is only one combination that realises a score of 2 but six different combinations to achieve a 7. This assumes that we can distinguish the two dice (e.g. by their colour) and can therefore tell e.g. from . The probability distribution for two dice has a triangular shape.

For three dice, working out the possible combinations begins to become cumbersome - never mind a mole of dice! The distribution is a binomial distribution and gets progressively narrower as the number of dice increases. The Fig. shows computer-simulated distributions for a million throws of 1, 2, 3, 10, 60 and 600 dice, normalised vertically and horizontally to the probability of the peak of the function and the average value, respectively. From about ten dice, the curve resembles very closely a Gaussian distribution,
characterised by its mean, $x_0$, and standard deviation, $\sigma$: $$f(x)=\frac{1}{\sqrt{2\sigma^2\pi}}\exp{-\frac{(x-x_0)^2}{2\sigma^2}}\qquad.$$

Boltzmann distribution

In thermodynamic terms, the dice are replaced by particles, the score each dice shows with the energy of a particle, and the distribution of total scores with the distribution of particles over available energy levels. We define as a microstate
a particular distribution of a set of particles across the energy levels. In dice terms, both and would be different microstates of our "system" of dice. A macrostate
on the other hand is the total score of all the dice or distribution of particles across energy levels without consideration of which particle is in which level. The Fig. shows the ten possible microstates of three particles distributed across four equidistant energy levels while maintaining a constant total energy. The macrostate on the left consists of three microstates - in each case, one of the particles is in the highest and the other two in the lowest level. The three microstates arise from assigning the higher energy to a different particle. The second macrostate has six contributing microstates: the energy of the highest-energy particle is dropped by one notch, and one of the other particles is lifted up by one level, resulting in three occupied levels and therefore six combinations. Finally, the third macrostate, where all three particles are in the second-lowest level, has only one microstate since swapping particles in the same level makes no difference - the equivalent of a throw.

The underlying assumption in this concept is that all microstates individually are equally probable
- a is as likely as a or any other combination of the two coloured dice. The extreme situation where one particle contains the whole energy of the system is as likely as any one of the many combinations of the particles across the energy levels.

Because there are more different microstates contributing to the macrostates near the peak of the probability distribution, we can define the statistical weight, $\Omega$,
as the number of microstates contributing to a macrostate. For distinguishable, non-interacting particles
(or coloured dice), the statistical weight is given by $$\Omega=\frac{N!}{\prod_rN_r!}\qquad.$$ The numerator counts all the permutations of the $N$ particles
over the $r$ energy levels,
and the denominator removes the duplicate microstates arising from the fact that we can swap even distinguishable particles without making a difference if they are in the same energy level. The product symbol, $\prod_r$, signifies a product running over all energy levels. The statistical weight of the first macrostate shown in the Fig. above is therefore $$\color{grey}{\Omega=\frac{3!}{1!\;0!\;0!\;2!}=\frac{6}{2}=3}\qquad.$$

When distributing the particles over the energy levels, we have to maintain two additional conditions: The number of particles doesn't change since we're not adding or removing material from the system - the system is a closed system: $$N=\sum_{i=0}^{r-1}N_i\qquad.$$ In a closed system, we also do not allow any flows of energy into or out of the system; it is enclosed in adiabatic walls. Therefore, the total energy, $E$,
also remains constant: $$E=\sum_{i=0}^{r-1}N_i\epsilon_i\qquad.$$ As seen above, probability distributions become very narrow once numbers get large. In a macroscopic system, the probability is almost entirely concentrated in the most probable macrostate.
It is therefore safe to assume that the particles of any macroscopic system will be in a distribution which corresponds to one of the many microstates constituting the most probable macrostate. Particles will change energy levels, i.e. the precise microstate of the system will change dynamically, but it is very unlikely that these dynamic processes will ever produce a population pattern that is significantly different from that representing the most probable macrostate.

In order to work out the populations of the different energy levels for the most probable macrostate, we need to find the location of the maximum of the distribution function, i.e. of the statistical weight. Since the numbers are so large, this is difficult. However, since the logarithm of a function peaks at the same place as the function itself, we might as well search for the maximum of $\ln\Omega$, which is much smaller. By taking the logarithm, the fraction turns into a difference, and the product into a sum: $$\ln{\Omega}=\ln{\frac{N!}{\prod_{i=0}^{r-1}N_i!}}=\ln{N!}-\sum_{i=0}^{r-1}\ln{N_i!}\qquad\qquad\color{grey}{\left[\ln{(ab)}=\ln{a}+\ln{b}\right]}\qquad.$$ The Stirling formula
is an approximation that enables us to lose the factorials: $$\ln{x}!\approx x\ln{x}-x$$ as long as $x$ is a large number. Applying it to the logarithm of the statistical weight, we have $$\ln{\Omega}=N\ln{N}\color{red}{\cancel{\color{black}{-N}}}-\sum_{i=0}^{r-1}N_i\ln{N_i}+\color{red}{\cancel{\color{black}{\sum_{i=0}^{r-1}N_i}}}\qquad,$$ where the second and fourth terms cancel out because the sum of the populations of all energy levels is the total number of particles.

To find the peak, we have to find the point at which the function doesn't change if we change any of the populations $N_i$ by an infinitesimal amount ${\rm d}N_i$. In other words, the total differential
of $\ln{\Omega}$ must be zero. Since $N$ is a constant, the first term of $\ln{\Omega}$ doesn't change. Each term of the sum is differentiated individually with respect to 'its' population $N_i$: $${\rm d}\ln{\Omega}=-\sum_i\frac{\partial}{\partial N_i}N_i\ln{N_i}{\rm d}N_i\qquad\qquad \color{grey}{\left[{\rm d}f(x,y)=\frac{\partial f}{\partial x}{\rm d}x+\frac{\partial f}{\partial y}{\rm d}y\right]}$$ The product rule applies, and as the second term evaluates to 1 it can be neglected compared to $\ln{N_i}$: $$\qquad=-\sum_i\left(\frac{\partial N_i}{\partial N_i}\ln{N_i}+N_i\frac{\partial\ln{N_i}}{\partial N_i}\right){\rm d}N_i =-\sum_{i=0}^{r-1}\left(\ln{N_i}+\color{red}{\cancel{\color{black}{N_i\frac{1}{N_i}}}^{\ll}}\right){\rm d}N_i\qquad,$$ leaving $$\qquad\approx -\sum_i\ln{N_i}{\rm d}N_i\overset{!}{=}0\qquad,$$ which has to equal zero at the maximum of the distribution. At the same time, the number of particles and total energy must also be kept constant: $${\rm d}N=\sum_i{{\rm d}N_i}\overset{!}{=}0$$ $${\rm d}E=\sum_i{\epsilon_i{\rm d}N_i}\overset{!}{=}0$$ The three equations can be solved simultaneously using the method of the Lagrange multipliers.
This is a method frequently used in constrained optimisation problems such as non-linear curve fitting subject to constraints. The idea is that the constraints ($N$ and $E$) are added to the main equation ($\ln{\Omega}$) with unknown coefficients (the multipliers). Here, this produces an equation for each term of the sum (i.e. each energy level): $$(-\ln{N_i}+a+b\epsilon_i){\rm d}N_i\overset{!}{=}0\qquad.$$ Ignoring the trivial case ${\rm d}N_i=0$, this requires that $$N_i={\rm e}^{a+b\epsilon_i}\qquad,$$ producing a useful link between the population $N_i$ of a level and its energy $\epsilon_i$. The total number of particles is the sum of all the population numbers: $$N=\sum_iN_i=\sum_i{\rm e}^{a+b\epsilon_i}={\rm e}^a\sum_i{\rm e}^{b\epsilon_i}\qquad,$$ where ${\rm e}^a$ is the same for all levels and can therefore be taken out of the sum. After re-arranging this for ${\rm e}^a$, $${\rm e}^a=\frac{N}{\sum_i{\rm e}^{b\epsilon_i}}\qquad,$$ we can substitute this factor in the equation for the population of an individual level: $$N_i={\rm e}^{a+b\epsilon_i}={\rm e}^a{\rm e}^{b\epsilon_i}=\frac{N{\rm e}^{b\epsilon_i}}{\sum_i{\rm e}^{b\epsilon_i}}\qquad.$$ The constant $b$ must be a reciprocal energy in order for the exponent to be dimensionless. We identify this with the thermal energy, $k_BT$.
This produces the Boltzmann distribution, i.e. the distribution of particles across energy levels in thermal equilibrium:

$$\textbf{Boltzmann distribution:}\qquad N_i=\frac{N\exp{\left(-\frac{\epsilon_i}{k_BT}\right)}}{\sum_i\exp{\left(-\frac{\epsilon_i}{k_BT}\right)}} \qquad\qquad\textbf{partition function:}\qquad z=\sum_i\exp{\left(-\frac{\epsilon_i}{k_BT}\right)}$$

The sum in the denominator is known as the partition function. Provided we have either a model (typically from quantum mechanics) that predicts the energy levels or experimental data (typically spectroscopic data) that determines them, we can calculate the equilibrium populations of the levels using the partition function and the Boltzmann distribution.

State variables and the partition function

Once the partition function of a system is known, its state variables can be calculated without the need to deal with the individual energy levels. As an example, the internal energy, $U$,
of a system can be calculated directly from the partition function. The internal energy is the sum of the energies of all energy levels of the system, weighted by their populations. The populations can be taken from the Boltzmann distribution: $$U=\sum_iN_i\epsilon_i=\frac{N\sum_i\epsilon_i\exp{\left(-\frac{\epsilon_i}{k_BT}\right)}}{\sum_i\exp{\left(-\frac{\epsilon_i}{k_BT}\right)}}\qquad.$$ By differentiating the partition function with respect to temperature, $$\frac{{\rm d}z}{{\rm d}T} =\frac{{\rm d}}{{\rm d}T}\sum_i\exp{\left(-\frac{\epsilon_i}{k_BT}\right)} =\sum_i\exp{\left(-\frac{\epsilon_i}{k_BT}\right)}\left(-\frac{\epsilon_i}{k_B}\right)\left(-\frac{1}{T^2}\right) =\sum_i\frac{\epsilon_i\exp{\left(-\frac{\epsilon_i}{k_BT}\right)}}{k_BT^2}\qquad,$$ we can see that the sum in the numerator of the internal energy is linked to $\frac{{\rm d}z}{{\rm d}T}$, while the sum in the denominator is the partition function itself: $$U=\frac{Nk_BT^2\frac{{\rm d}z}{{\rm d}T}}{z}=Nk_BT^2\frac{{\rm d}\ln{z}}{{\rm d}T} \qquad\qquad\color{grey}{\left[\frac{{\rm d}}{{\rm d}x}\ln{x}=\frac{1}{x}\Leftrightarrow\frac{1}{x}{\rm d}x={\rm d}\ln{x}\right]}\qquad.$$ Therefore, once the shape of the function $z$ is known, we can calculate the internal energy. This gives access to the other state variables via the Maxwell relations and the other thermodynamic relationships already introduced.

Statistical entropy

Statistically speaking, a system is in equilibrium
if the distribution of its particles across energy levels conforms with the most probable macrostate.
Therefore, if the statistical weight
of the macrostate of a system is smaller than that of the most probable macrostate, particles will change levels until the system equilibrates towards the most probable configuration. Therefore, the statistical weight has the same function in statistical thermodynamics that entropy
has in classical thermodynamics: it determines the direction of processes. There are a few mathematical differences though: While entropies are additive, statistical weights are multiplicative: $$S=S_1+S_2\qquad\textrm{but}\qquad\Omega=\Omega_1\Omega_2\qquad.$$ Since the logarithm of a product is the same as the sum of the logarithms of its factors, $$\color{grey}{\ln{xy}=\ln{x}+\ln{y}}\qquad,$$ entropy can be identified with the logarithm of the statistical weight. Boltzmann's constant is the constant of proportionality and provides the correct units, J/(mol K):

$$\textbf{Statistical entropy:}\qquad S=k_B\ln{\Omega}$$

So far, we have assumed that we can distinguish different particles from one another. Classically, this is appropriate (although it may be cumbersome to do so given the large numbers involved). However, in quantum systems, this assumption isn't always applicable. The quantum statistics deal with that.