
Maximising the global use of HIV surveillance data through the development and sharing of analytical tools
G P Garnett (1), N C Grassly (1), J T Boerma (2), P D Ghys (3)

(1) Department of Infectious Disease Epidemiology, Imperial College London, London, UK
(2) World Health Organization, Geneva, Switzerland
(3) Joint United Nations Programme on HIV/AIDS, Geneva, Switzerland

Correspondence to: Professor G Garnett, Department of Infectious Disease Epidemiology, Imperial College of Science Technology and Medicine, St Mary’s Hospital, Norfolk Place, London, UK

To improve health, care and prevention activities must be focused on real problems and real needs. Rational decisions about health strategies and interventions should be based on reliable and timely knowledge of the distribution of disease, which is only available with good surveillance. For a disease like AIDS, convincing statistics are necessary for estimating the extent of the spread of HIV and the associated demographic, social, and economic costs. However, HIV surveillance faces several problems: the long latent period means that disease reflects historic rather than current spread; the infection is concentrated where resources for surveillance are limited; and there are biases in who comes forward for testing, whether women tested anonymously at antenatal clinics or those seeking a diagnosis through voluntary counselling and testing. These challenges have led to the development of surveillance methods, and of theoretical tools to interpret surveillance data, which are based on an understanding of these problems and use the best available data and models to provide timely and practical information for users.

The devastating costs of the disease, together with the initial alarm followed by limited spread in many industrialised countries, conspire to generate scepticism among the public, politicians, and professionals alike about the scale of the HIV pandemic.1 Against such scepticism, convincing estimates can have a powerful and timely advocacy effect.2 To be convincing, however, these estimates need a sound empirical basis and must rest on transparent, well accepted methods. This is the only antidote to the common tendency to overestimate the spread and consequences of a disease in order to generate more resources for the response to the epidemic. Although overestimation may have public health benefits in the short run, its long term effects will undoubtedly be counterproductive.

The availability of a reliable, sensitive, specific, safe, and inexpensive test for HIV infection has been crucial to understanding the epidemiology of the virus. The establishment of HIV surveillance systems has informed us of the spread of HIV, in a manner unparalleled for other diseases. In high income countries universal reporting of some diseases and registries have provided rich sources of comprehensive data.3 However, in developing countries limited resources often mean that surveillance is viewed as a luxury and surveillance systems that rely on reporting of AIDS cases have provided little. Many countries have set up expanded surveillance systems that bring together data on prevalence of infection and disease, as well as on risk factors for infection.4 The understanding that is vital for mobilising and directing resources and informing prevention and care programmes must be based on the analysis of local epidemiological data, thereby maximising the investment in surveillance systems.

In this issue a collection of papers describes some of the tools generated by researchers to assist this analysis. A balance has to be struck between scientific rigour and universal applicability. To those versed in a particular specialism, the necessary compromises and simplifications may appear unwarranted. However, for a global epidemiological exercise, methods need to be readily applicable and understandable in very diverse circumstances. Only as universal methods are adopted and owned by the often overstretched national epidemiologists does their worth become apparent; over time, experience and training then allow more sophisticated and precise methods to be developed and applied.

To maximise the use of HIV surveillance data, the Joint United Nations Programme on HIV/AIDS (UNAIDS) and the World Health Organization (WHO), in collaboration with researchers from a range of organisations, have co-ordinated the development of universally applicable methods. Between April and September 2003, in a series of 12 regional workshops, 261 national epidemiologists from 127 countries were trained in the use of the tools appropriate to the level of their epidemic. All of the "tools" described in this issue use mathematical models to analyse epidemiological data. Mathematical models provide a framework for the analysis of data, and can then be used to generate predictions, test hypotheses, explore indirect consequences, and create future scenarios. Such scenarios can incorporate proposed interventions and evaluate their potential for benefit or harm. Mathematical models offer a precise way of capturing our assumptions about data, and range from very simple models that attempt to capture the essence of a system to very complex ones that attempt to incorporate all relevant (and often irrelevant) detail.5 Only by progressing from the simple to the complex in model development can we hope to understand the changes in model behaviour associated with extra levels of complexity.6 Ideally, a model should be suited to its function and include the necessary level of complexity. Model validity then relates to the ability of the model to generate appropriate answers to the questions posed, and can be assessed through the understanding the model generates, the fit of the model to currently available data, and, in retrospect, whether model projections agree with subsequent observation. However, such model fits to data should be treated with caution.

In fitting a mathematical model to available data we make very important assumptions. Firstly, that the data to which model output is compared provide a good description of the underlying epidemic. Surveillance data often contain systematic biases that cannot be accounted for in statistical measures of uncertainty, and these biases are particularly worrisome if they change over time. Secondly, the degrees of freedom required to estimate all the relevant parameters in a model often exceed the degrees of freedom in the available data. This is sometimes addressed by restricting the number of parameters estimated through comparison with outcome data, as in the UNAIDS Estimation and Projection Package (EPP),7,8 where only four parameter values are estimated from sero-prevalence data and the others are estimated externally from other sources of data. Even then, the available data often cover only parts of the epidemic curve, making it impossible to estimate some of the four parameter values with any confidence. Reducing the number of estimated parameters leaves out many variables, and their distributions, that we know can play a role in the epidemiology of HIV. In any "mechanical" application of such simplified models it is extremely important that this simplification is remembered and that neither the estimated parameter values nor the model itself are over-interpreted. Particularly worrying is the tendency to assume that because a model provides a good representation of available data, it also provides a good description of the mechanisms involved and appropriate parameter values. It is perfectly possible for a model to generate a description of what is observed for the wrong reasons.
For example, the spread of HIV in the multicentre AIDS cohort in the US was modelled and used as evidence for a much higher transmissibility of the virus in the early stages of infection than later on.9 This inference rested on the observation that prevalence grew rapidly and then suddenly saturated. However, extreme heterogeneity in risk behaviours, combined with fairly restricted sexual mixing between the high risk population and others, could equally have explained such a pattern. Only further investigation of the detailed history of viraemia, sexual activity, and transmission probability within partnerships has allowed us to distinguish the roles of these two mechanisms in explaining the observed epidemiological pattern.10,11 It is, therefore, appropriate that the epidemiological tools described in the following papers are modest in their claims to validity and restricted in their use.

None the less, the models presented are widely used and inform national and international decision making. They are made freely available and are promoted by international organisations. It is, therefore, important that their assumptions are transparent and open to scrutiny by the scientific community. The first four papers in this series aim to explain to the user the purposes, strengths, and limitations of the tools, and to provide some details of the models for scrutiny. A summary of each model is shown in table 1.

Table 1

 Models used in HIV surveillance systems

It is hoped that the models will be viewed in the spirit in which they were developed, to generate a step forward in developing our shared understanding of HIV epidemiology rather than a definitive last word on the subject, and that constructive criticism will lead to future improvements. Also presented in this issue are two papers that deal with the quality of the HIV sero-surveillance data and the quality of the resulting estimates of the number of people living with HIV/AIDS, of new HIV infections, and of people dying of AIDS.


The EPP provides a framework within which HIV surveillance data can be explored, generating a national representation of the HIV epidemic to date.8 The package replaced EPIMODEL, which used a start date for the epidemic and a single prevalence estimate to define a gamma curve representing HIV prevalence.12 The EPP allows the stratification of the national population into groups and allows the appropriate weighting of surveillance data to be user defined, facilitating careful consideration of the epidemiological situation. The underlying model is a simple description of HIV infection epidemiology where the per susceptible incidence of infection is a function of the existing prevalence of infection. The saturation of the epidemic and its subsequent endemic prevalence are possible at a range of levels because only a fraction of the population are assumed to be at risk and because recruitment to this at-risk population is a function of AIDS associated mortality and population level behavioural change. Using an epidemiological model has the advantage of allowing an intuitive understanding of the epidemiology, but the disadvantage is that the parameters in the model could be taken too literally. The model uses a least squares estimate to generate its fit. The package allows the user to manually override the statistical best fit. At worst this could lead to erroneous estimates based on the a priori assumptions of the epidemiologist. However, it is hoped that the user will be cautious and always justify moving away from an epidemic curve based on a fit to the data.
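
As an illustration only, the kind of model described above can be sketched in a few lines of Python. This is not the EPP code: the yearly time step, the initial seed, the AIDS mortality rate mu, the behaviour-change term phi, and the grid-search fit are all our own simplifying assumptions, and the names r and f0 are labels chosen for this sketch. What it does show is how confining infection to a fraction f0 of the population lets prevalence saturate below f0, and how a least squares criterion picks out parameter values.

```python
import math
from itertools import product


def simulate_prevalence(r, f0, phi, n_years, mu=0.1, seed=1e-4):
    """Yearly national HIV prevalence from a hypothetical EPP-like model.

    r   -- scales the force of infection (assumed parameter)
    f0  -- fraction of the population at risk (assumed parameter)
    phi -- share of AIDS deaths in the at-risk group NOT replaced by new
           recruits, a crude stand-in for behavioural change (assumed)
    """
    s = f0 * (1 - seed)  # susceptible, at risk
    i = f0 * seed        # infected, at risk
    z = 1 - f0           # not at risk, held constant for simplicity
    prevalence = []
    for _ in range(n_years):
        prevalence.append(i / (s + i + z))
        # Per-susceptible annual risk rises with prevalence among those at risk.
        risk = 1 - math.exp(-r * i / (s + i))
        new_infections = risk * s
        aids_deaths = mu * i
        recruits = (1 - phi) * aids_deaths  # recruitment driven by AIDS mortality
        s = s - new_infections + recruits
        i = i + new_infections - aids_deaths
    return prevalence


def fit_least_squares(observed, r_grid, f0_grid, phi=0.0):
    """Grid search for the (r, f0) pair minimising the sum of squared errors."""
    best = None
    for r, f0 in product(r_grid, f0_grid):
        predicted = simulate_prevalence(r, f0, phi, len(observed))
        sse = sum((p - o) ** 2 for p, o in zip(predicted, observed))
        if best is None or sse < best[0]:
            best = (sse, r, f0)
    return best


if __name__ == "__main__":
    # Synthetic "observations" generated by the model itself, not real data.
    synthetic = simulate_prevalence(r=1.5, f0=0.2, phi=0.0, n_years=25)
    sse, r_hat, f0_hat = fit_least_squares(synthetic, [0.5, 1.0, 1.5, 2.0], [0.1, 0.2, 0.3])
    print(f"r={r_hat}, f0={f0_hat}, SSE={sse:.3g}")
    print(f"peak prevalence {max(synthetic):.3f} stays below f0=0.2")
```

Because the synthetic series comes from the model, the fit recovers the generating values exactly; with real surveillance data, several parameter combinations may fit almost equally well, which is the identifiability problem discussed earlier.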

The model has been used for generalised epidemics to generate estimates of the HIV epidemic and is being further developed to improve its robustness, applicability, and capacity.


A similar process of constructing national estimates from the summation of sub-epidemics has been used in situations where overall HIV prevalence is as yet low, although it may be relatively high in groups at higher risk of infection. The workbook approach relies on estimates of the current size of populations exposed to specific risks and the prevalence of infection within these populations.13 Curves are then used to connect these current estimated prevalences to past estimates, and future peak prevalences, based on analogy with other similar risk groups where the epidemic has already peaked. The process forces consideration of the epidemiological situation and high and low estimates of both risk groups’ sizes and prevalences, but is fraught with potential errors due to poor quality data and a lack of comparability across populations. None the less, working within a common transparent framework forces recognition of what data are and are not available and is starting to lead to quantification of errors in estimating the numbers of cases of HIV.
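
The core arithmetic of this approach, summing group size multiplied by prevalence over sub-populations, with low and high values for each, can be sketched as below. The group names, sizes, and prevalences are invented for illustration, and combining all the low values together (and all the high values together) is one simple convention we assume here; it is not necessarily how the workbook itself combines ranges.

```python
from dataclasses import dataclass


@dataclass
class RiskGroup:
    """One sub-population with low/high guesses for its size and HIV prevalence."""
    name: str
    size_low: float
    size_high: float
    prev_low: float
    prev_high: float


def workbook_totals(groups):
    """Sum size x prevalence over groups to get low, central and high totals."""
    low = sum(g.size_low * g.prev_low for g in groups)
    high = sum(g.size_high * g.prev_high for g in groups)
    # Central value: product of the midpoints of the size and prevalence ranges.
    central = sum(
        (g.size_low + g.size_high) / 2 * (g.prev_low + g.prev_high) / 2
        for g in groups
    )
    return low, central, high


if __name__ == "__main__":
    # Purely hypothetical inputs chosen to make the arithmetic easy to follow.
    groups = [
        RiskGroup("sex workers", 1_000, 2_000, 0.10, 0.20),
        RiskGroup("injecting drug users", 500, 1_500, 0.20, 0.40),
        RiskGroup("lower-risk adults", 1_000_000, 1_000_000, 0.001, 0.002),
    ]
    low, central, high = workbook_totals(groups)
    print(f"people living with HIV: {low:,.0f} (low), {central:,.0f} (central), {high:,.0f} (high)")
```

Even in this toy case the large lower-risk group dominates the total, so a small error in its assumed prevalence moves the national estimate more than a large error in a small high-risk group.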


The advocacy and planning messages to be derived from estimates of trends in HIV prevalence require that the demographic consequences of the epidemic be explored. The Spectrum Projection Package developed by The Futures Group International is currently used for this purpose.14 The package uses standard demographic methods to represent an age and sex structured population. It distributes incident HIV infections through the population over time in a manner consistent with the patterns of prevalence in national estimates, and incorporates the rates of fertility and mortality associated with AIDS. As with most demographic models, changes with age and time, which in reality occur continuously, are represented by discrete steps of one year in both age and time.15 The model relies on the generalisability of estimates of patterns of HIV incidence by age and sex, rates of progression from HIV infection to death, vertical transmission probabilities, and reductions in fertility due to HIV, because these have been derived from meta-analyses rather than locally. The demographic impact of HIV is a function of the incidence of HIV, but is exacerbated if infections occur in the youngest age groups and if they affect fertility. Much debate initially centred on whether HIV could turn population growth rates negative,16,17 which it probably can, but only where it reaches the highest observed prevalences.
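
The one-year steps through age and time can be made concrete with a toy cohort-component projection. The sketch below is our own simplification, not Spectrum: it tracks a single sex in single-year age groups, treats the oldest group as open-ended, applies a hypothetical age-specific AIDS mortality on top of background survival, and takes births as a given input rather than computing them from (HIV-reduced) fertility rates.

```python
def project_one_year(pop, survival, aids_mortality, births):
    """Advance a single-sex, single-year-of-age population by one year.

    pop[a]            -- people aged a at the start of the year
    survival[a]       -- background probability of surviving the year
    aids_mortality[a] -- extra probability of dying of AIDS (hypothetical input)
    births            -- number of births during the year, entering age 0
    """
    oldest = len(pop) - 1
    new = [0.0] * len(pop)
    new[0] = births
    for age, n in enumerate(pop):
        survivors = n * survival[age] * (1.0 - aids_mortality[age])
        # Everyone moves up one year of age; the oldest group is open-ended.
        new[min(age + 1, oldest)] += survivors
    return new


def project(pop, survival, aids_mortality, births_per_year):
    """Apply the one-year step repeatedly, once per entry in births_per_year."""
    for births in births_per_year:
        pop = project_one_year(pop, survival, aids_mortality, births)
    return pop


if __name__ == "__main__":
    # Three age groups, no background mortality, 10% AIDS mortality: illustrative only.
    print(project_one_year([100, 100, 100], [1, 1, 1], [0.1, 0.1, 0.1], 50))
```

Concentrating AIDS mortality in a given age band in such a projection removes people before they reach later ages, which is one route by which HIV reshapes the age structure.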


The rich epidemiological data available for Thailand and Cambodia have allowed the development of the Asian Epidemic Model (AEM), a model that describes the spread of HIV through sub-populations with particular exposures.18 This transmission dynamic model is representative of the many mathematical models that try to describe the contact patterns within a population and then simulate the spread of the virus through it.6,19 Such models allow us to explore the sensitivity of the predicted HIV epidemic to different biological and behavioural patterns of risk, and also to explore the potential impact of interventions. The AEM is a deterministic model in which the population is stratified into the key risk groups in southeast Asia, and it compares predicted sub-epidemics with those observed. The model is calibrated to match observed trends by estimating the transmission probabilities between groups, either manually or automatically. A contact in the model could be defined either as a single sexual act or needle sharing event, which would be appropriate for commercial sex, or as a sexual partnership in which multiple sexual acts take place. Because the risk of transmission per contact is estimated separately for each type of contact between groups, these different definitions can be represented, but care needs to be taken in choosing how interventions such as condom use fit with the particular "type" of contact.
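
Why the definition of a contact matters can be seen from a standard calculation that converts a per-act transmission probability into a per-partnership one. This sketch assumes, as a simplification of our own, that acts are independent and that a condom reduces the per-act risk by a fixed efficacy; all parameter values are hypothetical and are not taken from the AEM.

```python
def partnership_risk(beta, n_acts, condom_fraction=0.0, condom_efficacy=0.0):
    """Probability of transmission over a partnership of n_acts sex acts.

    Assumes each act carries an independent per-act risk beta, a fraction
    condom_fraction of acts are protected, and a condom cuts per-act risk
    by condom_efficacy. All parameter values are hypothetical inputs.
    """
    protected = n_acts * condom_fraction
    unprotected = n_acts - protected
    # Probability of escaping infection across every act in the partnership.
    escape = (1 - beta) ** unprotected * (1 - beta * (1 - condom_efficacy)) ** protected
    return 1 - escape


if __name__ == "__main__":
    beta = 0.001  # hypothetical per-act transmission probability
    for n in (1, 10, 100):
        no_condom = partnership_risk(beta, n)
        with_condom = partnership_risk(beta, n, condom_fraction=1.0, condom_efficacy=0.9)
        print(n, round(no_condom, 4), round(with_condom, 4))
```

Because risk accumulates less than proportionally with the number of acts, per-act and per-partnership probabilities are not interchangeable, and a condom intervention acts on individual sex acts rather than on the partnership as a whole.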

The development and use of such mathematical models require three things: flexibility, so that they can be transported to new contexts in which different epidemiological interactions are possible; a great deal of data with which either to estimate parameter values or to compare model outputs; and much caution in interpreting the suggested impact of interventions. The rate of growth and eventual size of infectious disease epidemics are very sensitive to changes in parameter values near the threshold for disease persistence, where small changes can have dramatic effects.20 Such models may therefore be more appropriate for generating qualitative insights, despite the impression that they provide quantitative outputs.
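
The sensitivity near the threshold can be seen in the textbook final-size relation for a simple, homogeneously mixing epidemic, z = 1 - exp(-R0 z), where z is the fraction eventually infected and R0 is the basic reproduction number. This is a generic illustration, not part of any of the tools described here, and it ignores the population turnover and long infectious period that matter for HIV.

```python
import math


def final_size(r0, tol=1e-12):
    """Fraction ultimately infected in a simple closed epidemic: z = 1 - exp(-R0 z)."""
    if r0 <= 1.0:
        return 0.0  # below threshold the only solution is z = 0
    # g(z) = z - 1 + exp(-R0 z) is negative just above 0 and positive at z = 1,
    # so bisection finds the positive root. expm1 keeps g accurate for small z.
    lo, hi = 1e-12, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mid + math.expm1(-r0 * mid) < 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


if __name__ == "__main__":
    for r0 in (0.9, 1.0, 1.1, 1.5, 2.0):
        print(f"R0={r0}: final size {final_size(r0):.3f}")
```

In this toy relation, moving R0 from 1.0 to 1.1 takes the final size from zero to about 18% of the population, so a modest error in a parameter near the threshold produces a very large error in the projected epidemic.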


For all the epidemic modelling tools described above, HIV sero-prevalence data are a key building block. These data are generated by surveillance systems that vary in completeness of implementation across countries.21 A new analysis of the quality of HIV sero-surveillance in 141 low and middle income countries between 1991 and 2002 allows us to assess trends in the quality of sero-surveillance and to identify countries with poorly functioning systems that require urgent strengthening.22 This analysis is also an important tool to help evaluate the quality of HIV/AIDS estimates: the quality of sero-surveillance in each country has been used to help construct plausibility bounds around the UNAIDS/WHO end-2003 HIV/AIDS estimates generated by the above tools.23 As explained in the first three papers in this issue, different information sources and different assumptions are used to create national estimates. The accuracy of these estimates depends critically on the quantity and quality of HIV prevalence data, as well as on the assumptions used to translate these data into national estimates of the number of adults living with HIV/AIDS, new infections and deaths among adults, the number of children newly infected with HIV or living with HIV/AIDS, and the number of child deaths. The different sources of error in each of the steps used to develop estimates are combined to generate plausibility bounds around the end-2003 estimates. Although some of these bounds are wider than previously published ranges around HIV/AIDS estimates,2 they are based on a more thorough consideration of the quantity and quality of information that informs each step of the estimation process. As a result, these plausibility bounds reflect the validity of the estimates better than earlier ranges did.
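
One generic way of combining several sources of error into a single range is Monte Carlo propagation: draw each uncertain input from a distribution, recompute the estimate many times, and report percentiles of the results. The sketch below illustrates that idea only. It is not the procedure used for the UNAIDS/WHO plausibility bounds, and its three inputs (measured prevalence, adult population size, and an adjustment factor from surveillance sites to the general population), their distributions, and their ranges are all hypothetical.

```python
import random


def plausibility_bounds(prev_range, adults_mean, adults_sd, adjust_range,
                        n_draws=10_000, seed=1):
    """Return (2.5th percentile, median, 97.5th percentile) of adults living with HIV."""
    rng = random.Random(seed)  # fixed seed so results are reproducible
    draws = []
    for _ in range(n_draws):
        prevalence = rng.uniform(*prev_range)                  # measurement uncertainty
        adults = max(0.0, rng.gauss(adults_mean, adults_sd))   # population size error
        adjustment = rng.uniform(*adjust_range)                # site-to-population bias
        draws.append(prevalence * adults * adjustment)
    draws.sort()
    # Simple index-based percentiles on the sorted draws.
    lower = draws[int(0.025 * (n_draws - 1))]
    median = draws[(n_draws - 1) // 2]
    upper = draws[int(0.975 * (n_draws - 1))]
    return lower, median, upper


if __name__ == "__main__":
    lo, mid, hi = plausibility_bounds((0.08, 0.12), 5_000_000, 250_000, (0.8, 1.0))
    print(f"{mid:,.0f} adults living with HIV (bounds {lo:,.0f} to {hi:,.0f})")
```

Widening the range for any single input, for example to reflect poorer quality sero-surveillance, widens the resulting bounds, which is the intuition behind linking surveillance quality to the width of plausibility bounds.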


The majority of modelling of the spread of HIV has focused on understanding the influence of patterns of contact and the biology of infection on the spread of the virus. Such research has greatly advanced our understanding of the importance of heterogeneity in risk behaviours,24 patterns of sexual partner choice,25,26 and the overlap in time between sexual partnerships.27 It has shown us how difficult it is to measure accurately the key variables controlling the spread of the virus,28 and how effective preventive interventions can be.29,30 To date, limited progress has been made in using such transmission dynamic models to analyse local epidemiological data and formulate local policy. For this to happen, modelling needs to be confronted with the available data, reflect its quality, play a role in improving that quality, and educate local epidemiologists in both the strengths and the weaknesses of the models. The models presented in this series represent an earnest step in a process that aims to engage theory with practice. Newly developed methods for plausibility bounds around HIV/AIDS estimates add a new dimension to the information contained in these estimates, which should help different users to understand and interpret them better.