Original article
Per capita incidence of sexually transmitted infections increases systematically with urban population size: a cross-sectional study
  1. Oscar Patterson-Lomba1,
  2. Edward Goldstein2,
  3. Andrés Gómez-Liévano3,
  4. Carlos Castillo-Chavez4,
  5. Sherry Towers4
  1. 1Department of Biostatistics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  2. 2Department of Epidemiology, Centre for Communicable Disease Dynamics, Harvard T.H. Chan School of Public Health, Boston, Massachusetts, USA
  3. 3Center for International Development, Harvard Kennedy School, Harvard University, Cambridge, Massachusetts, USA
  4. 4Simon A. Levin Mathematical, Computational and Modeling Sciences Center, School of Human Evolution and Social Change, Arizona State University, Tempe, Arizona, USA
  1. Correspondence to Dr Oscar Patterson-Lomba, Department of Biostatistics, Harvard T.H. Chan School of Public Health, 677 Huntington Ave, Boston, MA 02115, USA; opatters{at}


Objectives Rampant urbanisation rates across the globe demand that we improve our understanding of how infectious diseases spread in modern urban landscapes, where larger and more connected host populations enhance the thriving capacity of certain pathogens.

Methods A data-driven approach is employed to study the ability of sexually transmitted diseases (STDs) to thrive in urban areas. The population size of urban areas and their socioeconomic characteristics are used as predictors of disease incidence, using confirmed-case data on STDs in the USA as a case study.

Results A superlinear relation between STD incidence and urban population size is found, even after controlling for various socioeconomic aspects, suggesting that doubling the population size of a city results in an expected increase in STD incidence larger than twofold, provided that all other socioeconomic aspects remain fixed. Additionally, the percentage of African–Americans, income inequalities, education and per capita income are found to have a significant impact on the incidence of each of the three STDs studied.

Conclusions STDs disproportionately concentrate in larger cities. Hence, larger urban areas merit extra prevention and treatment efforts, especially in low-income and middle-income countries where urbanisation rates are higher.




Rapid urbanisation is taking place across the world, especially in low-income and middle-income countries. By 2050, over 60% of the world's population is expected to live in urban areas.1 Rampant urban population growth and densification, in combination with socioeconomic disparities and precarious public health systems, may create favourable conditions for the wide spread of certain infectious diseases (IDs).2–7

Population growth, international travel and migration are common features of urbanisation that can turn cities into gateways for, and enhancers of, ID spread.3,8 Urbanisation may have played a critical role in the worldwide dissemination of HIV,5,9 and in the worsening of epidemics of major respiratory viruses (eg, influenza, respiratory syncytial virus)10 and tuberculosis.2,3,11 Larger populations and densities may also be associated with environments that promote increased human interactions and changes in social norms and mixing patterns,12 possibly affecting ID transmission.

Even though urban dwellers on average enjoy better health, education and income, socioeconomic disparities are often exacerbated in urban environments.8,11,13 These disparities can be determinants for an individual's overall health,3,4,6 and could also play a key role in infection patterns.

Some aspects limiting further understanding of the dynamics of IDs in urban areas are (a) lack of data within and across cities; (b) when these data are available, most studies have focused on country-level, state-level or county-level associations; and (c) mechanistic models are often context (city/region) specific and complex, reducing our ability to extract generalisable insights.

We use data on confirmed cases of sexually transmitted diseases (STDs) as a case study to begin addressing some of these deficiencies. The data include annual cases of disease incidence across US cities from 2007 to 2011 for syphilis, gonorrhoea and chlamydia, three IDs of public health interest given their long-term complications, particularly if untreated. Using metropolitan statistical areas (MSAs) as the unit of analysis, we go beyond rural-versus-urban comparisons or the study of a particular disease in one city at a time.3,14,15 Instead, we take a cross-sectional look at all MSAs for three STDs, an approach that allows the identification of systemic patterns that can generate insights on the interplay between urban population size and the spread of STDs.

On average, the rates of chlamydia, gonorrhoea and syphilis are higher in urban areas, compared with the national rates in the USA (see figure 1). Previous studies have found that the percentage of African–Americans in a population is an important factor to consider,17–19 with STD rates in African–Americans largely exceeding those in Caucasians.14 These STD-burden disparities can be traced back to a greater proportion of African–Americans living in geographical clusters characterised by low educational attainment, high unemployment rates and lower socioeconomic status.18,20 Hence, accessibility to healthcare services, education, as well as income and social inequalities are key factors in the spread of STDs.

Figure 1

Logarithm of infection rates (cases per 100 000) for chlamydia, gonorrhoea and syphilis from 2007 to 2011. Comparison between the 50 most populous metropolitan statistical areas (MSAs, solid lines) versus all US population (dashed lines). Rates are consistently higher in MSAs, particularly for syphilis, the least common of the three sexually transmitted diseases under study.16

Notably, the role of the population size of the MSAs where individuals reside has received much less attention. In this work, we study the associations between STD incidence and population size of MSAs, while also controlling for the confounders mentioned above.

Most studies that have analysed these issues have considered states, counties or single cities/regions as the unit of analysis.17,19 Our approach builds upon this existing knowledge by focusing instead on MSAs; to our knowledge, this is the first study to do so.

Why MSAs?

MSA definitions are not political or administrative, but are based on levels of economic integration and commuting.21 MSAs are commonly used in urban economics, regional science and economic geography, and so employing MSAs as the unit of analysis may be more appropriate for studying human infection processes. An MSA is defined as a core area (ie, a city) containing a large population nucleus (>50 000 people), along with adjacent suburban counties that share a high degree of socioeconomic integration.22 By definition, a significant fraction of people (>25%) within the boundaries of an MSA commute on a regular basis to and from the core population, potentially with relevant implications for disease dynamics. Thus, MSAs can be seen as highly interconnected metapopulations, each containing an urban hub where IDs can spread relatively fast. For these reasons, MSAs are a more suitable unit of analysis for studying disease infection patterns in urban landscapes than states or counties.

Materials and methods

Data sources

Our response variables are the annual counts of chlamydia, gonorrhoea and syphilis (primary, secondary and congenital) from 2007 to 2011, in the counties of the 48 contiguous states, as reported by the Centers for Disease Control and Prevention (CDC).16 This dataset does not include any individual-level data.

The explanatory covariates are:

  • Population of counties (2010 estimates from Census Bureau).

  • Percentage of African–American residents (Census Bureau).

  • Income inequality (Gini Index), describing the income distribution in a population (Measure of America23).

  • Education Index based on two subindices: Educational Attainment Index and Enrolment Index (details in ref. 23).

  • Per cent insured under 65 years of age (Census Bureau).

  • Income per capita in units of US$1000 (Census Bureau).

Since the STD and the covariate data were at the county level, we constructed MSA-level metrics using county-level data (see online supplementary information for details and for summary statistics of these variables). Of the 375 MSAs within the 48 contiguous states, our dataset had information on 364. Statistical analyses were performed in R.
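
The county-to-MSA construction is described only in the supplementary information, so the following is a minimal sketch under stated assumptions rather than the authors' procedure. It assumes that case counts and populations are summed across the counties of an MSA and that percentage-type covariates are combined as population-weighted means. The function name `aggregate_to_msa`, the record layout and the toy numbers are all hypothetical.

```python
from collections import defaultdict


def aggregate_to_msa(county_rows):
    """Combine county records into MSA-level records.

    Assumption (not stated in the article): counts and populations are
    summed, and covariates are population-weighted means.
    """
    totals = defaultdict(
        lambda: {"population": 0, "cases": 0, "weighted": defaultdict(float)}
    )
    for row in county_rows:
        msa = totals[row["msa"]]
        msa["population"] += row["population"]
        msa["cases"] += row["cases"]
        for name, value in row["covariates"].items():
            # Accumulate value * population so we can divide by the total later.
            msa["weighted"][name] += value * row["population"]

    result = {}
    for msa_name, msa in totals.items():
        pop = msa["population"]
        result[msa_name] = {
            "population": pop,
            "cases": msa["cases"],
            "covariates": {k: v / pop for k, v in msa["weighted"].items()},
        }
    return result


# Hypothetical toy data: two counties in MSA "A", one county in MSA "B".
counties = [
    {"msa": "A", "population": 100_000, "cases": 300,
     "covariates": {"pct_insured": 80.0}},
    {"msa": "A", "population": 300_000, "cases": 1500,
     "covariates": {"pct_insured": 90.0}},
    {"msa": "B", "population": 200_000, "cases": 500,
     "covariates": {"pct_insured": 85.0}},
]
msa_table = aggregate_to_msa(counties)
```

A population-weighted mean is a reasonable default for percentages such as insurance coverage, but it is not an exact way to combine a Gini index across counties, so that covariate would need separate treatment.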

The regression model

We assess how well the variation of urban STD incidence can be explained by the population size of urban areas and their socioeconomic characteristics. Ignoring the impact of the socioeconomic covariates for the moment, we approach this issue using a framework from urban scaling theory,24 which supposes that Y, an urban metric, and N, the population size of an urban area, satisfy the scaling relationship

$$Y = Y_0 N^{\alpha_N} \tag{1}$$

Here, Y is considered a random variable with Y0 as the baseline value. The scaling exponent, αN, measures the average relative change in Y with respect to N. Assuming that for each city indexed $i$,

$$Y_i = Y_0 N_i^{\alpha_N} 10^{\varepsilon_i} \tag{2}$$

where $\varepsilon_i$ is a random noise, we linearise equation (2) by taking base 10 logarithms, yielding

$$\log_{10} Y_i = \log_{10} Y_0 + \alpha_N \log_{10} N_i + \varepsilon_i \tag{3}$$

Empirical regularities can then be investigated using regression techniques, allowing the assessment of population size as a predictor of Y, and also the estimation of the scaling exponent αN. It has been found that population size is a strong predictor of several urban metrics. Quantities reflecting interaction processes such as new HIV cases, innovation and violent crime have a scaling exponent $\alpha_N > 1$ (superlinear scaling), whereas those accounting for a city's physical infrastructure display $\alpha_N < 1$.24–26
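
To make the linearisation concrete, here is a minimal sketch that recovers a scaling exponent by ordinary least squares on base 10 logarithms, assuming the scaling relation is a power law of the form Y = Y0 N^αN. It is an illustration only: the article's reported estimates come from negative binomial regression, not from this simple fit. The function name `fit_scaling_exponent` and the synthetic data are hypothetical.

```python
import math


def fit_scaling_exponent(populations, counts):
    """Least-squares fit of log10(Y) = log10(Y0) + alpha * log10(N).

    Returns the pair (alpha, Y0).
    """
    x = [math.log10(n) for n in populations]
    y = [math.log10(c) for c in counts]
    n = len(x)
    x_mean = sum(x) / n
    y_mean = sum(y) / n
    sxx = sum((xi - x_mean) ** 2 for xi in x)
    sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
    alpha = sxy / sxx                  # slope of the log-log line
    log_y0 = y_mean - alpha * x_mean   # intercept of the log-log line
    return alpha, 10 ** log_y0


# Hypothetical noise-free data generated with Y0 = 0.002 and alpha = 1.2.
populations = [1e5, 5e5, 1e6, 5e6, 1e7]
counts = [0.002 * n ** 1.2 for n in populations]
alpha, y0 = fit_scaling_exponent(populations, counts)
```

With noise-free input the fit returns the generating values, which is a quick check that the slope of the log-log line is the scaling exponent.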

In the context of IDs, an informative interpretation of αN is as follows. Let Yi and Ni be, respectively, the number of infected cases and the population of MSA i. Dividing both sides of equation (1) by Ni yields $Y_i/N_i = Y_0 N_i^{\alpha_N - 1}$, the fraction of the population that is infected or the per capita incidence. When $\alpha_N > 1$, $Y_i/N_i$ is an increasing function of N. Therefore, when the scaling relation is superlinear, the expected per capita incidence will increase with population size. In fact, incidence increases systematically on a per capita basis by a factor of $2^{\alpha_N - 1}$ with every doubling of the population.
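
The doubling statement can be checked with a short derivation, assuming the noise-free power-law form Y = Y0 N^αN for the scaling relation. The ratio of per capita incidence at population 2N to that at population N is:

```latex
\frac{Y(2N)/(2N)}{Y(N)/N}
  = \frac{Y_0\,(2N)^{\alpha_N - 1}}{Y_0\,N^{\alpha_N - 1}}
  = 2^{\alpha_N - 1}
```

As an illustrative value, an exponent of 1.10 gives a factor of about 1.07, that is, roughly a 7% rise in per capita incidence with each doubling of the population.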

We are also interested in discerning which socioeconomic factors are associated with the patterns of STD incidence in cities. To that end, we extend model (3) by adding the covariates of interest, yielding

$$\log_{10} Y_i = \log_{10} Y_0 + \alpha_N \log_{10} N_i + \alpha_A A_i + \alpha_G G_i + \alpha_E E_i + \alpha_S S_i + \alpha_I I_i + \varepsilon_i \tag{4}$$

where A, G, E, S and I represent, respectively, percentage of African–American, Gini, Education Index, percentage of insured and per capita income.

We define ‘incidence’ as the sum of reported cases between 2007 and 2011, a 5-year cumulative incidence. A parsimonious way to model incidence data is to assume that the counts are Poisson distributed. However, since these data are overdispersed compared with the Poisson model, we employ negative binomial regression (see online supplementary information for details).
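
The overdispersion argument can be sketched as follows. A Poisson model forces the variance to equal the mean, whereas a negative binomial model allows extra variance. The snippet assumes the common NB2 parameterisation (variance = mean + dispersion × mean²), which the article does not specify, and uses the Pearson dispersion ratio as one conventional diagnostic. The counts and fitted means are hypothetical.

```python
def nb2_variance(mean, dispersion):
    """Variance under the NB2 negative binomial: mean + dispersion * mean**2.

    With dispersion = 0 this reduces to the Poisson variance (equal to the mean).
    """
    return mean + dispersion * mean ** 2


def pearson_dispersion(observed, fitted, n_params):
    """Pearson chi-square divided by residual degrees of freedom.

    Values well above 1 for a Poisson fit suggest overdispersion.
    """
    chi2 = sum((y - mu) ** 2 / mu for y, mu in zip(observed, fitted))
    return chi2 / (len(observed) - n_params)


# Hypothetical case counts and hypothetical Poisson fitted means.
observed = [120, 340, 95, 800, 410]
fitted = [150.0, 300.0, 130.0, 700.0, 480.0]
ratio = pearson_dispersion(observed, fitted, n_params=2)
```

A ratio far above 1, as in this toy example, is the kind of evidence that would motivate replacing the Poisson model with a negative binomial one.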

To help account for the issue of multicollinearity among covariates, we use a variable-selection algorithm (stepAIC) to determine the most parsimonious model that best explains the incidence variability, excluding the less significant covariates via the Akaike Information Criterion. Employing Least Absolute Shrinkage and Selection Operator (LASSO) regression yielded similar results (see online supplementary information).
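
The idea behind AIC-based variable selection can be sketched with a greedy backward elimination. This is written in the spirit of stepwise AIC selection and is not a reimplementation of R's stepAIC. The AIC is computed as 2k − 2 ln L, with k the number of fitted parameters. The function `backward_select`, the stand-in scoring function `toy_aic` and its log-likelihood gains are hypothetical; in practice the score would come from refitting the regression for each candidate set.

```python
def backward_select(candidates, aic_of):
    """Greedy backward elimination.

    Repeatedly drop the covariate whose removal gives the lowest AIC,
    stopping when no single removal lowers the AIC further.
    """
    selected = list(candidates)
    best = aic_of(selected)
    while selected:
        trials = [
            (aic_of([c for c in selected if c != drop]), drop)
            for drop in selected
        ]
        trial_aic, drop = min(trials)
        if trial_aic >= best:
            break
        best = trial_aic
        selected.remove(drop)
    return selected, best


# Hypothetical log-likelihood gain contributed by each covariate.
HYPOTHETICAL_GAIN = {"A": 30.0, "G": 5.0, "E": 8.0, "S": 0.4, "I": 3.0}


def toy_aic(covariates):
    """Stand-in for refitting a model: AIC = 2k - 2 ln L."""
    log_lik = -500.0 + sum(HYPOTHETICAL_GAIN[c] for c in covariates)
    # Intercept and population exponent are always kept in the model.
    n_params = len(covariates) + 2
    return 2 * n_params - 2 * log_lik


selected, best_aic = backward_select(["A", "G", "E", "S", "I"], toy_aic)
```

In this toy setting a covariate is dropped only when it improves the log-likelihood by less than 1, since removing it saves a penalty of 2 in the AIC.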

Using this scaling framework, we address a series of questions. First, is the relationship between city size and incidence superlinear? Second, are these scaling relationships different for each disease? Third, does this relationship persist after controlling for important socioeconomic confounders? And fourth, which socioeconomic covariates are significantly associated with STD incidence?

Results


We first estimate the scaling exponents αN using the simpler model in (3). The estimates are αN=1.04 (0.02), 1.10 (0.04) and 1.29 (0.05), respectively for chlamydia, gonorrhoea and syphilis (standard errors in parentheses). To assess whether the scaling parameter estimates αN are significantly larger than one for each disease, we test the null $H_0: \alpha_N = 1$. We can reject the null at a significance level of 0.01, chosen to guard against type 1 error (except for the case of chlamydia, with p=0.02), providing evidence that all three diseases feature superlinear scaling and that larger urban areas seem to be conducive to increased per capita STD incidence.
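
The reported significance levels can be roughly reproduced from the estimates and standard errors alone. The sketch below assumes a one-sided Wald z-test with a normal approximation, which the article does not state explicitly; the function name `one_sided_p_value` is hypothetical.

```python
import math


def one_sided_p_value(estimate, std_error, null_value=1.0):
    """Upper-tail p-value of a Wald z-test under a normal approximation."""
    z = (estimate - null_value) / std_error
    # Upper tail of the standard normal distribution: P(Z > z).
    return 0.5 * math.erfc(z / math.sqrt(2))


# Estimates and standard errors as reported in the text.
estimates = {
    "chlamydia": (1.04, 0.02),
    "gonorrhoea": (1.10, 0.04),
    "syphilis": (1.29, 0.05),
}
p_values = {
    disease: one_sided_p_value(alpha, se)
    for disease, (alpha, se) in estimates.items()
}
```

Under this assumption the p-values are about 0.023 for chlamydia and about 0.006 for gonorrhoea, with a far smaller value for syphilis, which agrees with the pattern described in the text.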

We can also conclude that the scaling patterns for these three STDs in MSAs are different from each other (p≪0.01) (see online supplementary information for details).

To evaluate whether these superlinear scaling patterns persist even after controlling for important socioeconomic factors, we employ the full model (4). The results are given in table 1. Figure 2 clearly depicts the superlinear pattern between STD incidence and population size when using model (4). It can be shown that, indeed, after accounting for important confounders, the scaling exponents are still larger than one (with significance level of 0.02).

Table 1

Regression analysis results using model in (4)

Figure 2

Scaling of STD incidence with MSA population with negative binomial regression for chlamydia (left), gonorrhoea (centre) and syphilis (right) using model (4), as reported in table 1. Comparing the blue lines (with slopes αN) with the dotted lines (with slope 1) shows the departures from the linear pattern in each case.

Inferences from socioeconomic covariates

In all three cases, estimates of αA are significantly larger than zero, indicating that, all else being equal, an increase in the percentage of African–Americans in cities is associated with an increase in the incidence of STDs. The largest increases are for gonorrhoea and syphilis.

Estimates of αG remain significant and positive in all three STDs, suggesting that larger income inequality is associated with higher incidence. Education, not surprisingly, is consistently associated with a reduction of the expected incidence. Per cent insured is only selected for the case of gonorrhoea, with a positive association with incidence. Lastly, per capita income is also significant for all three STDs, with higher income being associated with a reduction in expected incidence, except for the case of syphilis (see online supplementary information for a more detailed interpretation of these associations).

Discussion


Characterising the relationship between the spread of STDs and urban socioeconomic aspects may offer suitable avenues for determining effective ways of deploying control efforts in urban populations. This issue gains particular relevance given the upward trends in urbanisation worldwide, especially in low-income and middle-income countries where large socioeconomic disparities can create favourable conditions for the large-scale spread of certain IDs.

Here, we show that, even after controlling for important socioeconomic confounders, annual incidence of chlamydia, gonorrhoea and syphilis across US metropolitan areas follows a superlinear scaling pattern, suggesting that the per capita incidence of STDs increases, on average, with the population size of MSAs. In other words, when living in urban areas with similar socioeconomic indicators, a person is at higher risk of contracting an STD if he/she resides in a larger area. This empirical regularity strongly suggests that larger MSAs merit extra prevention and treatment efforts.

In addition, the scaling patterns for each STD are significantly different. To begin to understand why, note that chlamydia is the least symptomatic of the three,16 making it the most difficult to detect and prevent, and is therefore the most prevalent of the three. Gonorrhoea is more symptomatic and less prevalent, and syphilis is the most symptomatic and the least prevalent. Moreover, syphilis has the lowest infection risk per sexual act, with chlamydia and gonorrhoea having comparable infectiousness (ref. Australasian contact tracing manual). Certainly, the symptomatology and inherent infectivity of a disease affect its overall transmissibility, with more asymptomatic and infectious STDs (eg, chlamydia), affecting a larger proportion of the population, compared with less transmissible ones, such as syphilis, which presumably propagate largely through high-risk (core) sexual networks27 where individuals engage in riskier sexual practices.

At the same time, note that the associations of incidence with population size (quantified by αN) decrease as STDs increase their overall transmissibility and prevalence. Thus, this scaling analysis suggests that the more easily an STD can spread, the weaker the association of population size with its transmission potential. According to this view, population size has an enabling effect on STD transmission, but more so for STDs that transmit with less ease. Hence, the spreading capacity of syphilis is considerably tied to population size, which could be, in turn, due to the disproportionate presence of larger high-risk sexual networks in larger cities. Arguably, the structure and size of high-risk sexual networks are affected by the size of the cities in which they exist.

In connection with this, chlamydia is the most evenly distributed geographically in the USA, with gonorrhoea and syphilis being more geographically clustered (in that order).28 Our estimates of the scaling exponent for each STD also increase in that order, indicating that spatial clustering and superlinearity are positively correlated.26 Moreover, the level of clustering increases as the level of aggregation decreases (from states, to MSAs to counties; see ref. 28 and online supplementary information). Extrapolating this trend, we expect that at the community level, spatial heterogeneities are even higher, implying that few communities feature very large rates of STDs compared with the rest of the communities. These communities with extreme levels of infection are likely to harbour high-risk sexual networks.27

Our study, like others,18,19 also suggests that the percentage of African–Americans in the population is one of the most significant covariates explaining the variability of STD incidence in cities. Moreover, given the risk-prone socioeconomic environments in which African–Americans live and develop, they likely are important contributors to the maintenance and growth of the high-risk sexual networks in which the less-prevalent STDs persist.

The indication that larger levels of income inequality—which are often mirrored in education and healthcare access disparities—are associated with an increase in the expected number of STD cases is in line with previous findings17 and supports the view that larger income inequality correlates with worse health outcomes.29

Paradoxically, our analysis indicates that more medical coverage is associated with larger incidence of gonorrhoea. Univariate regression of incidence with percentage of insured yielded a negative association. However, after controlling for population size and percentage of African–American, the association reverses its direction. Thus, this finding should be interpreted cautiously (see online supplementary information for details).

The association of income and incidence differs between STDs, with incidence increasing with income only in the case of syphilis. This association may be confounded by the contribution of men who have sex with men (MSM) in the transmission of syphilis (CDC recently reported that 75% of cases of syphilis are currently in MSM), particularly given that larger and more affluent cities typically harbour larger populations of MSM.30 Hence, a key covariate for future analyses is the per cent of residents who are MSM (see online supplementary information for further discussion).

A limitation in further improving the interpretation of these associations is that the data did not contain any information regarding the gender, age, race or sexual orientation of those infected. As a result, we only infer associations at the MSA level. To avoid committing ‘ecological fallacies’, associations inferred at a certain level of aggregation (eg, MSA level) should not be extrapolated to other levels of aggregation (eg, individual level) without additional evidence. Moreover, causality will be hard to infer without additional information on those infected. Notably, our study does not aim to infer a direct causal link between population size and the chances of contracting an STD. Our claim, instead, is that population size alone can be a good aggregate proxy for the myriad other factors that directly affect the spreading potential of an STD.

The systemic associations found herein might not apply to a particular city. There may be large cities with relatively low levels of STD incidence. What we find is that these types of cities are not the norm. In fact, it is often cities that deviate from their expected behaviour that are the most interesting for both policy and scientific analyses.25 Another limiting aspect is that the number of reported STD cases could have been subject to biases or inaccurate reporting. Any systematic under-reporting or over-reporting may have influenced the inferred patterns. For example, the lack of information on healthcare services, such as diagnostic/testing rates, precludes us from ascertaining how much of the superscaling effect is due to a better testing/diagnostic infrastructure in larger and more developed cities.

Furthermore, although our regression model choice is reasonable and interpretable, we recognise that other regression models could have been used. As in most linear regression models, ours makes no assumption about the distribution of the predictors. Hence, the large variation in the distribution of MSA populations21 does not affect the estimations.

In summary, this study quantifies the associations between STD incidence and urban population size, while controlling for important socioeconomic confounders. To our knowledge, it is the first to take this approach. The superlinear scaling found, in essence, denotes a continuous and systematic enabling of STD transmission with increasing population size so that larger cities suffer from disproportionately larger burdens of STD incidence. Moreover, population size appears to have a greater enabling effect for STDs that transmit with less ease. These insights indicate that incidence hotspots are likely located in larger cities, and thus provide a criterion for effectively allocating interventions.

Key messages

  • The focus of this study is on the incidence of sexually transmitted diseases (STDs) in metropolitan statistical areas (MSAs).

  • The analyses suggest that the per capita incidence of STDs increases systematically with urban population size, more so for less prevalent STDs.

  • Larger urban areas merit extra prevention efforts, a particularly important recommendation given the rise in urban populations worldwide, especially in low-income and middle-income countries.


The authors would like to thank the reviewers for insightful comments that helped improve the manuscript. We also thank Sarah Kidd and Lyn Finelli at the Centers for Disease Control and Prevention for facilitating access to the STD data used herein. Finally, we thank Jay Taylor and Julia Wu for useful discussions.



  • Handling editor Jackie A Cassell

  • Contributors OP-L, EG, AG-L and ST conceived and designed the study. OP-L carried out the statistical analyses and wrote a first draft. All authors contributed to the writing and revision of the final manuscript.

  • Funding This work was supported by National Institutes of Health (NIH) training grant number T32AI007358-26 (OP-L); by the National Institute of General Medical Sciences (NIGMS) award number U54GM088558 and by the NIH K01 award 1K01AI101010-01 (EG); and by NIGMS grant number 1R01GM100471-01 (CC-C and ST).

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.