## Abstract

**Objectives** To examine the distribution of chlamydia, gonorrhoea and syphilis in the USA through the use of Lorenz curves and Gini coefficients.

**Methods** The distribution of three sexually transmitted diseases (STD; chlamydia, gonorrhoea and primary and secondary syphilis) was examined across states and counties in the USA in 2007, based on reported case numbers. Gini coefficients, which can range from 0 (equality in STD rates across geographical units) to 1 (complete inequality such that all STD occur in one geographical unit) were calculated.

**Results** Overall, chlamydia was the most evenly distributed and syphilis was the most concentrated of the three STD examined. The Gini coefficients for chlamydia, gonorrhoea and syphilis were 0.121, 0.255 and 0.334, respectively, when examined across states, and 0.319, 0.494 and 0.630, respectively, when examined across counties. Differences in Gini coefficients were observed when the STD distributions were examined by sex, race/ethnicity and age group.

**Conclusions** The use of Lorenz curves and Gini coefficients can help to assess inequalities in the distribution of STD, to gauge the suitability of geographically targeted interventions, and to help in determining the epidemic phase of STD. Having a better understanding of the disparities in the distribution of STD across states and counties by sex, race/ethnicity and age group might help in understanding why disparities in STD rates exist across different groups and in developing interventions to address these disparities.

- Chlamydia
- gonorrhoea
- STD
- syphilis

## Introduction

Lorenz curves and Gini coefficients are primarily used to illustrate and quantify inequalities in income distribution across a population.1 These two measures, however, have been adapted for use in the field of sexually transmitted disease (STD) prevention, such as in measuring the concentration of STD2–5 and sex workers6 across geographical units, the distribution of clients across sex workers7 and the distribution of sexual behaviours across individuals.8

The use of Lorenz curves and Gini coefficients can help to assess inequalities in the distribution of STD across states and counties and to gauge the suitability of geographically targeted interventions. In general, as the Gini coefficient increases (indicating greater inequality in the distribution of STD cases), the appropriateness of geographically targeted interventions probably also increases.2

Lorenz curves and Gini coefficients can also be useful in assessing the epidemic phase of an STD.2 4 5 The impact and cost-effectiveness of STD prevention efforts probably depend partly on the epidemic phase of the STD, and phase-specific prevention interventions are expected to be more efficient than interventions that do not take into account the epidemic phase (eg, growth, hyperendemic, decline, endemic).5 9–14

Kerani and colleagues2 compared the concentration of four STD (syphilis, gonorrhoea, chlamydia and genital herpes) across census tracts in King County, Washington, in 2000 and 2001, using Lorenz curves and Gini coefficients, and proposed methods for estimating the CI of these measures. In this study, we used Lorenz curves and Gini coefficients to examine the distribution of reported chlamydia, gonorrhoea and primary and secondary syphilis cases across states and counties in the USA in 2007, by sex, race/ethnicity and age.

## Methods

We constructed Lorenz curves and calculated Gini coefficients (described below) to examine the distribution of reported cases of chlamydia, gonorrhoea and primary and secondary syphilis across states and counties in the USA in 2007. STD case numbers and rates were obtained from surveillance records maintained by the Centers for Disease Control and Prevention (CDC). The CDC's Division of STD Prevention collects STD surveillance data reported by STD control programmes and health departments at the state and local levels. State and county population estimates were also obtained from CDC's STD surveillance system, which incorporates population estimates from Bureau of the Census data. The STD data and population estimates are described in more detail in the annual STD surveillance report, which presents statistics and trends for STD in the USA.15 Although the surveillance report includes information on STD in outlying areas (Guam, Puerto Rico and the Virgin Islands), these outlying areas were not included in our analyses. The case counts and rates by sex, age group and race/ethnicity that we used may differ from those reported in the 2007 STD surveillance report because we excluded cases in which sex, age or race/ethnicity was not specified (when applicable), whereas in the surveillance report cases with missing information were prorated according to the distribution of cases for which this information was not missing.15

More complete descriptions of Lorenz curves and Gini coefficients are available elsewhere.1–4 Our application of Lorenz curves and Gini coefficients is summarised in figure 1 and described briefly here. To plot the Lorenz curve for the distribution of chlamydia across states, we first ranked the states by their rate of chlamydia in ascending order (ie, the state with the lowest chlamydia rate was ranked first). We then plotted the cumulative proportion of cases of chlamydia accounted for by a cumulative proportion of the population. The resulting Lorenz curve was compared against a diagonal line of equality, which represents the distribution of chlamydia that would be observed if the chlamydia rate did not vary by state. The Lorenz curves for the distribution of other STD across states and for the distribution of STD across counties were developed in an analogous manner.
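The ranking and accumulation steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function name `lorenz_points` and the three-unit toy data are hypothetical, and every unit is assumed to have a positive population.

```python
def lorenz_points(cases, population):
    """Return cumulative population shares (xs) and case shares (ys).

    Units are ranked by rate (cases per person) from lowest to highest,
    and both lists start at the origin (0, 0).
    Assumes every population value is positive.
    """
    ranked = sorted(zip(cases, population), key=lambda u: u[0] / u[1])
    total_cases = sum(cases)
    total_pop = sum(population)

    xs, ys = [0.0], [0.0]
    cum_cases = 0
    cum_pop = 0
    for c, p in ranked:
        cum_cases += c
        cum_pop += p
        xs.append(cum_pop / total_pop)      # cumulative share of population
        ys.append(cum_cases / total_cases)  # cumulative share of cases
    return xs, ys


# Hypothetical toy data (three equally sized units), not surveillance figures.
xs, ys = lorenz_points(cases=[10, 30, 60], population=[100, 100, 100])
```

Plotting `ys` against `xs` gives the Lorenz curve. Because units are ranked from the lowest rate upwards, each point lies on or below the diagonal line of equality.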

The Gini coefficient quantifies the divergence of the Lorenz curve from the diagonal line of equality, and can range from 0 (indicating complete equality in STD rates across states or counties) to 1 (indicating complete inequality, as if all STD cases occurred in only one state or county). The Gini coefficient is equal to twice the area between the Lorenz curve and the diagonal line of equality (see figure 1).

Lorenz curves were developed and Gini coefficients were calculated for each STD overall, as well as for each STD stratified by sex, race/ethnicity (Hispanic, non-Hispanic white and non-Hispanic black), and age group (ages 15–24 years, 25–34 years, 35–44 years and 45 years and older). For each STD, we also calculated the percentage of STD cases occurring in the top 5%, 20% and 50% of the population, when grouped by states (or counties) and ranked by state (or county) STD rate.
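The share of cases occurring in the top 5%, 20% or 50% of the population can be computed as in the sketch below. The function name and toy data are hypothetical. The sketch also assumes that when the cut-off falls inside a unit, that unit's cases are split in proportion to its population. The text does not say how such boundary units were handled.

```python
def share_of_cases_in_top(cases, population, top_fraction):
    """Share of all cases found in the highest-rate `top_fraction` of people.

    Units are ranked by rate from highest to lowest. If the population
    cut-off falls inside a unit, its cases are split pro rata
    (an assumption made for this illustration).
    """
    ranked = sorted(zip(cases, population),
                    key=lambda u: u[0] / u[1], reverse=True)
    total_cases = sum(cases)
    target_pop = top_fraction * sum(population)

    pop_so_far = 0.0
    cases_so_far = 0.0
    for c, p in ranked:
        if pop_so_far + p <= target_pop:
            pop_so_far += p
            cases_so_far += c
        else:
            # Take only the slice of this unit needed to reach the cut-off.
            cases_so_far += c * (target_pop - pop_so_far) / p
            break
    return cases_so_far / total_cases


# Hypothetical toy data: the top half of the population holds 75% of cases.
top_half = share_of_cases_in_top([10, 30, 60], [100, 100, 100], 0.5)
```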

When using state-level STD data, the Gini coefficient (G) for a given STD can be calculated as:

$$G = 1 - \sum_{i=1}^{k} (Y_i + Y_{i-1})(X_i - X_{i-1})$$

where $i$ denotes states, ranked by STD rate from lowest to highest (eg, for chlamydia, $i=1$ denotes the state with the lowest chlamydia rate and $i=51$ denotes the state with the highest chlamydia rate), $Y_i$ is the cumulative proportion of national cases of the given STD occurring in State 1 to State $i$, $X_i$ is the cumulative proportion of the national population living in State 1 to State $i$, $X_0$ and $Y_0$ are 0, and $k = 51$ (for the 50 states plus Washington, DC).16 17 Gini coefficients for the county-level distribution of STD can be calculated in an analogous manner. For our county-level analyses, we treated Washington DC as a county. When focusing on overall STD rates, our county-level analyses included 3141 counties for syphilis and 3127 counties for gonorrhoea and chlamydia, as a result of missing data. State population size ranged from 0.5 million to 36.6 million, and county population size ranged from 55 to 9.9 million.
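The calculation can be sketched as follows. The sketch assumes the trapezoid-rule form of the Gini coefficient, in which G equals 1 minus the sum over ranked units of (Y_i + Y_{i-1})(X_i - X_{i-1}). This form is consistent with G being twice the area between the Lorenz curve and the line of equality. The function name `gini` and the toy inputs are hypothetical.

```python
def gini(cases, population):
    """Gini coefficient of case rates across geographical units.

    Implements G = 1 - sum((Y_i + Y_{i-1}) * (X_i - X_{i-1})), where X and Y
    are cumulative population and case proportions over units ranked by
    rate from lowest to highest. Assumes positive populations and at
    least one case overall.
    """
    ranked = sorted(zip(cases, population), key=lambda u: u[0] / u[1])
    total_cases = sum(cases)
    total_pop = sum(population)

    g = 1.0
    x_prev = y_prev = 0.0
    cum_cases = 0
    cum_pop = 0
    for c, p in ranked:
        cum_cases += c
        cum_pop += p
        x = cum_pop / total_pop
        y = cum_cases / total_cases
        g -= (y + y_prev) * (x - x_prev)
        x_prev, y_prev = x, y
    return g


# Hypothetical toy data, not surveillance figures.
g_unequal = gini([10, 30, 60], [100, 100, 100])  # approximately 0.333
g_equal = gini([10, 20, 30], [100, 200, 300])    # identical rates, so about 0
```

With a finite number of units the coefficient cannot quite reach 1: if all cases fall in one of four equally sized units, this formula gives 0.75.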

Similar to Kerani and colleagues,2 we used bootstrapping methods to calculate the CI for the Gini coefficients. We took 500 random samples of our original data and, for each random sample, we calculated the Gini coefficient. The 95% CI for the Gini coefficient were calculated from the 2.5 and 97.5 percentiles of these 500 ‘bootstrapped’ Gini coefficients.
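A percentile bootstrap of this kind might look like the sketch below. It rests on several assumptions that the text does not spell out: geographical units are resampled with replacement, each resample has the same number of units as the original data, and the percentiles are taken by a simple nearest-rank rule. All names and the toy data are hypothetical.

```python
import random


def gini(units):
    """Gini coefficient for a list of (cases, population) pairs."""
    ranked = sorted(units, key=lambda u: u[0] / u[1])
    total_cases = sum(c for c, _ in ranked)
    total_pop = sum(p for _, p in ranked)
    g, x_prev, y_prev = 1.0, 0.0, 0.0
    cum_cases = cum_pop = 0
    for c, p in ranked:
        cum_cases += c
        cum_pop += p
        x, y = cum_pop / total_pop, cum_cases / total_cases
        g -= (y + y_prev) * (x - x_prev)
        x_prev, y_prev = x, y
    return g


def bootstrap_gini_ci(units, n_boot=500, seed=0):
    """Approximate 95% CI from the 2.5 and 97.5 percentiles of resampled Ginis."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    estimates = []
    for _ in range(n_boot):
        # Resample whole units with replacement (an assumption).
        sample = rng.choices(units, k=len(units))
        if sum(c for c, _ in sample) == 0:
            continue  # Gini is undefined when a resample has no cases
        estimates.append(gini(sample))
    estimates.sort()
    lower = estimates[int(0.025 * len(estimates))]
    upper = estimates[int(0.975 * len(estimates)) - 1]
    return lower, upper


# Hypothetical (cases, population) pairs, not surveillance figures.
toy_units = [(10, 100), (30, 120), (60, 90), (5, 200), (45, 150), (20, 80)]
ci = bootstrap_gini_ci(toy_units)
```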

## Results

In 2007, there were 1 108 374 chlamydia cases, 355 991 gonorrhoea cases and 11 466 primary and secondary syphilis cases. Overall, chlamydia was the most evenly distributed and syphilis was the most concentrated of the three STD we examined, with gonorrhoea in the middle of these two extremes. These three distributions are illustrated by the Lorenz curves in which the curve for chlamydia is closest to the line of equality, followed by gonorrhoea then syphilis (figures 2 and 3). The Lorenz curves show the percentage of the population that accounts for a given percentage of STD cases. For example, 50% of the reported chlamydia cases in 2007 occurred in states that account for 42% of the nation's population. Gonorrhoea and primary and secondary syphilis cases were less evenly distributed than chlamydia cases across states, as 50% of the reported gonorrhoea and primary and secondary syphilis cases occurred in states that account for 33% and 30% of the nation's population, respectively.
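Statements such as "50% of cases occurred in states that account for 42% of the population" can be read off the ranked data as in the sketch below. It assumes that units are taken from the highest rate downwards and that the unit straddling the target is split linearly. The text does not state how the published figures handled that boundary. The function name and toy data are hypothetical.

```python
def population_share_for_cases(cases, population, case_fraction):
    """Smallest population share (highest-rate units first) holding
    `case_fraction` of all cases, with linear interpolation inside the
    unit that straddles the target (an assumption for this sketch)."""
    if not 0 < case_fraction <= 1:
        raise ValueError("case_fraction must be in (0, 1]")
    ranked = sorted(zip(cases, population),
                    key=lambda u: u[0] / u[1], reverse=True)
    target_cases = case_fraction * sum(cases)
    total_pop = sum(population)

    cases_so_far = 0.0
    pop_so_far = 0.0
    for c, p in ranked:
        if cases_so_far + c < target_cases:
            cases_so_far += c
            pop_so_far += p
        else:
            # Only part of this unit is needed to reach the target.
            pop_so_far += p * (target_cases - cases_so_far) / c
            break
    return pop_so_far / total_pop


# Hypothetical toy data: half of the cases sit in about 28% of the population.
half_cases_pop = population_share_for_cases([10, 30, 60], [100, 100, 100], 0.5)
```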

Consistent with the Lorenz curves, the Gini coefficients were lowest for chlamydia and highest for syphilis. Overall, the Gini coefficients for chlamydia, gonorrhoea and syphilis were 0.121, 0.255 and 0.334 when examined across states and 0.319, 0.494 and 0.630 when examined across counties (tables 1 and 2). The CI for the Gini coefficients for chlamydia, gonorrhoea and syphilis (overall) did not overlap when examined at the county level. When examined across states, however, only the Gini coefficient for chlamydia (overall) differed significantly from those for the other STD, as the CI for the Gini coefficients for syphilis and gonorrhoea (overall) overlapped.
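The comparison rule used here, in which two Gini coefficients are treated as significantly different when their 95% CI do not overlap, can be written as a small helper. The intervals in the example are hypothetical placeholders, not the values in tables 1 and 2.

```python
def intervals_overlap(ci_a, ci_b):
    """Return True if two (lower, upper) confidence intervals overlap."""
    lower_a, upper_a = ci_a
    lower_b, upper_b = ci_b
    return lower_a <= upper_b and lower_b <= upper_a


# Hypothetical intervals for illustration only.
separate = intervals_overlap((0.10, 0.14), (0.22, 0.29))     # False
overlapping = intervals_overlap((0.22, 0.29), (0.27, 0.40))  # True
```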

Gini coefficients are shown in bold in tables 1 and 2 to indicate significant differences by sex, race/ethnicity or age group. For syphilis, the Gini coefficients were higher for female cases than for male cases at both the state and county level, and these differences were significant (tables 1 and 2). For chlamydia, the Gini coefficients were higher for male cases than for female cases, but this difference was only significant at the county level (tables 1 and 2). For gonorrhoea, the Gini coefficients did not differ significantly by sex (tables 1 and 2).

Across all three STD, Gini coefficients were higher for non-Hispanic white individuals than for non-Hispanic black or Hispanic individuals (tables 1 and 2). However, the only significant differences across race and ethnicity were found at the county level (table 2).

With the exception of gonorrhoea and syphilis at the state level (table 1), the Gini coefficients were higher across all STD for the age group 45 years and older than for the younger age groups we examined (tables 1 and 2). However, differences in Gini coefficients across age groups were significant only for chlamydia at both the state and county level and for gonorrhoea at the county level (tables 1 and 2).

## Discussion

For 2007, Gini coefficients for three STD in the USA followed two well-established patterns. First, more common STD (such as chlamydia) tend to have lower Gini coefficients than less common STD (such as syphilis).2–4 Second, as the geographical unit of analysis became smaller (eg, county rather than state), the Gini coefficients tended to increase.18 19 Such changes in the unit of analysis probably explain most of the differences between the Gini coefficients we calculated in our county-level analyses (0.319, 0.494, 0.630 for chlamydia, gonorrhoea and syphilis, respectively) and the Gini coefficients presented by Kerani and colleagues2 at the census-tract level for King County, Washington (crude Gini coefficients of 0.443, 0.633 and 0.915, respectively, and estimated Gini coefficients of 0.411, 0.570 and 0.682, respectively). For comparison, the Gini coefficients for chlamydia and gonorrhoea calculated at the level of the 51 ‘postal forward sortation areas’ in Manitoba in 1998 were 0.45 and 0.66, respectively,4 which were slightly higher than the Gini coefficients for chlamydia and gonorrhoea calculated at the census-ward level in an urban area (Leeds, UK) in 1994–5 (0.26 and 0.49, respectively).3

We found higher Gini coefficients for non-Hispanic white individuals compared with Hispanic and non-Hispanic black individuals, indicating that STD rates are more evenly distributed geographically for the racial/ethnic minority groups than for non-Hispanic white individuals. In the USA, race and ethnicity are risk markers that are associated with social determinants of STD rates, such as poverty, lack of access to quality health care, illegal drug use and living in areas in which STD prevalence rates are high.15 20–22 These and other factors that contribute to the disproportionate burden of STD among minorities might also contribute to the geographical diffusion of STD among minority populations.

A similar explanation might account for disparities in the Gini coefficient by age group. Gini coefficients for ages 45 years and older were higher than those for younger age groups in the county-level analyses. STD rates among teenagers and young adults are typically higher than among older adults, and the same factors that contribute to the increased rate of STD among teenagers and young adults might also contribute to the geographical diffusion of STD in these age groups. Examples of such risk factors include behavioural dynamics (eg, teenagers and young adults are more likely to have multiple sex partners in a given year than are older adults), social determinants such as barriers to accessing STD treatment and prevention services (eg, inability to pay for services, lack of transportation), and biological factors that can impact the risk of STD acquisition.15 23 24

The lower Gini coefficient for male syphilis than for female syphilis might also be attributable partly to the influence of syphilis cases among men who have sex with men.25 The majority of syphilis cases in 2007 occurred in men who have sex with men,15 often in high-population urban areas across the country.26 27 In contrast, most syphilis cases in women in 2007 occurred among African-American women, and were concentrated in southern states.15

Our study is subject to the limitations associated with surveillance data, most notably the potential underreporting of cases.15 Although our results would not be influenced by underreporting of cases if the degree of underreporting was constant across all states and counties, such a scenario of equal underreporting is unlikely. Similarly, differences in reporting of STD across race/ethnicity might account for the differences in distributions of STD that we observed for non-Hispanic white individuals compared with non-Hispanic black and Hispanic individuals.

To calculate the 95% CI, we assumed that the state and county STD rates represent a simple random sample. In future analyses, the bootstrapping methods should consider accounting for the spatial contiguity and non-constant variance of STD rates. In calculating the Gini coefficients, we made no adjustments to account for the discreteness of STD cases. For example, a county cannot have 0.1 cases of syphilis, and inequality in the county-level distribution of syphilis is inevitable if the number of counties exceeds the number of syphilis cases (as was the case for syphilis in women). As Kerani and colleagues2 show, making such adjustments can reduce the Gini coefficient notably, particularly for STD that are the most geographically concentrated.
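The effect of discreteness can be illustrated with a small simulation. This is not the adjustment proposed by Kerani and colleagues. It only shows how large a Gini coefficient arises by chance when whole cases are scattered at random across equally sized units with identical underlying risk. All names and parameter values are hypothetical.

```python
import random


def gini_equal_population(counts):
    """Gini coefficient when every unit has the same population size."""
    k = len(counts)
    total = sum(counts)
    g, y_prev, cum = 1.0, 0.0, 0
    for c in sorted(counts):  # equal populations, so ranking by count = by rate
        cum += c
        y = cum / total
        g -= (y + y_prev) / k  # each unit adds 1/k to the population share
        y_prev = y
    return g


def mean_gini_under_equal_risk(n_cases, n_units, n_trials=200, seed=0):
    """Average Gini when each case lands in a uniformly random unit."""
    rng = random.Random(seed)
    total = 0.0
    for _trial in range(n_trials):
        counts = [0] * n_units
        for _case in range(n_cases):
            counts[rng.randrange(n_units)] += 1
        total += gini_equal_population(counts)
    return total / n_trials


# Few cases over many units: strong apparent inequality despite equal risk.
sparse = mean_gini_under_equal_risk(n_cases=10, n_units=100)
# Many cases over few units: apparent inequality nearly vanishes.
dense = mean_gini_under_equal_risk(n_cases=10000, n_units=10, n_trials=20)
```

With 10 cases and 100 units, at least 90 units must record zero cases, so the coefficient cannot fall below about 0.9 even though the underlying risk is identical everywhere.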

In this study, we applied Lorenz curves and Gini coefficients to give a snapshot of the distribution of three STD across states and counties in the USA. In doing so, we examined differences in the distribution of STD by sex, race/ethnicity and age group. Having a better understanding of disparities in the distribution of STD across states and counties by sex, race/ethnicity and age group might help in understanding why disparities in STD rates exist across different groups and in developing interventions to address these disparities.

Several other practical uses of Lorenz curves and Gini coefficients have been suggested for the purposes of STD prevention. For example, using these tools to measure the concentration of STD can help in determining the suitability of geographically targeted interventions and can help to assess the epidemic phase of an STD.2 4 5 However, the use of Gini coefficients in this manner, if indeed possible, would probably require a series of Gini coefficients over time, rather than Gini coefficients for a single point in time (such as 2007 as presented here). Future research is needed to examine the practicality of using Gini coefficients to inform STD prevention activities. For example, an analysis of historical Gini coefficients by sex, race/ethnicity and age, along with historical STD data, could help determine whether Gini coefficients can be used to detect and predict trends in STD rates.

### Key messages

Lorenz curves and Gini coefficients can be used to examine the distribution of STD across geographic units such as states and counties.

Overall, chlamydia was the most evenly distributed and syphilis was the most concentrated of the three STD we examined, with gonorrhoea in the middle of these two extremes.

Assessments of disparities in the distribution of STD across states and counties by sex, race/ethnicity and age group might help in understanding why disparities in STD rates exist across different groups and in developing strategies to address these disparities.

## References

## Footnotes

Linked articles 040865.

The findings and conclusions in this report are those of the authors and do not necessarily represent the official position of the US Centers for Disease Control and Prevention.

Competing interests None declared.

Provenance and peer review Not commissioned; externally peer reviewed.