Original article
Genetic transmission networks reveal the transmission patterns of HIV-1 CRF01_AE in China
  1. Xiaoshan Li1,2,
  2. Rong Gao3,
  3. Kexin Zhu4,
  4. Feiran Wei5,
  5. Kun Fang1,
  6. Wei Li6,
  7. Yue Song1,
  8. You Ge1,
  9. Yu Ji1,
  10. Ping Zhong7,
  11. Pingmin Wei1
  1. 1 Teaching and Research Office of Epidemiology and Health Statistics, School of Public Health, Southeast University, Nanjing, China
  2. 2 Key Laboratory of Environmental Medicine Engineering, Ministry of Education, School of Public Health, Southeast University, Nanjing, China
  3. 3 Department of Microbiology and Immunology, Medical School of Southeast University, Nanjing, China
  4. 4 School of Public Health, Nantong University, Nantong, China
  5. 5 Department of Oncology, Medical School of Southeast University, Nanjing, China
  6. 6 Department of Infectious Disease Prevention and School Health, Nanjing Municipal Center for Disease Control and Prevention, Nanjing, China
  7. 7 Department of AIDS and STD, Shanghai Municipal Center for Disease Control and Prevention, Shanghai Municipal Institutes for Preventive Medicine, Shanghai, China
  1. Correspondence to Professor Pingmin Wei, Teaching and Research Office of Epidemiology and Health Statistics, School of Public Health, Southeast University, 87 Dingjiaqiao Road (W), Nanjing 210009, China; mpw1963{at}126.com and Professor Ping Zhong, Department of AIDS and STD, Shanghai Municipal Center for Disease Control and Prevention, Shanghai Municipal Institutes for Preventive Medicine, 1380 Zhongshan Road (W), Shanghai 200336, China; zhongp56{at}hotmail.com

Abstract

Objectives The epidemic of HIV-1 CRF01_AE has become a major public health issue in China. This study aimed to characterise the transmission patterns of genetic networks for CRF01_AE nationwide and elucidate possible opportunities for prevention.

Methods We isolated and conducted genetic transmission network analysis of all available CRF01_AE pol sequences (n=4704) from China in the Los Alamos HIV sequence database.

Results A total of 1391 (29.6%) sequences were identified as belonging to 400 separate networks. Of men who have sex with men (MSM) in the networks, 93.8% were linked to other MSM and only 2.4% were linked to heterosexual women. However, 11.8% of heterosexual women in the networks were linked to MSM. Lineages composed mainly of MSM had higher transmission than those composed mostly of heterosexuals. Of the 1391 individuals in networks, 513 (36.9%) were linked to cases diagnosed in different provinces. The proportion of individuals involved in inter-province links was correlated with the number of migrant people (Spearman’s r=0.738, p=0.001).

Conclusions The outcome of this study could help improve our ability to understand HIV transmission among various regions and risk groups in China, and highlights the importance of targeting MSM and migrants in prevention and intervention efforts.

  • HIV
  • MOLECULAR EPIDEMIOLOGY
  • CHINA
  • AIDS

Introduction

The first confirmed circulating recombinant form (CRF) of HIV-1, CRF01_AE, has become the most prevalent strain in China and is predominant among men who have sex with men (MSM).1 2 CRF01_AE has a significant pathogenic impact on disease progression compared with CRF07_BC and subtype B.3 4 Recently, we reported that the high proportion of CXCR4 usage for CRF01_AE strains may result in the loss of susceptibility to maraviroc (a CCR5 antagonist).5 Importantly, CRF01_AE has been involved in many newly reported CRFs through a second recombination with other strains.w1 The epidemic of CRF01_AE strains has become a major public health issue in the prevention and control of HIV in China. A better understanding of its transmission patterns could facilitate the design and implementation of interventions to prevent its transmission.

Since HIV is transmitted through networks formed by closely connected people who engage in injecting or sexual practices,6 7 understanding the structure and features of these networks is important for designing intervention programmes. Analysis of transmission networks focuses on the genetic similarity of sequences and can identify potential transmission partners and recognise the links between different populations.8 Recently, we reported genetic transmission networks for CRF01_AE-infected MSM in Shanghai, preliminarily revealing transmission both inside Shanghai and between Shanghai and several neighbouring provinces.9 However, the genetic transmission networks of CRF01_AE at the national level have not been clearly characterised. In this study, we characterised the transmission patterns of genetic networks for HIV-1 CRF01_AE nationwide and elucidated possible opportunities for prevention.

Methods

Study subjects

All available HIV-1 CRF01_AE sequences covering the entire protease (PR) and partial reverse transcriptase (RT) region were downloaded from the Los Alamos HIV sequence database (LANL, http://www.hiv.lanl.gov), an open access database which contains comprehensive data on HIV genetic sequences; only the oldest sequence from each patient was kept. The CRF01_AE assignment of each sequence was further confirmed through phylogenetic tree analysis. All sequences were aligned with the HIV reference sequence HXB2 using the Gene Cutter online tool (http://www.hiv.lanl.gov/content/sequence/GENE_CUTTER/cutter.html) and subsequently edited manually. Duplicate sequences were identified and excluded using the ElimDupes online tool (http://www.hiv.lanl.gov/content/sequence/ELIMDUPES/elimdupes.html). Eventually, 4704 sequences (HXB2 genome location 2253 to 3252) from various risk groups in 23 provinces of mainland China were included. The distribution of the geographic origins and risk groups for the sequences is summarised in online supplementary table 1. In order to avoid potential biases due to convergent evolution driven by antiretroviral therapy, 45 codons in PR and RT associated with drug resistance mutations were removed according to the last updated (October to November 2015) guideline.w2 The resulting sequence alignment consisted of 865 nt.
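The alignment length quoted above can be checked with simple arithmetic: HXB2 positions 2253 to 3252 span 1000 nt, and removing 45 codons (135 nt) leaves 865 nt. The following is a minimal Python sketch of codon removal on a toy sequence. The function name `drop_codons` and the codon indices used are hypothetical placeholders, not the actual drug resistance positions from the cited guideline, and this is not the authors' pipeline.

```python
def drop_codons(in_frame_seq, codons_to_drop, codon_length=3):
    """Remove whole codons (1-based codon indices) from an in-frame sequence."""
    drop = set(codons_to_drop)
    kept = []
    for start in range(0, len(in_frame_seq), codon_length):
        codon_index = start // codon_length + 1
        if codon_index not in drop:
            kept.append(in_frame_seq[start:start + codon_length])
    return "".join(kept)


# HXB2 positions 2253..3252, inclusive on both ends.
region_length = 3252 - 2253 + 1

# Toy sequence of the right length; real input would be an aligned pol sequence.
toy_seq = "A" * region_length

# 45 placeholder codon indices (ASSUMPTION: not the real resistance-associated codons).
hypothetical_codons = range(1, 46)

trimmed = drop_codons(toy_seq, hypothetical_codons)
print(region_length, len(trimmed))
```

Removing whole codons rather than single nucleotides keeps the remaining sequence in frame, which matters if the alignment is later analysed codon by codon.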

Supplementary Table 1

Identification of genetic transmission networks

The genetic transmission network analysis comprised four steps: phylogenetic tree construction, transmission cluster extraction, minimum genetic distance identification and network visualisation.9 Cluster Picker10 was used to extract transmission clusters from the phylogenetic tree, with an intracluster maximum pairwise distance of <3.0% nt substitutions per site and a bootstrap support value of ≥95%. The Tamura-Nei 93 pairwise genetic distances of all sequences within the identified clusters were calculated in Mega V.7.0. Within each cluster, the set of links that minimises the sum of edge weights (genetic distances) was selected to define the linkages.9 Lastly, the network data were visualised and analysed with a custom R script using the network package.11
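One way to read the link-selection step above is as a minimum spanning tree over the pairwise distances within a cluster. The sketch below uses Kruskal's algorithm in Python to illustrate that reading. This is an assumption: the text does not name the algorithm, and the sequence names and distances here are invented for illustration.

```python
def minimum_spanning_links(names, dist):
    """Kruskal's algorithm: return links (a, b, d) connecting all names
    with the smallest possible total distance."""
    parent = {name: name for name in names}

    def find(x):
        # Follow parent pointers to the component root, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    # All unordered pairs, sorted by increasing genetic distance.
    edges = sorted(
        (dist[(a, b)], a, b)
        for i, a in enumerate(names)
        for b in names[i + 1:]
    )

    links = []
    for d, a, b in edges:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # adding this edge does not create a cycle
            parent[root_a] = root_b
            links.append((a, b, d))
    return links


# Hypothetical cluster of four sequences with made-up pairwise distances,
# all below the 3.0% cluster threshold.
names = ["s1", "s2", "s3", "s4"]
dist = {
    ("s1", "s2"): 0.004, ("s1", "s3"): 0.012, ("s1", "s4"): 0.020,
    ("s2", "s3"): 0.006, ("s2", "s4"): 0.015, ("s3", "s4"): 0.009,
}

links = minimum_spanning_links(names, dist)
for a, b, d in links:
    print(a, b, d)
```

Under this reading, a cluster of n sequences always yields n−1 links, so every sequence in a cluster has at least one link.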

Phylogenetic analysis and Bayesian phylodynamic inference

An approximately maximum likelihood phylogenetic tree was built for the 4704 sequences using FastTree 2.3 software,12 with the GTR+G+I nucleotide substitution model. The most appropriate nucleotide substitution model was selected using jModelTest.13 The final tree was visualised in Figtree V.1.4.2 (http://beast.bio.ed.ac.uk). Monophyletic groups with bootstrap support ≥0.9 were considered as lineages. To explore the changes in the effective population size of each lineage over time, we undertook a separate Bayesian skyline plot (BSP) analysis for each lineage, using the Bayesian Markov Chain Monte Carlo (MCMC) approach implemented in BEAST V.1.7.2. In order to reduce excessive computational load, closely related sequences from the same areas were manually removed without compromising the genetic or geographic heterogeneity of each alignment. Ultimately, downsampled subsets of the five lineages (lineage 1: 264 sequences; lineage 2: 251; lineage 3: 204; lineage 4: 275; lineage 5: 290) were subjected to BSP analysis (figure 1). The model selected was GTR + Relaxed clock (uncorrelated) + Bayesian skyline based on our previous and others’ studies.9 14 The MCMC analyses were run for 200 million generations and sampled every 1000 steps. The output was assessed for convergence by means of effective sample size (ESS) after a 20% burn-in using Tracer (http://tree.bio.ed.ac.uk/software/tracer/). To minimise the effects of standard errors, only parameter estimates with ESS≥200 were accepted.
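As a rough worked example of what these MCMC settings imply, the Python sketch below counts how many logged samples remain after burn-in. It assumes the burn-in is applied as a fraction of logged samples and ignores the initial state; the helper name `retained_samples` is hypothetical.

```python
def retained_samples(chain_length, sample_every, burn_in_fraction):
    """Return (total logged, discarded as burn-in, retained) sample counts."""
    total = chain_length // sample_every
    discarded = round(total * burn_in_fraction)
    return total, discarded, total - discarded


# 200 million generations, sampled every 1000 steps, 20% burn-in.
total, discarded, kept = retained_samples(200_000_000, 1000, 0.20)
print(total, discarded, kept)
```

An ESS threshold of 200 is therefore small relative to the number of retained samples, which reflects how strongly autocorrelated successive MCMC states can be.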

Figure 1

Phylogenetic analysis of 4704 CRF01_AE pol sequences and inferred transmission networks in each lineage. The phylogenetic tree was constructed using the approximately maximum likelihood method based on the pol region in FastTree 2.3. The nucleotide substitution model was GTR+G+I. Monophyletic groups with bootstrap support ≥0.9 were identified as lineages. In the transmission networks, various provinces in China are colour coded. Different shapes represent different risk groups: triangle: female heterosexual; square: male heterosexual; circle: men who have sex with men (MSM); pentagon: injecting drug users (IDUs); hexagon: sexual transmission, unspecified type (SU); heptagon: other/unknown groups.

Statistical analysis

Three groups were compared: (1) individuals who did not link to others, (2) individuals who linked to only one other and (3) individuals who linked to ≥2 others. The χ2 test was used to compare the distribution of these three groups across risk groups and lineages. To explore the correlation between the proportion of individuals involved in interprovince links (links between two sequences from different provinces) in each province and the number of migrant people, Spearman’s non-parametric correlation test was used. The number of migrant people (sum of inflow and outflow population) of each province was collected from the 2010 population census reported by the National Bureau of Statistics of China.w3 All analyses were conducted in SPSS V.20.0 software.
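The Spearman correlation described above is the Pearson correlation computed on ranks. The following is a minimal, dependency-free Python sketch under that definition, with tied values given their average rank. The province-level numbers are invented for illustration and are not the study data; the analysis in the paper was run in SPSS.

```python
def average_ranks(values):
    """Rank values from 1..n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j share one average rank
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_r(x, y):
    """Spearman's rank correlation: Pearson correlation of the ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mean_x, mean_y = sum(rx) / n, sum(ry) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rx, ry))
    sd_x = sum((a - mean_x) ** 2 for a in rx) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in ry) ** 0.5
    return cov / (sd_x * sd_y)


# Made-up example: migrant population (millions) and the proportion of
# individuals with interprovince links, for five hypothetical provinces.
migrants = [10, 20, 30, 40, 50]
interprovince_share = [0.2, 0.1, 0.4, 0.3, 0.5]

rho = spearman_r(migrants, interprovince_share)
print(round(rho, 3))
```

Because only ranks enter the calculation, the result is unaffected by the very different scales of the two variables.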

Results

Identification of five independent HIV-1 CRF01_AE lineages in China

Phylogenetic analysis clearly identified five distinct major lineages (lineages 1–5) (figure 1), which included 4021 (85.2%) sequences (see online supplementary table 3). With regard to risk groups, lineages 1 (87.2%), 2 (79.7%) and 4 (70.8%) were mainly composed of sequences from MSM, while lineages 3 (44.6%) and 5 (35.1%) were mostly driven by the epidemic among heterosexuals. Overall, all lineages appeared more prevalent in eastern, southwestern and south-central China than in northwestern, northern and northeastern China (see online supplementary table 3 and figure 1). Lineages 3 and 5, which mainly consisted of heterosexuals, showed a similar regional distribution: the combined proportions of the two lineages in southwestern and south-central China were >94% and 85%, respectively. By contrast, the other three lineages, mainly comprising MSM, differed strikingly in their regional distribution. Lineage 4 was distributed evenly across all observed regions except the northwestern. Lineage 2 was mainly composed of CRF01_AE strains from eastern (30.0%), northern (28.9%) and south-central (20.9%) China, while a majority of sequences in lineage 1 were collected from eastern China (69.2%).

Characteristics of genetic transmission networks

A total of 400 transmission networks, involving 1391/4704 (29.6%) database-derived sequences, were identified. The networks ranged in size from 2 to 40 individuals, and most (235, 58.8%) were made up of only two individuals. The number of networks was inversely correlated with network size (Spearman’s correlation coefficient = –0.904, p<0.001). Of the 400 networks, 255 (63.8%) contained at least one MSM. Of all networks identified, 47.5% (190/400) contained only sequences isolated from MSM, whereas 12.5% (50/400) were derived from heterosexuals.

Transmission links between different risk groups

Overall, a majority of MSM (93.8%) shared links with other MSM (table 1) and few were linked to male (4.4%) or female heterosexuals (2.4%). Of male heterosexuals, 42.5% were linked to female heterosexuals; however, a notable share were also linked to other male heterosexuals (30.7%) and MSM (22.8%). Nearly half of female heterosexuals (49.1%) shared links with male heterosexuals, while 25.5% and 11.8% were linked to other females and MSM, respectively. We also observed that different risk groups were linked to injecting drug users (IDUs): about 70% of IDUs were linked to male heterosexuals, female heterosexuals and other IDUs, in equal proportions (23.5% each). Of note, although only 4.4% and 2.4% of MSM were linked to male and female heterosexuals, these links represented 22.8% and 11.8% of links among male and female heterosexuals, respectively. When stratified by lineage (figure 1), as many as 80.0% of male and 42.9% of female heterosexuals were linked to MSM in lineage 1, and 66.7% of male and 75.0% of female heterosexuals were linked to MSM in lineage 2 (see online supplementary table S2).
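The asymmetry noted above (a small share of MSM linked to heterosexual women, but a larger share of heterosexual women linked to MSM) follows from the different sizes of the groups used as denominators. The Python sketch below shows one plausible way to tabulate such proportions from an edge list. It assumes an individual counts as "linked to" a group if at least one of their links reaches that group, which is an interpretation rather than a definition given in the text; the network and labels are invented.

```python
from collections import defaultdict


def linkage_proportions(groups, edges):
    """For each pair (g, h), the share of networked individuals in group g
    who have at least one link to someone in group h."""
    neighbour_groups = defaultdict(set)
    for a, b in edges:
        neighbour_groups[a].add(groups[b])
        neighbour_groups[b].add(groups[a])

    members = defaultdict(list)
    for node in neighbour_groups:  # only individuals that appear in a link
        members[groups[node]].append(node)

    all_groups = set(groups.values())
    table = {}
    for g, nodes in members.items():
        for h in all_groups:
            linked = sum(1 for n in nodes if h in neighbour_groups[n])
            table[(g, h)] = linked / len(nodes)
    return table


# Hypothetical toy network: three MSM, one female and one male heterosexual.
groups = {"m1": "MSM", "m2": "MSM", "m3": "MSM", "w1": "FHet", "h1": "MHet"}
edges = [("m1", "m2"), ("m2", "m3"), ("m3", "w1"), ("w1", "h1")]

table = linkage_proportions(groups, edges)
print(table[("MSM", "FHet")], table[("FHet", "MSM")])
```

In this toy network one third of MSM are linked to a heterosexual woman, while the single heterosexual woman is linked to an MSM, so the same link gives very different percentages depending on the reference group. Because one individual can be linked to several groups, the row percentages need not sum to 100%.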

Table 1

Links between different risk groups in the transmission networks

Transmission links between different provinces

Of the 1391 individuals in networks, 513 (36.9%) were linked to cases diagnosed in different provinces. Beijing, Shanghai, Guangdong, Guangxi, Zhejiang, Sichuan and Shaanxi had transmission linkages with 18, 16, 15, 13, 12, 12 and 10 other provinces, respectively. Besides, the proportions of sequences from Jiangsu, Henan, Guangdong, Shandong, Anhui and Sichuan that formed interprovince links all exceed 60%. Interestingly, we observed a strong positive association between proportions of individuals involved in interprovince links and the number of migrant people (Spearman’s r=0.738, p=0.001) (figure 2).

Figure 2

Correlation between the proportions of individuals involved in interprovince links by province and the number of migrant people. AH, Anhui; BJ, Beijing; FJ, Fujian; GD, Guangdong; GX, Guangxi; HAN, Hainan; HEB, Hebei; HEN, Henan; HUN, Hunan; JS, Jiangsu; LN, Liaoning; SAX, Shaanxi; SC, Sichuan; SD, Shandong; SH, Shanghai; YN, Yunnan; ZJ, Zhejiang.

Individuals with multiple potential transmission links

In total, 1003 links were established among the 1391 individuals in the genetic transmission networks (table 2). Of the 4704 study subjects, 21.9% (1029/4704) had 1 link and 7.7% (362/4704) had ≥2 links. Individuals with ≥2 links accounted for 26.0% of the 1391 networked individuals but were involved in 629 (62.7%) of the 1003 links. Importantly, among individuals with ≥2 links, MSM accounted for 71.8%. A higher proportion of linked individuals was found among MSM than in other risk groups (p<0.001), except male heterosexuals (p=0.060). In addition, the 28 individuals with ≥5 links accounted for only 14.1% (141/1003) of links.
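The counts in this paragraph can be reproduced from an edge list by tallying each individual's number of links. Below is a minimal Python sketch on an invented toy network, followed by a recomputation of the reported percentages from the reported counts. Counting a link as "involving" a multiply linked individual when either endpoint has ≥2 links is an assumption about how the 629 figure was derived.

```python
from collections import Counter


def link_summary(edges, min_links=2):
    """Return per-individual link counts, the set of individuals with at
    least min_links links, and the number of links touching that set."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    multi = {node for node, d in degree.items() if d >= min_links}
    links_with_multi = sum(1 for a, b in edges if a in multi or b in multi)
    return degree, multi, links_with_multi


# Hypothetical toy network: one individual with three links, one isolated pair.
edges = [("a", "b"), ("b", "c"), ("b", "d"), ("e", "f")]
degree, multi, links_with_multi = link_summary(edges)
print(sorted(multi), links_with_multi, len(edges))

# Percentages recomputed from the counts reported in the text.
share_of_networked = round(100 * 362 / 1391, 1)  # individuals with >=2 links
share_of_links = round(100 * 629 / 1003, 1)      # links involving them
print(share_of_networked, share_of_links)
```

In the toy network a single individual out of six is involved in three of the four links, the same pattern of a minority accounting for most links that the paragraph describes.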

Table 2

Characteristics of individuals by number of potential transmission links in different lineages and risk groups

As expected, the MSM-related lineages (lineages 1, 2 and 4) had a markedly higher proportion of individuals with ≥1 links than the heterosexual-related lineages (lineages 3 and 5), indicating that MSM had higher transmission than other risk groups (p<0.05, table 2). Moreover, the BSP analysis (see online supplementary figure 2) was consistent with the genetic transmission analysis, implying that these lineages had experienced different epidemic stages before the sampling date. Lineage 1 remained in a period of exponential growth. By contrast, lineages 2 and 4 underwent a period of exponential growth and then reached a steady state after 2005 and 2008, respectively, while lineages 3 and 5 tended towards a decline phase after 2010.

Supplementary Figure 1

Discussion

A previous study with fewer CRF01_AE sequences from the LANL database reported that multiple lineages of CRF01_AE strains, with different characteristics and epidemic trends, were circulating in China.15 In this study, we not only identified five main lineages but also analysed the characteristics of the transmission networks of each lineage in China, based on the CRF01_AE sequences currently available in the LANL database. Our study confirms previous reports that the epidemic of HIV-1 CRF01_AE in China was driven by multiple lineages.14 15 Our study indicates for the first time in China that the three lineages mainly composed of MSM had higher transmission than the other two lineages related to heterosexuals. This suggests that the impact of risk behaviours on HIV-1 transmission was greater than that of geographical location. Thus, measures focused on curbing risk behaviours may bring more benefit than regionally focused approaches.2 On a separate note, although lineage 1 was mainly located in eastern China, both the genetic transmission network and BSP analyses showed that it was still in a period of rapid growth. As the most developed area and a typical labour-importing region, eastern China attracts a large number of migrants to work and live every year, which facilitates high-frequency dissemination of HIV-1 in these regions.9 16 Thus, we speculate that lineage 1 may lead to an outbreak in the near future, given the frequent population mobility.

In recent years, the percentage of newly diagnosed cases attributed to homosexual transmission has shown a consistent uptrend, rising from 2.5% in 2006 to 28.3% in 2015.w4 Several studies have confirmed that CRF01_AE was the most prevalent genotype circulating among MSM in China.1 2 The observation that a large majority of networks in our study included MSM indicates that frequent unprotected anal intercourse promoted the serious spread of HIV-1 among MSM.17 Importantly, although few MSM were involved in transmission with heterosexual women, these transmissions represented a substantial proportion of HIV acquisitions among heterosexual women. In China, due to traditional values and family expectations, >70% of MSM will marry women during their lifetime,w5 and over a quarter of MSM had sex with women in the past six months.18 Additionally, almost 90% of MSM hid their sexual orientation and only 20% of MSM used condoms with their wives.18 These very common unprotected bisexual behaviours put female partners of MSM at high risk of HIV acquisition. As a result, strategies that prioritise MSM are not only effective in preventing the spread of HIV in this group, but are also likely to reduce HIV acquisition among women.

Hue et al reported that an estimated 1%–11% of homosexually acquired infections were misclassified as heterosexual.19 Hoenigl et al also found that a high number of heterosexual males clustered within MSM networks.20 In this study, nearly half of the male heterosexuals were linked to other male heterosexuals or MSM, which indicates that a substantial proportion of MSM in China might self-report as heterosexual. Unexpectedly, 25.5% of female heterosexuals were observed to share links with other women. Because female-to-female transmission has rarely been reported,21 the vast majority of the links between females were probably due to missing sequence data from intermediate infections.8 Therefore, more complete molecular surveillance data, in combination with contact tracing, would enable us to better characterise the transmission network and advance our knowledge of the factors contributing to the HIV epidemic.8

Previous studies reported that individuals with more links in the network could act as a ‘core population’, with a higher probability of transmitting HIV to the broader population because of a high frequency of partner change and a high viral load.22 23 We observed that individuals with multiple links represented only a minority of persons but were involved in most of the links in the networks. This result was similar to our previous results in Shanghai and confirmed that a small ‘core population’ indeed drives transmission networks to a significant extent.9 Identifying and controlling this ‘core population’ is therefore crucial for curbing HIV spread in a population.24 25 Further identification of the factors associated with the ‘core population’ would help achieve this goal.

In China, numerous migrants constantly move between cities to seek better employment opportunities and living conditions. By the end of 2015, the migrant population had reached 247 million nationwide.w6 Several studies indicated that migrants accounted for a large proportion of the HIV epidemic in China16 w7 and showed a persistent increase among newly diagnosed HIV infections every year in some areas.4 16 26 Our previous observation suggested ongoing transmission of the CRF01_AE strain between Shanghai and other provinces, in which migrants played a crucial role.9 In this in-depth observation at the national level, we expanded the number of available sequences and revealed that large-scale population migration between China’s provinces undoubtedly facilitated viral transmission. Because male and female migrants are usually far away from their homes, they are more likely to engage in high-risk behaviours,27 making them more vulnerable to HIV than the general population.27 28 When the annual flow of migrants persists between home regions and different provinces/cities, they may serve as a bridge between at-risk and general populations.27 29 However, the high mobility of migrants makes it difficult to monitor HIV infection and manage care.26 Migrants usually have limited access to convenient and long-term health services due to China’s household registration system and urban social security system,30 which also increases the probability of transmitting HIV. Thus, urgent implementation of comprehensive control measures is necessary to monitor and curb the spread of the HIV epidemic in the migrant population.

Limitations

As all sequences in this study were obtained from a public database rather than by random sampling, our analysis was likely subject to selection/sampling bias. Even so, to our knowledge, this study has so far included the highest number of CRF01_AE sequences for genetic transmission analysis in China. We anticipate being able to make even stronger inferences as the reach and coverage of molecular HIV surveillance continue to improve. Furthermore, detailed socio-demographic and clinical data were not available in this study, limiting our ability to explore more factors associated with the epidemic of CRF01_AE in China.

Conclusions

The genetic transmission network analysis for HIV-1 CRF01_AE could help improve our ability to understand HIV transmission among various regions and risk groups in China. Both MSM and migrants play an important role in HIV spread, which should be targeted by prevention and intervention efforts. Genetic transmission network analysis with widespread genotypic sampling density, combined with clinical data and behavioural data, may further provide unique insights into trends in CRF01_AE transmission at the population level.

Key messages

  • Although few men who have sex with men (MSM) were involved in transmission with heterosexual women, these transmissions represented a substantial proportion of HIV acquisitions among heterosexual women in China.

  • The lineages mainly composed of MSM had higher transmission than those mainly consisting of heterosexuals.

  • The proportion of individuals involved in interprovince links by province was correlated with the number of migrant people.

Supplementary material 3

Acknowledgments

The authors very gratefully acknowledge the Centers for Disease Control and Prevention of all provinces and other research teams instrumental in collecting and uploading HIV sequence data. They also thank Dr John Nkengasong, Division of Global HIV and TB, Center for Global Health, US Centers for Disease Control and Prevention, Atlanta, GA, USA, for his kind assistance.

References

Footnotes

  • XL, RG and KZ contributed equally.

  • Handling editor Jackie A Cassell

  • Contributors XL, PZ and PW conceived and designed the study. XL, RG, KZ, KF, FW, WL, YS, YG and YJ prepared the data. XL, RG and KZ analysed the data. XL, RG, KZ, PZ and PW wrote the paper. All authors read and approved the final manuscript.

  • Funding This work was supported by the Fundamental Research Funds for the Central Universities and Research Innovation Program for College Graduates of Jiangsu Province (KYLX15_0173) and the Humanities and Social Sciences of Ministry of Education Planning Fund of China (no. 16YJA840014).

  • Disclaimer The funders had no role in study design, data collection and analysis, decision to publish or preparation of the manuscript.

  • Competing interests None declared.

  • Ethics approval Review committee number 2017ZDKYSB045.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Correction notice This paper has been amended since it was published Online First. Professor Pingmin Wei is now listed as the first corresponding author and Professor Ping Zhong is the second corresponding author.