
A multi-centre evaluation of nine rapid, point-of-care syphilis tests using archived sera
  1. A J Herring1,
  2. R C Ballard2,
  3. V Pope2,
  4. R A Adegbola3,
  5. J Changalucha4,
  6. D W Fitzgerald5,
  7. E W Hook III6,
  8. A Kubanova7,
  9. S Mananwatte8,
  10. J W Pape5,
  11. A W Sturm9,
  12. B West3,
  13. Y P Yin10,
  14. R W Peeling11
  1. Sexually Transmitted Bacteria Reference Laboratory, Health Protection Agency Laboratory, Bristol, UK
  2. Division of AIDS, STD and TB Laboratory Research, Centers for Disease Control and Prevention, Atlanta, Georgia, USA
  3. MRC Laboratories, Fajara, The Gambia
  4. National Institute for Medical Research, Mwanza, Tanzania
  5. Les Centres GHESKIO (Groupe Haitien d’Etude du Sarcome de Kaposi et des Infections Opportunistes), Port au Prince, Haiti
  6. University of Alabama, Alabama, USA
  7. Central Institute for Skin and Venereal Diseases, Moscow, Russian Federation
  8. National STD/AIDS Control Programme, Colombo, Sri Lanka
  9. University of Natal, Durban, South Africa
  10. National Center for STD, Nanjing, China
  11. Sexually Transmitted Diseases Diagnostics Initiative (SDI), WHO/TDR/PRD, Geneva, Switzerland
Correspondence to:
 Dr Rosanna W Peeling
 Sexually Transmitted Diseases Diagnostics Initiative (SDI), WHO/TDR, 20, Avenue Appia, 1211, Geneva 27, Switzerland; peelingr{at}


Objectives: To evaluate nine rapid syphilis tests at eight geographically diverse laboratory sites for their performance and operational characteristics.

Methods: Tests were compared “head to head” using locally assembled panels of 100 archived (50 positive and 50 negative) sera at each site using as reference standards the Treponema pallidum haemagglutination or the T pallidum particle agglutination test. In addition inter-site variation, result stability, test reproducibility and test operational characteristics were assessed.

Results: All nine tests performed well relative to the reference standard, with sensitivities ranging from 84.5% to 97.7% and specificities from 84.5% to 98%. Result stability was variable if result reading was delayed past the recommended period. All the tests were found to be easy to use, especially the lateral flow tests.

Conclusions: All the tests evaluated have acceptable performance characteristics and could make an impact on the control of syphilis. Tests that can use whole blood and do not require refrigeration were selected for further evaluation in field settings.


The World Health Organization (WHO) estimates that approximately 12 million new cases of venereal syphilis occur worldwide each year, most of them in developing countries where access to sexually transmitted disease (STD) laboratory services is limited.1 Nonetheless, the disease remains a global health priority. The recent re-emergence of syphilis in the developed world, as seen in Russia and eastern Europe, has been associated with social upheaval and is potentially a contributor to burgeoning HIV epidemics.2 In North America and Western Europe, resurgent syphilis has been associated mainly with men who have sex with men or illicit drug users.3

In most countries where prenatal screening for syphilis is available, the rapid plasma reagin (RPR) test is used. To prevent stillbirth and other adverse outcomes of pregnancy, women who test positive at peripheral clinics are treated without recourse to a confirmatory treponemal test. Given the importance of early treatment, and the efficacy and safety of intramuscular benzathine penicillin, this has proved to be a sound strategy, even though it may lead to unnecessary treatment in some cases. Screening and treatment of pregnant women for syphilis remain cost-effective even when the prevalence is low.4 However, it is estimated that fewer than 30% of pregnant women are screened for the disease in sub-Saharan Africa,5 while a study in Bolivia showed that, although 76% of the study population received antenatal care, only 17% were screened for syphilis during pregnancy.6 Among the many reasons for low rates of screening, one major barrier is that current screening using a non-treponemal test requires a laboratory with trained personnel and a source of electricity to run a refrigerator to store the RPR reagent, a centrifuge to separate serum from whole blood, and a shaker to perform the serology. Since such facilities are generally not available in primary health care settings, blood or serum samples have to be transported to regional or central facilities for testing. Often results are only available days or weeks after testing. Studies have shown that only a small proportion of infected women receive treatment when RPR testing is performed off-site, because women do not return for their results, or because specimens or results are lost in transit.7 Even when testing is available at clinical sites, there are technical difficulties associated with maintaining trained personnel and assuring quality standards and supplies of tests and treatment.8

A number of simple, rapid treponemal tests have recently become commercially available. Most are “lateral flow” tests in which antibodies are transported by capillary flow over antigen immobilised on a nitrocellulose membrane strip (also termed immunochromatographic strips). Antibodies in the specimen become bound at the antigen site on the strip and are revealed with dye bound to an anti-immunoglobulin. These tests are simple, robust, affordable and can be stored and transported without need for refrigeration. Initial evaluations suggested that their performance was comparable with the best laboratory-based diagnostics.9–17 Used alone, they would be unable to distinguish active from cured disease, but they can facilitate a crucial intervention: the screening of pregnant women to reduce the occurrence of stillbirth and congenital syphilis where access to laboratory services is a problem.18,19 The WHO/Sexually Transmitted Diseases Diagnostics Initiative (SDI) is conducting an ongoing comprehensive evaluation of these rapid tests with panels of well-characterised archived serum specimens from geographically diverse settings. The results of these evaluations are used to select a number of the most promising tests for further evaluation in field settings. This paper reports the first results.

Methods


The initial phase of the work involved the recruitment of laboratories to undertake the evaluation and two reference laboratories to provide a measure of quality assurance. A request for applications was posted on the WHO/SDI website and laboratories on the SDI mailing list were contacted. Responding laboratories were sent a questionnaire to establish their access to patients, sera and a suitably constituted ethics committee, their general experience of evaluations and, importantly, their access to field sites for subsequent testing. In addition, the principal investigators were asked to submit 20 sera to the reference laboratories together with details of their results of both treponemal (Treponema pallidum particle agglutination assay (TPPA) or T pallidum haemagglutination assay (TPHA)) and non-treponemal (RPR) antibody tests. This was to establish that they were proficient in performing the reference tests used in the evaluation. The eight laboratories selected by this process are shown in table 1.

Table 1

 Sexually Transmitted Diseases Diagnostics Initiative (SDI) sites for laboratory-based evaluations of rapid syphilis diagnostics

Tests for evaluation

An ad hoc SDI expert working group for laboratory-based evaluations decided that the tests to be included should have the following characteristics:

  • rapid: test result is available in less than 30 minutes

  • simple: test can be performed in a few steps, requiring minimal training and minimal extra equipment

  • easy to interpret: card or strip format with visual readout.

In the initial round of evaluation, 13 manufacturers with tests that conform to the above characteristics were invited to participate, of whom six manufacturers submitted tests for evaluation at eight SDI sites on four continents. In the second round of evaluations, three more tests were evaluated in six of the SDI sites. One of the tests evaluated in the second round was an improved version of the test submitted in the first round (SyphiCheck made by the Tulip Group in India). Details of the tests are given in table 2 and their major features are listed in table 3. The serum panels used in round 2 were not identical to those used in round 1.

Table 2

 Syphilis tests evaluated and their manufacturers

Table 3

 The major features of the evaluated tests

Several parameters of the tests were evaluated including sensitivity and specificity relative to a “gold” or reference standard laboratory test together with inter-reader variability, result stability, reproducibility, ease of use and between-site variability. These laboratory comparisons represent the first part of a full evaluation of these tests, the final and definitive phase being the field evaluation.

Development of the standard protocol and performance of the evaluation

All participating laboratories collaborated in the development of a standard protocol for the evaluation which was then reviewed and approved by the WHO and the local site ethical committees. Before beginning the evaluation, the study protocol was piloted with one positive and one negative serum.

Each laboratory assembled an evaluation panel from archived specimens containing 50 TPHA/TPPA positive sera (40 RPR+, 10 RPR−) and 50 TPHA/TPPA negative sera (40 RPR−, 10 RPR+). Haemolysed sera were avoided and, if a precipitate was visible, the serum was clarified at 12 000 g for five minutes. All patient identifiers were unlinked from specimens before the evaluations.

The reference test was either the TPPA (Serodia, Fujirebio Inc, Tokyo) or the TPHA.

The standard operating procedures (SOP) for the assays were the manufacturer’s product inserts. In addition, SOPs were produced to ensure that the testers were blinded to reference standard results and that, in the inter-observer variability trial, both testers were truly independent.

Each kit was tested with all 100 sera in batches of 25 sera before evaluating another test kit to avoid comparison of results between kits. Indeterminate results were recorded as such and any repeat testing was only performed after all 100 sera had been tested. To allow result stability to be assessed, each result was read at the recommended time and after one hour. At each site, each test was read by two project technicians to allow inter-operator variability to be estimated.
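
The result-stability check described above amounts to comparing, serum by serum, the reading taken at the recommended time with the reading taken after one hour. The following is a minimal sketch of that comparison, not the study's own analysis: the function name `stability_changes` and the result labels `"pos"` and `"neg"` are assumptions made for illustration.

```python
from collections import Counter

def stability_changes(initial, delayed):
    """Count results that differ between the recommended-time reading
    and the delayed (one-hour) reading of the same specimens.

    Returns the total number of changed results and a Counter keyed by
    (initial_result, delayed_result) for each kind of change.
    """
    if len(initial) != len(delayed):
        raise ValueError("both readings must cover the same specimens")
    # Keep only the pairs where the delayed reading differs from the first
    changes = Counter((a, b) for a, b in zip(initial, delayed) if a != b)
    return sum(changes.values()), changes

# Hypothetical readings for four sera (labels are illustrative only)
at_recommended_time = ["neg", "neg", "pos", "neg"]
after_one_hour = ["pos", "neg", "pos", "pos"]
total, by_kind = stability_changes(at_recommended_time, after_one_hour)
print(total, by_kind[("neg", "pos")])
```

Keeping the direction of each change makes it possible to report separately how many negative results turned positive on delayed reading.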

Each test was assessed for its operational characteristics by the same technicians. Tests were scored for clarity of kit instructions, technical complexity or ease of use and ease of interpretation of results. Each of these characteristics was allotted marks out of 3 and an additional score of 1 was given to tests not requiring additional equipment, giving a maximum of 10. This was not done in the second round as it was felt that this would be better evaluated by field staff than highly trained laboratory technicians.
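
The scoring scheme above can be written out as simple arithmetic: three characteristics marked out of 3, plus 1 mark if no additional equipment is needed, for a maximum of 10. The sketch below is illustrative only; the function and parameter names are assumptions, and the text does not say how marks from different technicians or sites were combined into the reported scores.

```python
def operational_score(clarity, ease_of_use, interpretation, needs_extra_equipment):
    """Total operational score for one technician's assessment of one test.

    Each of the three characteristics is marked from 0 to 3; one extra
    mark is awarded when the test needs no additional equipment.
    """
    for mark in (clarity, ease_of_use, interpretation):
        if not 0 <= mark <= 3:
            raise ValueError("each characteristic is marked out of 3")
    bonus = 0 if needs_extra_equipment else 1
    return clarity + ease_of_use + interpretation + bonus

# A test with full marks and no extra equipment reaches the maximum of 10
print(operational_score(3, 3, 3, needs_extra_equipment=False))
```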

Test reproducibility was investigated in the reference laboratories. Lot-to-lot reproducibility was tested using 25 sera and two lots of each rapid test. Operator reproducibility was compared by two technicians who ran each test with the same 20 sera. Run-to-run reproducibility was investigated using nine sera that were tested on five successive days for each test.

Quality control measures were included in the data management instructions in the protocol. Results were recorded in each technician's laboratory notebook, which was signed off by the supervisor at the end of each day. Data were then entered into a laboratory data collection spreadsheet provided by SDI. The spreadsheet was then double-checked against the notebooks of both technicians.

Data analysis

Sensitivities and specificities were calculated relative to the reference standard TPPA or TPHA results obtained for each serum specimen at each site and validated by the reference centres. Sample size calculations showed that the use of 600–800 sera, of which 50% are positive, would allow estimation of the sensitivity and specificity of the test with a 95% confidence interval of ±5%. No discrepant analyses were undertaken. The Breslow-Day test for homogeneity was used for determining site to site variation and κ values were calculated for each test as a summation of the overall performance (combined correlation of test sensitivity and specificity) of each test against the reference standard for all sites. A κ value of ⩾0.75 is considered excellent.
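
As an illustration of the quantities named above, the sketch below computes sensitivity, specificity and Cohen's κ from a 2×2 table of rapid-test results against the reference standard, and the half-width of a 95% confidence interval for a proportion. It is a sketch under stated assumptions rather than the analysis actually used: the paper does not say which confidence interval method was applied, so the normal-approximation (Wald) interval is an assumption, and the counts in the example are hypothetical.

```python
import math

def diagnostic_summary(tp, fn, tn, fp):
    """Sensitivity, specificity and Cohen's kappa for a rapid test
    compared with the reference standard (TPHA or TPPA).

    tp/fn: reference-positive sera that the rapid test called positive/negative
    tn/fp: reference-negative sera that the rapid test called negative/positive
    """
    n = tp + fn + tn + fp
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    observed = (tp + tn) / n  # observed agreement
    # Agreement expected by chance, from the marginal totals
    expected = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n ** 2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa

def wald_half_width(p, n, z=1.96):
    """Half-width of the normal-approximation (Wald) confidence interval
    for a proportion p estimated from n specimens (assumed method)."""
    return z * math.sqrt(p * (1 - p) / n)

# Hypothetical counts, not taken from the study tables
sens, spec, kappa = diagnostic_summary(tp=95, fn=5, tn=97, fp=3)
print(round(sens, 3), round(spec, 3), round(kappa, 3))

# Worst case (p = 0.5) with 300 to 400 reference-positive sera
print(round(wald_half_width(0.5, 300), 3), round(wald_half_width(0.5, 400), 3))
```

With 600–800 sera of which half are positive, each of sensitivity and specificity rests on 300–400 specimens; under the Wald assumption the worst-case half-width is then roughly 0.05 to 0.06, which is in line with the ±5% quoted above.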

Interobserver variability was calculated as the number of tests for which different results were obtained by two independent readers, divided by the number of specimens tested.
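
The definition of interobserver variability given above translates directly into a short calculation. The sketch below is illustrative; the function name and the result labels are assumptions, and it treats any difference between the two readers (including an indeterminate reading against a definite one) as a discordant result.

```python
def interobserver_variability(reader_a, reader_b):
    """Proportion of specimens for which two independent readers
    recorded different results for the same test."""
    if len(reader_a) != len(reader_b):
        raise ValueError("both readers must have read the same specimens")
    if not reader_a:
        raise ValueError("at least one specimen is required")
    discordant = sum(a != b for a, b in zip(reader_a, reader_b))
    return discordant / len(reader_a)

# Hypothetical readings of five sera by two technicians
reader_1 = ["pos", "pos", "neg", "neg", "indeterminate"]
reader_2 = ["pos", "neg", "neg", "neg", "indeterminate"]
print(interobserver_variability(reader_1, reader_2))
```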


Results

Round 1

Owing to insufficient quantities of characterised sera at some sites, the final results were available for 789 sera, 399 of which were TPHA or TPPA positive. (The requirement for biological false positive sera (TPPA−, RPR+) was a particular problem for some sites.) The sensitivity and specificity of each test at each site are shown in table 4. The overall sensitivity and specificity for the combined results with 95% confidence intervals are also shown.

Table 4

 Performance of rapid diagnostic tests for syphilis in round 1

The Fujirebio Espline, Abbott Determine, and Standard Bioline tests showed the highest sensitivity (97.7%, 97.2% and 95%, respectively; table 4). The sensitivities of these three tests were not significantly different from each other but were significantly different from those of the Diesse, Omega and Qualpro tests (p<0.03).

The Omega Visitect and the Qualpro Syphicheck tests showed the highest specificity (98% and 97.7%, respectively; table 4). These were not significantly different from each other but were significantly higher than those of the other four tests.

For estimation of overall test performance, the κ value was used. This determined the combined correlation of test sensitivity and specificity for all the sites against the reference standard. A κ value of ⩾0.75 is considered excellent. Thus all the rapid tests had excellent correlation with the reference standard tests at each site, with κ values for the initial six tests ranging from 0.84 to 0.95.

Site-to-site variation for each test was measured using the Breslow-Day test for homogeneity of odds ratios. The three tests that gave the most variation were the Omega Visitect, the Abbott Determine, and the Diesse Syphilis Fast tests, with p values of 0.03, 0.0086 and 0.0002, respectively. There was no significant difference between malaria endemic and malaria-free sites with respect to test specificity.

Test results were stable after one hour for the Abbott Determine, Fujirebio, Qualpro Syphicheck and Omega Visitect tests, with five or fewer results different from the original results. The Standard Bioline had 12 results different from the original, with most of these becoming false-positive after one hour. The Diesse Syphilis Fast was affected by drying, making reading difficult after an hour. By the second reading, 22 results were different from the original test result, turning from negative to false-positive.

The scores for operational characteristics are summarised in table 5. The Abbott test obtained the best score (7.5 out of 10) with the Omega Visitect, Qualpro Syphicheck, Fujirebio Espline and Standard Bioline all less than 10% different from each other (6.5–7.1 out of 10). The Diesse Syphilis Fast test scored lowest (4.3 out of 10) on technical complexity and ease of interpretation.

Table 5

 Operational characteristics of rapid diagnostic tests for syphilis

The results for test reproducibility are summarised in table 6. Overall, the variability was low. The maximum observed was 10% for the Omega test for operator-to-operator variation in the reference laboratories. However, this test performed well for this parameter at the evaluation sites. Results of the first round of evaluations have been posted on the SDI website.

Table 6

 Test reproducibility

Round 2

A total of 600 sera from the six laboratory sites were used to evaluate a further three rapid syphilis tests; 299 were reference standard positives and 301 were reference standard negatives. The serum panels were not identical to those used in round 1. The performance data for these tests are summarised in table 7.

Table 7

 Performance of rapid diagnostic tests for syphilis in round 2

The CTK Syphilis On Site Rapid Q and the Qualpro Syphicheck WB tests showed the highest sensitivities (96.3% and 95.3%, respectively). These values were not significantly different from each other but there were significant differences between the sensitivities of the Bioline Syphilis anti-TP and CTK Syphilis On Site Rapid. The Bioline Syphilis anti-TP test showed the highest specificity (97%). The only marginally significant difference was between the Qualpro Syphicheck WB and the Bioline Syphilis anti-TP. All other performance comparisons were not significantly different. The κ values for the three tests ranged from 0.89 to 0.92. Similarly, all three tests gave excellent values for the Breslow-Day test for site to site variation.

In round 2, the reproducibility testing was restricted to measuring lot-to-lot variations for two different lots using 20 serum samples. There was one discrepant result with the Qualpro Syphicheck-WB, two with the Bioline Syphilis anti-TP test and five with CTK’s Syphilis Rapid Screening Test.

In result stability testing after one hour the Qualpro test showed seven changes in result; five negative tests became false positive. The Bioline and the CTK tests were less stable with 12 and 14 changes, respectively. (In the Bioline test, nine became false positives and in the CTK test 10 became positive.)

Since all the lateral flow tests in round 1 were found to be simple in operation and the three tests in round 2 were essentially identical, the site technicians were not asked to score the operational characteristics of the tests in round 2. Given that these rapid tests are intended to be used in field settings, it was decided that the ease of operations would be better assessed in field settings.

Discussion


Most of these rapid tests utilise one or more similar recombinant treponemal antigens, and it is likely that small differences in antigen concentration, detection system and the volume of serum used account for the small variations observed in performance. The nine rapid tests evaluated all showed good performance in terms of sensitivity and specificity relative to the reference standard TPHA or TPPA tests using archived serum specimens. As has often been noted in such trials of diagnostic tests, there was a trend towards an inverse relationship between sensitivity and specificity. Thus, for a diagnosis such as syphilis that can carry a risk of stigma for the patient, there may be an advantage in using a high specificity test to confirm diagnosis with a high sensitivity assay. There would also be an advantage for interpretation of disease status if the tests could be combined with a non-treponemal antibody assay. Although the RPR is simple enough to be performed under field conditions, it requires some laboratory equipment and refrigeration. In addition, there are problems with reading in differentiating between weak positive and negative reactions. Therefore, an anti-cardiolipin antibody test in a lateral flow format would be a substantial advance.

The multicentre design of this trial also allowed inter-site variation to be assessed. This also showed that the tests performed well in geographically distinct areas with differing patient populations, despite inevitable variations in the sera selected for each panel, test performance and reading, and the subjective nature of result interpretation.

These studies were able to detect significant variations in the stability of the results after one hour. This measurement was made to anticipate the use of these tests in a busy clinic setting where staff may not be able to read the tests at the manufacturer's designated time. The Syphilis Fast latex agglutination assay in particular should be read after the recommended 8 minutes but, given its speed, this was not perceived as a problem. Similarly, all the lateral flow tests were perceived as very easy to use. The Syphilis Fast latex agglutination test was found to be marginally more difficult to perform, as the stick for stirring the reaction sometimes broke. It was also more difficult to interpret, especially when the reaction dried before the designated reading time. None of the tests were technically complex to perform and all tests were considered suitable for field use. The true evaluation of the operational characteristics of the tests will emerge from the subsequent field evaluations.

The SDI ad hoc expert working group considered which tests should be further evaluated in field settings after round 1 and felt that it was difficult to select one or two tests based on test performance characteristics alone. The final consensus was that the four rapid tests in round 1 that can use whole blood and do not require refrigeration should be taken forward to SDI field trials (see Mabey et al in this supplement). The three tests evaluated in round 2 also warrant field testing.

Given the simplicity and low cost of these rapid tests, it is hoped that they may prove to be effective tools in the control of syphilis and for screening pregnant women to prevent congenital syphilis in primary health care settings.


The authors wish to acknowledge the excellent technical assistance of all the laboratory staff who contributed to this evaluation. In particular, we thank K Eastick, L Stephenson and S Wu from the UK PHLS, Ramu Sarge-Njie from the Gambia site, Fazana Karim from the South Africa site, WH Wei and MQ Shi from the China site, Dean Mngara and Julius Mngara from the Tanzania site, and Grace Daniels and Paula B Dixon from the Alabama site, and the staff of the reference laboratories at the US Centers for Disease Control and Prevention and UK Public Health Laboratory Services in validating the qualification panels, repackaging and sending out the tests to all the sites, and performing the reproducibility testing.

SDI is indebted to members of the SDI ad hoc Laboratory Evaluation Expert Working Group (Ron Ballard, Carlos Conde, Alan J Herring, Edward W Hook, Robert Johnson, Laurie Markowitz, Milton R Tam) for their invaluable contribution towards the development of the SDI diagnostic evaluation scheme. The expert assistance of Mary Cheang and Jean Joly in data analyses, and of Izabela Suder-Dayao in coordinating this work and preparation of the laboratory reports, is much appreciated. This work is made possible with funding to the STD Diagnostics Initiative (SDI) from USAID and the Bill & Melinda Gates Foundation.



  • Competing interests: none declared
