Comparing CTIndex Devices
BY Nathan Moore
Imagine a scenario where a contractor works on a balanced mix design (BMD) and finally gets that mix to pass an agency’s IDEAL-CT criteria. Then, after the contractor submits specimens to the agency for approval, the agency’s results fail the mix design criteria. What happened? When the contractor and agency discuss the results, they realize the two sets of specimens were tested on machines from different manufacturers.
Could using different devices be the reason for the discrepancy between the two labs? How can we ensure that different machines will provide equivalent results? A recent National Center for Asphalt Technology (NCAT) study could help resolve this situation. In the study, six different devices were evaluated to assess how much they could affect the overall variability of IDEAL-CT results.
Variability in IDEAL-CT test results can come from many sources: operator, materials, specimen preparation, equipment differences, etc. Specimen preparation is known to have a large effect on IDEAL-CT results. For this study, careful attention to detail was given to making the specimens by using a single technician, using the same specimen preparation equipment and oven heating times, and by randomizing the specimens to be tested among the six devices.
When investigating differences in test results due to devices, the analysis should include data from a variety of mixtures with results ranging from low to high. In this study, eight replicates from each of seven different mixtures were tested on each device by the same technician. Due to natural variability, results for each mixture will differ from device to device. Although replicates from a mix may be repeatable within a specific device, there may still be differences when comparing results between devices. The concern is when one device consistently yields results that are higher or lower than another device. We want to know how much of a difference can be tolerated.
Statistical equivalence is not a term used often in materials testing. It is the idea that results are considered equivalent when the differences between them are practically irrelevant. For example, if Sample A has an average CTIndex of 95 and Sample B has an average CTIndex of 93, is this two-unit difference large enough to be considered important given the test’s variability? To establish a limit for acceptable difference, the analysis used the average within-lab variability, measured as percent coefficient of variation (COV), from the 2018 NCAT Round Robin Study, where the average within-lab COV was 18%. Therefore, when two devices consistently produce mean results that differ by no more than 18%, they should be considered equivalent.
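To make the arithmetic concrete, here is a minimal sketch of the comparison for the Sample A and Sample B example. The article does not say which value the difference is divided by, so dividing by the average of the two means is an assumption, and the function name is illustrative only.

```python
# Limit taken from the article: average within-lab COV of 18%
# from the 2018 NCAT Round Robin Study.
LIMIT_PERCENT = 18.0


def percent_difference(mean_a, mean_b):
    """Absolute difference between two means, as a percentage of their average.

    Assumption: the average of the two means is used as the reference value.
    """
    reference = (mean_a + mean_b) / 2
    return abs(mean_a - mean_b) / reference * 100


diff = percent_difference(95, 93)  # Sample A and Sample B from the text
print(f"{diff:.1f}% difference; within limit: {diff <= LIMIT_PERCENT}")
```

A two-unit difference at this level works out to roughly 2%, far below the 18% limit. A formal equivalence test also accounts for the scatter in each set of replicates, which this simple check does not.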
Two key findings came from this study.
Some devices had average loading rates faster than the 50 ± 2 mm/min currently specified in ASTM D8225. All the measured speeds from over 300 IDEAL-CT tests fell between 49 and 53 mm/min. Thus, although not all the rates fall within the standard’s specified range of 48 to 52 mm/min, the devices are still operating within a window no wider than the 4 mm/min the tolerance allows. There was no discernible effect of speed on the final CTIndex results for each mix because the devices were operating at similar speeds.
Using the two one-sided tests (TOST) equivalence procedure, all but one of the devices were found to provide equivalent results. When the manufacturer of that device was made aware of the issue, a flaw in the data collection system was found and corrected. Although differences of up to 5 CTIndex units were present in the final comparisons, these differences were not large enough to be considered relevant given the variability of the test. Thus, the study indicates that different devices can be trusted to yield equivalent results.
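For readers unfamiliar with TOST, the following is a rough sketch of how such a comparison between two devices could be run, not the study's actual analysis. Several things here are assumptions: the unequal-variance (Welch) form of the test, the 5% significance level, the margin set to 18% of the average of the two device means, and the CTIndex values, which are invented for illustration. The Student's t probability is computed by numerical integration so that the sketch needs only the Python standard library.

```python
import math
from statistics import mean, variance


def t_cdf(t, df, steps=2000):
    """Student's t CDF, integrating the density with Simpson's rule."""
    coef = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))

    def pdf(x):
        return coef * (1 + x * x / df) ** (-(df + 1) / 2)

    upper = abs(t)
    h = upper / steps  # steps must be even for Simpson's rule
    total = pdf(0) + pdf(upper)
    for i in range(1, steps):
        total += pdf(i * h) * (4 if i % 2 else 2)
    area = total * h / 3  # probability between 0 and |t|
    return 0.5 + area if t >= 0 else 0.5 - area


def tost_equivalent(a, b, margin_fraction=0.18, alpha=0.05):
    """Two one-sided t-tests (Welch form) for equivalence of two means.

    Assumption: the equivalence margin is margin_fraction times the
    average of the two sample means.
    """
    mean_a, mean_b = mean(a), mean(b)
    diff = mean_a - mean_b
    margin = margin_fraction * (mean_a + mean_b) / 2
    var_a, var_b = variance(a) / len(a), variance(b) / len(b)
    se = math.sqrt(var_a + var_b)
    # Welch-Satterthwaite degrees of freedom
    df = (var_a + var_b) ** 2 / (
        var_a ** 2 / (len(a) - 1) + var_b ** 2 / (len(b) - 1)
    )
    p_lower = 1 - t_cdf((diff + margin) / se, df)  # H0: diff <= -margin
    p_upper = t_cdf((diff - margin) / se, df)      # H0: diff >= +margin
    p_value = max(p_lower, p_upper)
    return p_value < alpha, p_value


# Hypothetical CTIndex replicates for one mix on two devices (not study data)
device_a = [92, 101, 88, 97, 105, 94, 90, 99]
device_b = [95, 89, 103, 91, 98, 86, 100, 93]

equivalent, p_value = tost_equivalent(device_a, device_b)
print("Equivalent within margin:", equivalent)
```

The two devices are declared equivalent only when both one-sided tests reject, meaning the difference in means is shown to lie inside the margin on both sides. This is the reverse of an ordinary t-test, where a failure to find a difference is often mistaken for proof of equivalence.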
The findings of the current study do not mean that differences won’t occur, and it’s important to investigate large differences between comparison test results. Following best practices for preparation and testing will greatly reduce the chances of a wide range in results among specimens from the same mix sample. When specimens are to be tested on two different devices, it is highly recommended that they be prepared at the same time and under the same conditions to reduce variability between the split samples.
A deeper study at NCAT is planned to better quantify some of the effects of improper specimen preparation or specimen ambient aging—among other unintended errors—on the variability of the test. At this time, the biggest differences in test results stem from specimen preparation. By the time the specimens have been made and are ready for testing, the variability from specimen preparation has already been introduced and can’t be removed. This is not to say ambient aging or time after compaction before the test is run has no role in variability. It would be prudent to test all portions of a split sample as close as possible to the same day. This, too, is a variable that hopefully will be assessed in a future study. It would be useful to the industry to know if testing three days after splitting will yield equivalent results to testing “next-day.”
The best way to avoid the problem posed at the start of this article is to make all the specimens from the same mix sample at the same time, split them randomly, and test them at the same time to avoid any “ambient aging.” This does not mean matching testing time to the nearest minute, but rather performing the tests within a few hours of each other. By eliminating as much of the sample prep differences as possible, the contractor and agency set themselves up for matching test parameters.
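The random split described above can be as simple as shuffling the specimen labels and dealing them out in turn. The sketch below shows one way to do it; the function name, lab names, and specimen labels are hypothetical.

```python
import random


def split_specimens(specimen_ids, labs=("contractor", "agency"), seed=None):
    """Shuffle the specimens, then deal them out to the labs in turn."""
    rng = random.Random(seed)  # a fixed seed makes the split repeatable
    shuffled = list(specimen_ids)
    rng.shuffle(shuffled)
    assignment = {lab: [] for lab in labs}
    for position, specimen in enumerate(shuffled):
        assignment[labs[position % len(labs)]].append(specimen)
    return assignment


# Eight hypothetical specimens compacted from the same mix sample
specimens = [f"S{n}" for n in range(1, 9)]
split = split_specimens(specimens, seed=2022)
for lab, ids in split.items():
    print(lab, sorted(ids))
```

Recording the seed alongside the specimen labels lets either party reproduce the assignment later if the results are questioned.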
Nathan Moore is an assistant research engineer at the National Center for Asphalt Technology. For more information, contact him at email@example.com. This article is reprinted, with additions and with kind permission, from the Spring 2022 NCAT Asphalt Technology News.