COVID-19 infections in the US nearly three times the reported number, model estimates

Newswise – DALLAS – February 8, 2021 – World health experts have long suspected that the incidence of COVID-19 is higher than reported. Now a machine-learning algorithm developed at UT Southwestern estimates that the number of COVID-19 cases in the US since the start of the pandemic is nearly three times that of confirmed cases.

The algorithm, described in a study published today in PLOS ONE, provides daily updated estimates of the total number of infections to date, as well as the number of people currently infected, for the US and the 50 countries most affected by the pandemic.

By February 4, according to the model’s calculations, more than 71 million people in the US – 21.5 percent of Americans – had contracted COVID-19. That compares with the considerably smaller figure of 26.7 million publicly reported confirmed cases, says Jungsik Noh, Ph.D., a UT Southwestern assistant professor in the Lyda Hill Department of Bioinformatics and lead author of the study.

Of those 71 million Americans estimated to have had COVID-19, 7 million (2.1 percent of the U.S. population) had current infections and, according to the algorithm, were potentially contagious on Feb. 4.

The published study is based on calculations completed in September. At that time, it reports, the actual number of cumulative cases in 25 of the 50 worst-affected countries was five to 20 times greater than the confirmed case counts suggested.

According to the data currently available from the online algorithm, the estimates are now closer to the reported numbers – but still much higher. On February 4, Brazil had more than 36 million cumulative cases as estimated by the algorithm, nearly four times the 9.4 million confirmed cases reported. France had an estimated 14 million against 3.2 million reported. And the UK had an estimated nearly 25 million instead of about 4 million – more than six times as many. Mexico, an outlier, had an estimated 27.6 million cases, nearly 15 times its 1.9 million confirmed cases.
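The multiples quoted for these four countries follow from dividing the model's estimate by the confirmed count. The short sketch below reproduces that arithmetic using the rounded figures from this article (in millions); the function and variable names are illustrative, and because the inputs are rounded ("more than 36 million", "about 4 million"), the results are approximate.

```python
# Estimated vs. confirmed cumulative cases on Feb. 4, in millions,
# using the rounded figures quoted in the article.
FIGURES_MILLIONS = {
    "Brazil": (36.0, 9.4),
    "France": (14.0, 3.2),
    "UK": (25.0, 4.0),
    "Mexico": (27.6, 1.9),
}


def underreporting_factor(estimated: float, confirmed: float) -> float:
    """Return how many times larger the estimate is than the confirmed count."""
    if confirmed <= 0:
        raise ValueError("confirmed count must be positive")
    return estimated / confirmed


factors = {
    country: underreporting_factor(estimated, confirmed)
    for country, (estimated, confirmed) in FIGURES_MILLIONS.items()
}

for country, factor in factors.items():
    print(f"{country}: estimated cases are about {factor:.1f} times the confirmed count")
```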

“Estimates of actual infections reveal for the first time the true severity of COVID-19 in the US and countries around the world,” said Noh.

The algorithm uses the number of reported deaths – which is believed to be more accurate and complete than the number of laboratory-confirmed cases – as the basis for its calculations. It then assumes an infection fatality rate of 0.66 percent, based on a previous study of the pandemic in China, and takes into account other factors such as the average number of days from symptom onset to death or recovery. It also compares its estimate with the number of confirmed cases to calculate a ratio of confirmed to estimated infections.
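The core idea of working backward from deaths can be illustrated with a back-of-envelope calculation. The sketch below is not the study's machine-learning model: it ignores the delay between infection and death, recoveries, and day-to-day dynamics, and simply divides cumulative deaths by the assumed infection fatality rate (the share of infected people who die). The function names and the death count in the example are hypothetical; only the 0.66 percent rate and the 26.7 million and 71 million US figures come from the article.

```python
# Infection fatality rate assumed in the article: 0.66 percent.
ASSUMED_IFR = 0.0066


def estimate_cumulative_infections(cumulative_deaths: float,
                                   ifr: float = ASSUMED_IFR) -> float:
    """Crude estimate: infections needed to produce the observed deaths.

    Simplification (not the study's method): no adjustment for the lag
    between infection and death, so this undercounts recent infections.
    """
    if not 0 < ifr <= 1:
        raise ValueError("ifr must be in (0, 1]")
    return cumulative_deaths / ifr


def confirmed_to_estimated_ratio(confirmed: float, estimated: float) -> float:
    """Fraction of estimated infections that were laboratory-confirmed."""
    if estimated <= 0:
        raise ValueError("estimated infections must be positive")
    return confirmed / estimated


# Hypothetical input: 100,000 cumulative deaths in some region.
hypothetical_estimate = estimate_cumulative_infections(100_000)
print(f"Estimated infections: {hypothetical_estimate:,.0f}")

# US figures quoted in the article for Feb. 4 (in millions).
us_ratio = confirmed_to_estimated_ratio(confirmed=26.7, estimated=71.0)
print(f"Share of estimated US infections that were confirmed: {us_ratio:.1%}")
```

Under these simplifying assumptions, every reported death corresponds to roughly 150 infections, which is why even modest changes in the assumed fatality rate shift the estimates substantially.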

There is still a lot of uncertainty about COVID-19 – especially the death rate – and the estimates are therefore rough, Noh says. But he believes the model’s estimates are more accurate, missing fewer cases than the confirmed counts currently used to guide public health policy. Having a more comprehensive estimate of the disease’s prevalence is important, Noh adds.

“These are critical statistics about the severity of COVID-19 in each region. Knowing the true severity in different regions can help us effectively fight the spread of the virus,” he explains. “The currently infected population is the cause of future infections and deaths. Its actual size in a region is a critical variable required in determining the severity of COVID-19 and developing strategies against regional outbreaks.”

In the US, infection rates vary widely by state. California had seen nearly 7 million infections since the start of the pandemic, compared with New York’s 5.7 million, according to the algorithm’s estimates for Feb. 4. The model also estimated that California had 1.3 million active cases on that date, or 3.4 percent of the state’s population.

Other model estimates for Feb. 4: In Pennsylvania, 11.2 percent of the population had current infections – the highest rate of any state – compared with a low of 0.15 percent in Minnesota. In New York, an early hot spot, 528,000 people had active infections, or about 2.7 percent of the population. Meanwhile, in Texas, 2.3 percent had current infections.

Noh says he developed the algorithm last summer while trying to decide whether to send his sixth-grade daughter back to school in person. The data he needed to gauge how safe that would be was nowhere to be found, he says.

Once he built the machine-learning algorithm, he found that the area where he lived had an infection rate of about 1 percent. So his daughter went to school.

Noh verified his findings by comparing his results with existing prevalence rates found in several studies using blood tests to check for antibodies to the SARS-CoV-2 virus, which causes COVID-19. For most of the areas tested, his algorithm’s estimates of infections closely matched the percentage of people who tested positive for the antibodies, according to the PLOS ONE study.

The online model uses COVID-19 death records from Johns Hopkins University and The COVID Tracking Project, a volunteer organization created to help track COVID-19, to perform daily updates. The estimates published in the PLOS ONE study date from September 3. At that time, about 10 percent of the US population had been infected with COVID-19, based on Noh’s algorithm.

Gaudenz Danuser, Ph.D., chair of the Lyda Hill Department of Bioinformatics and professor of cell biology, was the study’s senior author. He also holds the Patrick E. Haggerty Distinguished Chair in Basic Biomedical Science.

Funding came from Lyda Hill Philanthropies.
