Location

https://ccse.kennesaw.edu/computing-showcase/cday-programs/spring2021program.php

Document Type

Event

Start Date

26-4-2021 5:00 PM

Description

Covid-19 has arguably been the most impactful event of the past century. It is a viral respiratory illness caused by SARS-CoV-2, first identified in late 2019, that has spread to almost every country in the world. It has directly or indirectly affected nearly everyone, causing over 117 million cases and 2.59 million deaths as of March 2021. This project uses several types of linear regression to analyze and predict Covid-19 infection data from different features. First, simple linear regression was used to predict total deaths from total infections, both globally and by country. Globally, the $R^2$ score was 0.946, while for individual countries it reached 0.984, which indicates a very effective line of best fit. Second, polynomial regression (degree 3) was used to predict total deaths from total infections by country. This was much more effective, with $R^2$ scores up to 0.9998. Finally, multiple linear regression with 9 features was used to identify the features worth examining in more detail. The four features selected from this analysis were GDP, Stringency Index, Median Age, and Life Expectancy. These features were analyzed for three countries in each continent to find patterns. In the three richest continents, the coefficients of GDP and Stringency Index were positive, while in the three poorest continents the coefficients of these features were negative. This paper assumes basic conceptual knowledge of machine learning and should be readable by any upper-level computer science undergraduate student.

Advisor(s): Dr. Mohammed Aledhari, maledhar@kennesaw.edu

Topic(s): Data/Data Analytics

CS 4267
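The comparison between simple and degree-3 polynomial regression described above can be illustrated with a minimal sketch. It does not use the project's data or code: the `infections` and `deaths` arrays below are synthetic values generated for illustration only, the helper names `r_squared` and `fit_and_predict` are hypothetical, and NumPy's `polyfit` stands in for whatever library the authors actually used. The $R^2$ score is computed as $1 - SS_{res}/SS_{tot}$.

```python
import numpy as np


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - np.mean(y_true)) ** 2)
    return 1.0 - ss_res / ss_tot


def fit_and_predict(x, y, degree):
    """Least-squares polynomial fit of the given degree, evaluated at x."""
    coeffs = np.polyfit(x, y, degree)
    return np.polyval(coeffs, x)


# Synthetic, illustrative data only (NOT real Covid-19 figures):
# a mildly nonlinear relationship between infections and deaths plus noise.
rng = np.random.default_rng(0)
infections = np.linspace(0.0, 10.0, 200)
deaths = (
    0.02 * infections**3
    - 0.1 * infections**2
    + 0.5 * infections
    + rng.normal(0.0, 0.2, infections.size)
)

# Degree 1 is simple linear regression; degree 3 is the cubic fit.
r2_linear = r_squared(deaths, fit_and_predict(infections, deaths, 1))
r2_cubic = r_squared(deaths, fit_and_predict(infections, deaths, 3))

print(f"simple linear R^2: {r2_linear:.4f}")
print(f"degree-3 R^2:      {r2_cubic:.4f}")
```

Because the cubic model contains the straight line as a special case, its in-sample $R^2$ cannot be lower than that of the simple fit, so a higher score alone does not show that it generalizes better; evaluating on held-out data would be needed for that.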

UC-6 Covid-19 Data Analysis - Regression
