IJCRR - 2nd Wave of COVID-19: Role of Social Awareness, Health and Technology Sector, June, 2021
Date of Publication: 11-Jun-2021
Forecasting of Covid-19 Cases in India by Time Series Analysis Using Autoregressive Integrated Moving Average Model
Author: Khobragade Ashish, Kadam Dilip
Abstract: Introduction: COVID-19 is caused by SARS-CoV-2, a coronavirus. Forecasting has an important role in the surveillance of newly emerging diseases like COVID-19. Objective: The objective of the study was to forecast COVID-19 cases using the ARIMA model. Methods: We used the ARIMA model to forecast the number of COVID-19 cases occurring per day in India. A total of 50 observations were used to fit the model. The best-fitting model was of order (0,2,1), which had the lowest AIC value. Forecasted values were compared with actual values. Results: The actual reported cases per day were within the 95% CI of the forecasted values. Conclusions: The ARIMA model can be used to forecast over a short period. This model can be used to develop strategies for the containment of pandemics.
Keywords: ARIMA, COVID-19, Forecast, India, Model, Time series analysis
The first confirmed case of COVID-19 was reported from Wuhan city, China. Cases soon spread within China and to nearby countries. The World Health Organization (WHO) declared the outbreak a public health emergency of international concern (PHEIC) in January 2020.1 In March 2020, the WHO also issued an alert regarding the global spread of the disease.1 More than 151.8 million cases and 31.86 lakh deaths had been reported worldwide as of 2nd May 2021. In India, a total of 19.8 million confirmed cases and 2.18 lakh deaths had occurred as of 2nd May 2021.2
If we can forecast the new cases that will occur in the near future, it will aid in planning the resources required to prevent further cases. Many forecasting models have been tried for COVID-19, both worldwide and in India.3,4,5 In this study, we used a forecasting model based on the autoregressive integrated moving average (ARIMA).
1. To develop a time series analysis (TSA) forecasting model of COVID-19 cases in India.
2. To compare the actual cases of COVID-19 that occurred in India with the cases forecasted by the model.
The COVID-19 cases occurring daily in India are reported to the Government of India and published on its official dashboard. For this study, the number of COVID-19 cases occurring per day in India from 17th February to 7th April 2021 was considered.2,6 The data are freely available on the Government of India portal and were extracted date-wise into an Excel sheet. These 50 days of data were used to predict the occurrence of new cases of COVID-19 for the next five days. The epidemic curve was plotted for the selected period. [Figure 1]
In time series analysis, an ARIMA model is used to forecast future data based on the available past data. The time series model was fitted using the ‘forecast’ package in R software. Fifty data points (daily COVID-19 cases) were first converted to time series data using the ‘ts’ command by specifying the start point and endpoint of the data. The frequency was taken as 365.25 because the data were daily COVID-19 cases. Non-seasonal ARIMA models are generally denoted by the order (p,d,q). The order has three components: p = number of autoregressive terms, d = order of differencing and q = number of lagged forecast errors in the prediction equation.
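For reference, the (p,d,q) notation described above can be written compactly with the backshift operator B, where B y_t = y_{t-1}. The form below is the standard textbook statement of a non-seasonal ARIMA model without a constant term. The symbols \phi_i, \theta_j and \varepsilon_t are introduced here for illustration and do not appear in the original text; the sign of the moving-average terms follows the convention used by R.

```latex
\left(1 - \phi_1 B - \dots - \phi_p B^p\right)(1 - B)^d \, y_t
  = \left(1 + \theta_1 B + \dots + \theta_q B^q\right)\varepsilon_t ,
\qquad \varepsilon_t \sim \text{white noise with variance } \sigma^2 .
```

Here y_t is the number of daily cases, (1 - B)^d applies d rounds of differencing, the \phi_i are the p autoregressive coefficients and the \theta_j are the q coefficients on lagged forecast errors.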
A prerequisite for applying the ARIMA model is that the time series should be stationary. A time series is called stationary when its mean, variance and autocorrelation are constant over time. Hence, the Augmented Dickey-Fuller (ADF) test was applied to the time series to check for stationarity. Differencing was done twice to make the time series stationary: consecutive values were subtracted, and the resulting differences were subtracted again, giving second-order differencing. Differencing in this way removes the trend in the data, so that the mean of the differenced series is approximately constant. The stationarity of the differenced data was confirmed by conducting the ADF test again. If the p-value is less than 0.05, the data are considered stationary.
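The second-order differencing step can be illustrated with a minimal Python sketch. The study itself was carried out in R, so this is not the authors' code. The function name `difference` and the sample counts are hypothetical and chosen only to show the arithmetic.

```python
def difference(series, order=1):
    """Return the series differenced `order` times.

    Each pass replaces the series by the differences between
    consecutive values, so the output is `order` items shorter.
    """
    out = list(series)
    for _ in range(order):
        out = [later - earlier for earlier, later in zip(out, out[1:])]
    return out


# Hypothetical daily counts with an accelerating trend (not the study data).
daily_cases = [100, 120, 150, 190, 240, 300]

first_diff = difference(daily_cases, order=1)   # day-to-day change
second_diff = difference(daily_cases, order=2)  # change in the change

print(first_diff)
print(second_diff)
```

In this toy series the first differences still rise steadily, while the second differences are constant. This is the situation in which d = 2 is a natural choice.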
The order of the ARIMA model was selected by plotting autocorrelation function (ACF) and partial autocorrelation function (PACF) graphs. The value of p was obtained from the PACF graph and the value of q from the ACF graph. [Figure 2 and 3] The ARIMA model was fitted using the auto.arima function. Akaike’s information criterion (AIC) was used to select the best model: the order with the lowest AIC value was selected as the best-fitting model. Forecasting was done with the fitted model for the next 5 days. The actual number of COVID-19 cases that occurred during this period was compared with the forecasted data. The predictions for the next 30 days were plotted graphically.
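The AIC-based selection described above can be sketched as follows. This is an illustration in Python rather than the R workflow used in the study. The log-likelihood values are made up for demonstration and are not results from the paper. The parameter count is assumed to be the number of ARMA coefficients plus one for the innovation variance.

```python
def aic(log_likelihood, n_params):
    """Akaike's information criterion: 2k - 2 ln(L). Lower is better."""
    return 2 * n_params - 2 * log_likelihood


# Hypothetical candidates: order -> (log-likelihood, number of parameters).
# These numbers are illustrative only, not values from the study.
candidates = {
    (0, 2, 1): (-100.0, 2),  # one MA coefficient + variance
    (1, 2, 0): (-103.5, 2),  # one AR coefficient + variance
    (1, 2, 1): (-99.6, 3),   # AR + MA coefficients + variance
}

scores = {order: aic(ll, k) for order, (ll, k) in candidates.items()}
best_order = min(scores, key=scores.get)

for order, score in sorted(scores.items(), key=lambda item: item[1]):
    print(order, round(score, 2))
print("selected order:", best_order)
```

In this example the (1,2,1) model has a slightly higher likelihood, but its extra parameter is penalised, so the simpler (0,2,1) model has the lower AIC.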
Statistical analysis: Statistical analysis was done in R software version 3.6.1 using the ‘forecast’ package.
Results: The original data were tested for stationarity using the Augmented Dickey-Fuller (ADF) test. The test results are as follows:
ADF= 0.92, Lag order = 3, p-value = 0.99.
The data are not stationary, as the ADF test p-value is greater than 0.05. Hence, we differenced the time series twice to make it stationary. After differencing, the ADF test results are as follows:
ADF = -4.78, lag order = 3, p-value = 0.01 (a p-value < 0.05 is considered significant).
The best fitted ARIMA model is (0,2,1) with the lowest AIC value of 965.79. The moving average (ma) coefficient for the fitted model is -0.8796 with a standard error of 0.0649. [Table 1]
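Written out, and assuming the sign convention used by R for moving-average coefficients, the fitted ARIMA(0,2,1) model with the reported coefficient of -0.8796 corresponds to the following equation. The symbols y_t (daily cases), \varepsilon_t (forecast error) and B (backshift operator) are notation introduced here and are not from the original text.

```latex
(1 - B)^2 \, y_t = (1 - 0.8796\,B)\,\varepsilon_t
\quad\Longleftrightarrow\quad
y_t = 2\,y_{t-1} - y_{t-2} + \varepsilon_t - 0.8796\,\varepsilon_{t-1}
```

Under this form, the one-step-ahead forecast extrapolates the most recent change in cases and corrects it by the last forecast error. Forecasts further ahead continue along a straight line.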
The output of the forecasted model for the next 5 days is shown in Table 2, which lists the actual and predicted cases from 8th to 12th April 2021. All the actual cases are within the 95% confidence interval of the predicted cases. [Table 2] Predicted cases for the next 30 days (from 8th April to 7th May 2021) are also plotted graphically [Figure 4].
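The comparison of actual cases with forecast intervals can be sketched as below. This is a simplified Python illustration, not the R code used in the study. It assumes the ARIMA(0,2,1) form with R's sign convention and uses textbook intervals that ignore uncertainty in the estimated coefficient. All input numbers, including the residual variance and the "actual" values, are hypothetical. The function names are also illustrative.

```python
import math


def forecast_arima_021(y_last, y_prev, last_residual, sigma2, horizon,
                       theta, z=1.96):
    """Point forecasts and approximate 95% intervals for ARIMA(0,2,1).

    Assumed model: (1 - B)^2 y_t = e_t + theta * e_{t-1}.
    Returns a list of (point, lower, upper) tuples, one per step ahead.
    """
    results = []
    prev2, prev1 = y_prev, y_last
    cumulative_var = 0.0
    for h in range(1, horizon + 1):
        # Only the one-step forecast uses the last observed residual;
        # future errors have expected value zero.
        shock = theta * last_residual if h == 1 else 0.0
        point = 2 * prev1 - prev2 + shock
        # Weight of the error (h - 1) steps back: psi_j = 1 + j * (1 + theta).
        psi = 1 + (h - 1) * (1 + theta)
        cumulative_var += sigma2 * psi ** 2
        half_width = z * math.sqrt(cumulative_var)
        results.append((point, point - half_width, point + half_width))
        prev2, prev1 = prev1, point
    return results


def all_within(actuals, intervals):
    """True if every actual value lies inside its forecast interval."""
    return all(lo <= a <= hi for a, (_, lo, hi) in zip(actuals, intervals))


# Hypothetical inputs (not the study data).
intervals = forecast_arima_021(y_last=100.0, y_prev=90.0, last_residual=0.0,
                               sigma2=4.0, horizon=3, theta=-0.5)
hypothetical_actuals = [111.0, 118.0, 133.0]

for step, (point, lo, hi) in enumerate(intervals, start=1):
    print(step, round(point, 2), round(lo, 2), round(hi, 2))
print("all within interval:", all_within(hypothetical_actuals, intervals))
```

The interval widens with each step ahead because the forecast errors accumulate. This is one reason the comparison in the text is limited to a short horizon.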
In this study, we used the ARIMA model to forecast cases of COVID-19. Without forecasting, it is very difficult to plan strategies for the surveillance of the disease. When forecasted data are available, public health surveillance can be directed appropriately and the right intervention measures can be put in place. Hence, we carried out a time series analysis of COVID-19 cases to examine whether daily COVID-19 cases can be modelled as a time series and to forecast future trends.
We took the COVID-19 cases that occurred in India over 50 days as time series data. The ADF test was used to check stationarity. As the data were not stationary, we differenced them twice to make them stationary. We used the ARIMA model in R software for forecasting. The fitted model for this time series is of order (0,2,1); the AIC and BIC values are lowest for this order. We forecasted data for the next 5 days, from 8th April to 12th April 2021, and found that all of the actual reported cases were within the 95% confidence interval of the forecasted cases.
Different ARIMA models have been fitted for different countries. In Saudi Arabia, the preferred ARIMA model was (2,1,1).7 The best-fitted models for other countries were Italy (0,2,1), Spain (1,2,0) and France (0,2,1).8 Abdulmajeed et al. developed an ARIMA model for Nigeria using 39 observations to predict COVID-19 cases.9 We used 50 days of data to predict the daily cases of COVID-19. Saba and Elsheikh forecasted cases for 10 days using ARIMA and NARANN models, with only one month of data.10 A model to predict cases and deaths was developed for Italy; it was fitted using orders (0,2,0) and (2,2,1).11 Similarly, one study forecasted COVID-19 cases for two days using ARIMA models of order (1,2,0) and (1,0,4).12 In our study, the ARIMA model was best fitted using order (0,2,1), and the actual cases reported for the next 5 days were within the 95% confidence interval of the predicted values.
Many time series models have previously been used for forecasting in infectious disease surveillance. With them, public health experts can predict how much variability there will be in the future pattern of the disease.13 Case data should be updated regularly so that any change in the time trend of the disease is reflected in the model, which can then give a good prediction of the future trend. The ARIMA model can be used for epidemiological surveillance of newly emerging diseases like COVID-19, so that the correct intervention can be made at the correct time to prevent morbidity and mortality from the disease.
Conclusion: Actual cases of COVID-19 were within the 95% CI of the values predicted by the ARIMA (0,2,1) model. The ARIMA model can predict the occurrence of COVID-19 cases over a short period. This model may be developed at state and district levels to predict COVID-19 cases.
Recommendation: Forecasting must be a part of routine surveillance activities in the pandemic situation of new emerging diseases like COVID-19.
Acknowledgement: We express our gratitude to the Ministry of Health & Family Welfare, Government of India, for the supporting data used in this research.
Conflict of interest: None
Source of funding: Nil
Author’s contributions: Both authors have conceptualized the article. The manuscript was written by the 1st author and edited by the 2nd author. Data analysis was done by the 1st author.
References:
1. WHO. Timeline: WHO COVID-19 response. Available from: https://www.who.int/emergencies/diseases/novel-coronavirus-2019/interactive-timeline
2. WHO. Coronavirus (COVID-19) Dashboard. Available from: https://covid19.who.int/
3. Aravind M, Srinath K, Maheswari N, Sivagami M. Predicting COVID-19 Cases in the Indian States using Random Forest Regression. Int J Cur Res Rev. 2021;3:109-114.
4. Theerthagiri P, Jacob JI, Ruby UA, Yendapalli V. Prediction of COVID-19 Possibilities using K-Nearest Neighbour. Class Int J Curr Res Rev. 2017;3:156-164.
5. Yash S, Shikhar B, Parvathi R. Covid-19 Forecasting and Analysis Using Different Time-Series Model and Algorithms. Int J Cur Res Rev. 2021;23:184-189.
6. MOHFW, Govt. of India. COVID-19 status. Available from: https://www.mohfw.gov.in/
7. Alzahrani SI, Aljamaan IA, Al-Fakih EA. Forecasting the spread of the COVID-19 pandemic in Saudi Arabia using the ARIMA prediction model under current public health interventions. J Infect Public Health [Internet]. 2020;13(7):914–9. Available from: https://doi.org/10.1016/j.jiph.2020.06.001
8. Ceylan Z. Estimation of COVID-19 prevalence in Italy, Spain, and France. Sci Total Environ. 2020:10;(7):138-148. Available from: https://doi.org/10.1016/j.scitotenv.2020.138817
9. Abdulmajeed K, Adeleke M, Popoola L. Online Forecasting of Covid-19 Cases in Nigeria Using Limited Data. Data Br. 2020;30:105683. Available from: https://doi.org/10.1016/j.dib.2020.105683
10. Saba AI, Elsheikh AH. Forecasting the prevalence of COVID-19 outbreak in Egypt using nonlinear autoregressive artificial neural networks. Process Saf Environ Prot. 2020 Sep;141:1-8. Available from: https://doi.org/10.1016/j.psep.2020.05.029
11. Yang Q, Wang J, Ma H, Wang X. Research on COVID-19 based on ARIMA modelΔ-Taking Hubei, China as an example to see the epidemic in Italy. J Infect Public Health. 2020 Jun 20:S1876-0341. Available from: https://doi.org/10.1016/j.jiph.2020.06.019
12. Benvenuto D, Giovanetti M, Vassallo L, Angeletti S, Ciccozzi M. Application of the ARIMA model on the COVID-2019 epidemic dataset. Data Br. 2020;29:105340. Available from: https://doi.org/10.1016/j.dib.2020.105340
13. Allard R. Use of time-series analysis in infectious disease surveillance. Bull World Health Organ. 1998;76(4):327–33.