A Comparison of Multiple Linear Regression and Random Forest Regression to Evaluate the Price of Residential Units (Case Study: North Valiasr, Tabriz)

Document Type: Research Paper

Abstract

The housing economy is one of the most fundamental aspects of any country, because changes in housing prices have numerous short- and long-term effects on the national economy. It is therefore essential to develop a model that can assess housing prices. Accordingly, the objectives of this research are to compare multiple linear regression and random forest regression for evaluating the price of residential units, and to identify the most important factors related to those prices. The statistical population comprised the residential units of four North Valiasr neighborhoods (N = 30,272 units). Using Cochran's formula at a 95% confidence level and a 5% margin of error, the sample size was estimated at 379 units; nevertheless, 400 units were sampled. To eliminate the effect of time, only data from September 2020 were used. ArcMap, SPSS, and RStudio were used to analyze the data. According to the results, the ten most important variables related to housing prices in the North Valiasr neighborhoods are, in descending order of importance: area, apartment floors, construction year, proximity to health centers, urban facilities, green spaces, religious land uses, medical centers, military land uses, and floor level. Furthermore, the findings indicate that random forest regression predicts housing prices in North Valiasr, Tabriz, more accurately than multiple linear regression.
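The sample size of 379 units reported above can be checked with Cochran's formula. The following minimal Python sketch assumes the finite-population form n = N·z²·p·q / (N·e² + z²·p·q) that is common in the survey literature, with z = 1.96 for 95% confidence and e = 0.05. It also assumes p = q = 0.5 (maximum variability): the abstract does not state p, so that value is an assumption, and the function name is hypothetical.

```python
def cochran_sample_size(N, z=1.96, p=0.5, e=0.05):
    """Cochran's formula with the finite-population correction:
    n = N z^2 p q / (N e^2 + z^2 p q), where q = 1 - p.
    z = 1.96 corresponds to 95% confidence and e is the margin of error.
    p = 0.5 (maximum variability) is an ASSUMPTION; the study does not state it."""
    q = 1 - p
    return N * z ** 2 * p * q / (N * e ** 2 + z ** 2 * p * q)


n = cochran_sample_size(30_272)  # residential units in the four neighborhoods
print(f"n = {n:.2f}, i.e. about {round(n)} units")
```

With N = 30,272 this gives about 379.3, which rounds to the 379 units reported. The textbook variant n0 / (1 + (n0 - 1) / N), with n0 = z²pq/e² = 384.16, gives almost the same value.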
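The study's central comparison, multiple linear regression versus random forest regression, can be illustrated with a small, dependency-free Python sketch. This is not the authors' pipeline (the study used SPSS and RStudio). Everything in the sketch is an assumption for illustration: the function names, the hypothetical generator `make_data` with its three features (area, building age, distance to green space) and invented coefficients, and the hyperparameters. The linear model is fitted by ordinary least squares through the normal equations. The forest follows the usual recipe: trees are grown on bootstrap samples, a random subset of features is considered at each split, and predictions are averaged. It uses mtry = p/3 and a minimum leaf size of 5, similar to the regression defaults of R's randomForest package. `max_depth` is an extra cap added only to keep the sketch fast.

```python
import math
import random
import statistics

def fit_ols(X, y):
    """Multiple linear regression: solve the normal equations (X'X) b = X'y."""
    rows = [[1.0] + list(x) for x in X]  # prepend an intercept column
    k = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * t for r, t in zip(rows, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(k):
        p = max(range(c, k), key=lambda i: abs(A[i][c]))
        A[c], A[p] = A[p], A[c]
        for i in range(k):
            if i != c:
                factor = A[i][c] / A[c][c]
                A[i] = [a - factor * b for a, b in zip(A[i], A[c])]
    coef = [A[i][k] / A[i][i] for i in range(k)]
    return lambda x: coef[0] + sum(b * v for b, v in zip(coef[1:], x))

def best_split(X, y, feats, min_leaf):
    """Return (SSE, feature, threshold) minimising the two children's squared error."""
    n, tot, tot2 = len(y), sum(y), sum(v * v for v in y)
    best = None
    for f in feats:
        order = sorted(range(n), key=lambda i: X[i][f])
        s = s2 = 0.0
        for pos in range(n - 1):
            v = y[order[pos]]
            s, s2 = s + v, s2 + v * v  # running sums of the left child
            nl, nr = pos + 1, n - pos - 1
            xa, xb = X[order[pos]][f], X[order[pos + 1]][f]
            if nl < min_leaf or nr < min_leaf or xa == xb:
                continue
            sse = (s2 - s * s / nl) + ((tot2 - s2) - (tot - s) ** 2 / nr)
            if best is None or sse < best[0]:
                best = (sse, f, (xa + xb) / 2)
    return best

def grow(X, y, depth, max_depth, min_leaf, n_feats, rng):
    """Grow one regression tree; a leaf stores the mean target of its samples."""
    split = None
    if depth < max_depth and len(y) >= 2 * min_leaf:
        feats = rng.sample(range(len(X[0])), n_feats)  # random feature subset per split
        split = best_split(X, y, feats, min_leaf)
    if split is None:
        return statistics.fmean(y)
    _, f, t = split
    left = [i for i in range(len(y)) if X[i][f] <= t]
    right = [i for i in range(len(y)) if X[i][f] > t]
    args = (depth + 1, max_depth, min_leaf, n_feats, rng)
    return (f, t,
            grow([X[i] for i in left], [y[i] for i in left], *args),
            grow([X[i] for i in right], [y[i] for i in right], *args))

def predict_tree(node, x):
    while isinstance(node, tuple):  # internal node: (feature, threshold, left, right)
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=50, max_depth=8, min_leaf=5, seed=0):
    """Random forest regression: average of trees grown on bootstrap samples."""
    rng = random.Random(seed)
    n_feats = max(1, len(X[0]) // 3)  # mtry = p/3, like R randomForest's regression default
    trees = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(y)) for _ in y]  # bootstrap: sample with replacement
        trees.append(grow([X[i] for i in idx], [y[i] for i in idx],
                          0, max_depth, min_leaf, n_feats, rng))
    return lambda x: statistics.fmean(predict_tree(t, x) for t in trees)

def rmse(model, X, y):
    return math.sqrt(statistics.fmean((model(x) - t) ** 2 for x, t in zip(X, y)))

def make_data(n, rng):
    """HYPOTHETICAL data: area (m^2), building age (years), km to green space."""
    X, y = [], []
    for _ in range(n):
        area, age, dist = rng.uniform(50, 200), rng.uniform(0, 40), rng.uniform(0.1, 3)
        price = 20 * area - 8 * age + 300 / (1 + dist) + rng.gauss(0, 150)
        X.append([area, age, dist])
        y.append(price)
    return X, y

if __name__ == "__main__":
    X, y = make_data(300, random.Random(42))
    X_tr, y_tr, X_te, y_te = X[:200], y[:200], X[200:], y[200:]
    for name, fit in (("MLR", fit_ols), ("RF", fit_forest)):
        print(f"{name} hold-out RMSE: {rmse(fit(X_tr, y_tr), X_te, y_te):.1f}")
```

Running the script prints the hold-out RMSE of both models on the synthetic data. Which model wins depends on how non-linear the true price surface is (here the distance effect is deliberately non-linear), so these numbers say nothing about the Tabriz data. The sketch also omits the variable-importance ranking used to identify the ten key factors.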
