Developing a Text Mining Model in Persian News Websites on the Iranian Capital Market
Abstract
The critical role of financial markets in economic dynamics makes the study of market trends a necessity and a matter of interest to every investor and economist. Because the capital market is influenced by multiple factors, accurately predicting the course of its changes is very difficult. Moreover, the prevailing uncertainty in the capital market reinforces the role of the media, especially online media, in guiding investors. By publishing news about market movements and individual stocks on the web at a large scale, these media directly or indirectly encourage their audiences to buy or sell specific shares. The present study aims to analyze the sentiment of news published on Persian websites using text mining patterns and models. To this end, published news was extracted with a web crawler, and the relevant news texts were separated through queries. Then, occurrences of stock exchange symbols and their corresponding industry groups were counted automatically. A total of 9,985 news texts were manually labeled, and specialized datasets were created. Data preprocessing was performed with the BeautifulSoup and Hazm Python libraries, and the texts were vectorized using the WordPiece tokenization algorithm. Finally, three-class sentiment analysis (positive, negative, and neutral) was performed with the ParsBERT model. The model was evaluated with the F1-score, precision, accuracy, and recall criteria on the Google Colab and Jupyter Notebook platforms, and its accuracy was 83.78%. A comparison with the models introduced in previous studies showed that the proposed model can analyze the sentiment of capital market news published in online media with acceptable accuracy.
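The abstract lists four evaluation criteria for a three-class (positive, negative, neutral) classifier but does not say how per-class scores were aggregated. The following is a minimal pure-Python sketch of those criteria, assuming macro-averaging (an unweighted mean over the three classes). The function name `classification_metrics` and the six toy labels are hypothetical illustrations, not the authors' code or data.

```python
from collections import Counter

# Assumed label set, mirroring the three sentiment classes named in the abstract.
LABELS = ("positive", "negative", "neutral")


def classification_metrics(y_true, y_pred, labels=LABELS):
    """Return accuracy plus macro-averaged precision, recall, and F1 (hypothetical helper)."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and of equal length")

    # Count true positives, false positives, and false negatives per class.
    tp, fp, fn = Counter(), Counter(), Counter()
    for t, p in zip(y_true, y_pred):
        if t == p:
            tp[t] += 1
        else:
            fp[p] += 1  # predicted class p was wrong
            fn[t] += 1  # true class t was missed

    precisions, recalls, f1s = [], [], []
    for c in labels:
        prec = tp[c] / (tp[c] + fp[c]) if tp[c] + fp[c] else 0.0
        rec = tp[c] / (tp[c] + fn[c]) if tp[c] + fn[c] else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,  # macro average (assumption)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Toy example with made-up labels, not results from the study.
y_true = ["positive", "negative", "neutral", "neutral", "positive", "negative"]
y_pred = ["positive", "neutral", "neutral", "neutral", "negative", "negative"]
metrics = classification_metrics(y_true, y_pred)
print({k: round(v, 4) for k, v in metrics.items()})
# → {'accuracy': 0.6667, 'precision': 0.7222, 'recall': 0.6667, 'f1': 0.6556}
```

Macro-averaging weights each sentiment class equally, which matters when neutral news dominates a corpus; a weighted average would instead favor the majority class. In practice, scikit-learn's `precision_recall_fscore_support(y_true, y_pred, average="macro")` computes the same per-class aggregates.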