Перейти до основного вмісту
Clustering: Markov algorithm
Malyk Igor 1 , Knignitska Tetyana Vasylivna 1 , Gorbatenko Mykola Yuriyovych 2
1 Department of Mathematical Problems of Management and Cybernetics, Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58000, Ukraine
2 Department of Mathematical Modeling, Yuriy Fedkovych Chernivtsi National University, Chernivtsi, 58000, Ukraine
Keywords: stochastic matrices, clustering of time series, unstructured data
Abstract

This paper deals with the problem of clustering on graphs based on the eigenvalues of the stochastic matrix of the graph. It is proved, that the eigenvalues of the stochastic matrix for large graphs $(N > 100)$ are divided into three groups, one of which is decisive for the number of clusters in the graph. Using the theory of random matrices, it was possible to show, that the asymptotic distribution of the subgroup of the real parts of the eigenvalues of the stochastic matrix of the graph is described by a semicircular Winger distribution, and the parameter of this distribution is $O (\frac{1}{\sqrt{N}})$. The use of stochastic matrices made it possible to accurately localize the eigenvalues responsible for the number of clusters, which was not possible for adjacency matrices. The basic assumptions of the model are related to the properties of Markov discrete chains, which allows to extend the scope of the obtained results to a wider class of objects. Theoretical results were verified by clustering time series describing the value of $N = 470$ shares of S&P500 data for the 6 years period (2013 – 2018). The clustering of these time series showed the results of the presence of 5 clearly defined groups, consistent with the data used. A covariance matrix with zeros on the diagonal elements was used to construct a stochastic matrix for time series. This approach made it possible to localize the main clusters more precisely. The main result of the work is devoted to asymmetric matrices with mathematical expectations not equal 0, which allowed to generalize some results of Big Data theory. Also, the dependence of clustering results on the cluster size, the number of clusters and the asymmetry of cluster size is noted in the paper. Using the Monte-Carlo method, it is shown that the proposed Markov algorithm is more stable to noise in the graph than some of classical algorithms.

References

[1] Bagherikaram G., Motahari A., Khandani A. The Secrecy Capacity Region of the Gaussian MIMO Broadcast Channel. Information Theory 2013, 59, 2673-2682. doi:10.1109/TIT.2012.2236972.

[2] Bai Z., Fang Z., Liang Y. Spectral Theory of Large Dimensional Random Matrices and Its Applications to Wireless Communications and Finance Statistics : Random Matrix Theory and Its Applications, World Scientific Publishing Company, 2014.

[3] Bender E.A., Williamson S.G. Lists, Decisions and Graphs - With an Introduction to Probability. University of California at San Diego, San Diego, 2010.

[4] Biely С., Thurner S. Random matrix ensembles of time-lagged correlation matrices: Derivation of eigenvalue spectra and analysis of financial time-series. Quantitative Finance 2008. 8, 705-722. doi:10.1080/14697680701691477.

[5] Bolch G., Greiner S., Meer H., Trivedi K. Queueing Networks and Markov Chains. John Wiley, 2nd edition 2006, 40 (5), 567-568. doi:10.1080/07408170701623187.

[6] Crisan D., Del Moral P., Lyons T. Discrete filtering using branching and interacting particle systems. Markov Processes and Related Fields 1999, 5 (4), 293-318.

[7] Dongen S. Graph Clustering by Flow Simulation. PhD thesis, University of Utrecht, 2000.

[8] Girvan M., Newman M. Community structure in social and biological networks. Proc. Natl Acad. Sci. USA 2002, 99 (12), 7821-7826. doi:10.1073/pnas.122653799

[9] Qiu R., Antonik P. Smart Grid using Big Data Analytics. A Random Matrix Theory Approach. Wiley Online Library, 2017.

[10] Knignitska T.V. Estimate of time series similarity based on models. Problems of Control and Informatics 2019, 51 (4), 94-104.

[11] Marchenko V.A., Pastur L.A. Distribution of eigenvalues for some sets of random matrices. Mat. Sb. N.S. 1967, 72 (114) (4), 507--536.

Cite
ACS Style
Malyk, I.; Knignitska , T.V.; Gorbatenko, M.Y. Clustering: Markov algorithm. Bukovinian Mathematical Journal. 2019, 7 https://doi.org/ https://doi.org/10.31861/bmj2019.02.059
AMA Style
Malyk I, Knignitska TV, Gorbatenko MY. Clustering: Markov algorithm. Bukovinian Mathematical Journal. 2019; 7(2). https://doi.org/ https://doi.org/10.31861/bmj2019.02.059
Chicago/Turabian Style
Igor Malyk, Tetyana Vasylivna Knignitska , Mykola Yuriyovych Gorbatenko. 2019. "Clustering: Markov algorithm". Bukovinian Mathematical Journal. 7 no. 2. https://doi.org/ https://doi.org/10.31861/bmj2019.02.059
Export
We use own, third-party cookies, and localStorage files to analyze web traffic and page activities. Privacy Policy Settings