Bui Cong Giao *

* Corresponding author (bcgiao@sgu.edu.vn)

Main Article Content

Abstract

Similarity join over multiple time series is an interesting task of data mining. This task aims at identifying couples of similar subsequences from multiple time series and the two subsequences might have any length and be at any position in the time series. However, the task is extremely challenging since the computational time to search for couples of similar subsequences from two time series is very large. Moreover, the task needs to normalize two subsequences before conducting a distance measure on the normalized subsequences to consider the similar degree of the original subsequences. To address the problem, this paper proposes a method of similarity join over two time series under Dynamic Time Warping (DTW), supporting z-score normalization. The proposed method utilizes both a suite of state-of-the-art techniques for computing the DTW distance and a technique of incremental z-score normalization to reduce the computational costs. The method employs multithreading to improve runtime performance. If similar subsequences from two time series may not pair up because they are too far apart, the method might use a sliding window to constrain a scope for coupling similar subsequences. The experiments on the proposed method show that the method could return similar subsequences quickly and incur no false dismissals.

Keywords: Data normalization, Dynamic Time Warping, similarity join

Article Details

References

Berndt, D. J., & Clifford, J. (1994). Using Dynamic Time Warping to find patterns in time series. AAAI-94 Workshop on Knowledge Discovery in Databases (pp. 359-370). Seattle, Washington, USA: AAAI Press. doi:10.5555/3000850.3000887

Chatzigeorgakidis, G., Patroumpas, K., Skoutas, D., Athanasiou, S., & Skiadopoulos, S. (2018). Scalable hybrid similarity join over geolocated time series. Proceedings of the 26th ACM SIGSPATIAL International Conference on Advances in Geographic Information Systems, (pp. 119-128). Seattle Washington, USA. doi: 10.1145/3274895.3274949

Chen, Y., Chen, G., Chen, K., & Ooi, B. C. (2009). Efficient processing of warping time series join of motion capture data. 2009 IEEE 25th International Conference on Data Engineering (pp. 1048 - 1059). Shanghai, China: IEEE. doi:10.1109/ICDE.2009.20

Ding, H., Trajcevski, G., & Scheuermann, P. (2008). Efficient similarity join of large sets of moving object trajectories. 2008 15th International Symposium on Temporal Representation and Reasoning, (pp. 79-87). Montreal, QC, Canada. doi:10.1109/TIME.2008.25

Giao, B. C. (2022). Time-series datasets. doi:0.13140/RG.2.2.12512.15360

Giao, B. C., & Anh, D. T. (2016). Similarity search for numerous patterns over multiple time series streams under dynamic time warping which supports data normalization. Vietnam Journal of Computer Science, 3(3), 181-196. doi:10.1007/s40595-016-0062-4

Keogh, E., & Ratanamahatana, C. A. (2004). Exact indexing of Dynamic Time Warping. Knowledge and Information Systems, 7(3), 358-386. doi:10.1007/s10115-004-0154-9

Kim, S. W., & Park, S. (2001). An index-based approach for similarity search supporting time warping in large sequence databases. Proceedings of the 17th IEEE International Conferenceon Data Engineering, (pp. 607-614). Heidelberg, Germany. doi:10.1109/ICDE.2001.914875

Lemire, D. (2009). Faster retrieval with a two-pass dynamic-time-warping lower bound. Pattern Recognition, 42(9), 2169-2180. doi:10.1016/j.patcog.2008.11.030

Lian, X., & Chen, L. (2009). Efficient similarity join over multiple stream time series. IEEE Transactions on Knowledge and Data Engineering, 21(11), 1544-1558. doi:10.1109/TKDE.2009.27

Lin, Y., & McCool, M. D. (2010). Subseries join: A similarity-based time series match approach. 14th Pacific-Asia Conference, PAKDD 2010 (pp. 238-245). Springer Berlin Heidelberg. doi:10.1007/978-3-642-13657-3_27

Mollah, M., Souza, V., & Mueen, A. (2021). Multi-way time series join on multi-length patterns. 2021 IEEE International Conference on Data Mining (ICDM), (pp. 429-438). Auckland, New Zealand. doi:10.1109/ICDM51629.2021.00054

Mueen, A., Hossein, H., & Trilce, E. (2014). Time series join on subsequence correlation. Proceedings of 2014 IEEE International Conference on Data Mining (pp. 450-459). Shenzhen, China: IEEE. doi:10.1109/ICDM.2014.52

Pele, L. (2023). Currency converter using official exchange rates. https://fxtop.com/

Rakthanmanon, T., Campana, B., Mueen, A., Batista, G., Westover, B., Zhu, Q., . . . Keogh, E. (2012). Searching and mining trillions of time series subsequences under Dynamic Time Warping. Proceedings of the 18th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD '12) (pp. 262-270). Beijing, China: ACM. doi:10.1145/2339530.2339576

Vinh, V. D., & Anh, D. T. (2016). Efficient subsequence join over time series under Dynamic Time Warping. Recent Developments in Intelligent Information and Database Systems, (pp. 41-52). doi:10.1007/978-3-319-31277-4_4

Wang, J., Li, Q., Li, Z., Wang, P., Wang, Y., Wang, W., . . . Chi, M. (2019). Similarity join on time series by utilizing a dynamic segmentation index. Knowledge and Information Systems, 61(2019), 1517-1546. doi:10.1007/s10115-018-1317-4

Weigend, A. S. (2016). Time series prediction: Forecasting the future and understanding the past. http://www-psych.stanford.edu/~andreas/Time-Series/SantaFe.html. Accessed December 2016

Yeh, C. C. M., Zheng, Y., Wang, J., Chen, H., Zhuang, Z., Zhang, W., & Keogh, E. (2022). Error-bounded approximate time series joins using compact dictionary representations of time series. Proceedings of the 2022 SIAM International Conference on Data Mining (SDM), (pp. 181-189). Alexandria, VA, USA. doi:10.1137/1.9781611977172

Yeh, C. C. M., Zhu, Y., Ulanova, L., Begum, N., Ding, Y., Dau, H. A., . . . Keogh, E. (2018). Time series joins, motifs, discords and shapelets: a unifying view that exploits the matrix profile. Data Mining and Knowledge Discovery, 32(1), 83-123. doi:10.1007/s10618-017-0519-9

Zhu, Y., Zimmerman, Z., Senobari, N. S., Yeh, C.-C. M., Funning, G., Mueen, A., . . . Keogh, E. (2018). Exploiting a novel algorithm and GPUs to break the ten quadrillion pairwise comparisons barrier for time series motifs and joins. Knowledge and Information Systems, 54(2018), 203-236. doi:10.1007/s10115-017-1138-x