Al Maruf Hassan, Huu-Hoa Nguyen*, Md. Maruf Hassan, Abdul Kadar Muhammad Masum and Dewan Md. Farid

* Corresponding author: Huu-Hoa Nguyen (email: nhhoa@ctu.edu.vn)

Abstract

Ensemble clustering leverages multiple methods to identify diverse patterns and, rather than relying on a single approach, produces a more dependable and accurate clustering solution. This methodology mitigates bias and noise in intricate, high-dimensional data, enabling effective grouping of biological and genomic big data. Component-based ensemble clustering divides data into subsets, applies several algorithms, and then aggregates the outcomes to improve performance. Because each data subset is analyzed independently, diverse patterns can be recognized while noise and bias are minimized. This paper proposes two novel ensemble clustering methods that integrate multiple algorithms, including Agglomerative Hierarchical Clustering (AHC), K-Means Clustering, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), Ordering Points to Identify the Clustering Structure (OPTICS), Improved Density-Based Spatial Clustering of Applications with Noise (IDBSCAN), and Density-Based Spatial Clustering of Applications with Noise Plus Plus (DBSCAN++). The second method, termed Ensemble Clustering with Each Subset (ECES), employs both with-replacement and without-replacement sampling to increase diversity, minimize redundancy, and improve generalization. The key distinction lies in the ensemble step of the second method, which divides each dataset into equal-sized subsets, ensuring fairness, comparability, and controlled diversity within the ensemble while reducing bias, redundancy, and overlap.
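To make the component-based idea concrete, the sketch below assembles a small subset-based clustering ensemble in Python: equal-sized random subsets are clustered by several base algorithms (K-Means, agglomerative clustering, OPTICS, and HDBSCAN via scikit-learn ≥ 1.3), the resulting partitions are accumulated into a co-association matrix, and a consensus partition is extracted from it. The synthetic data, subset sizes, clusterer parameters, and consensus step are illustrative assumptions for exposition, not the authors' exact algorithms or the ECES procedure.

```python
# Minimal sketch of subset-based ensemble clustering with co-association
# consensus. All settings here are illustrative assumptions; IDBSCAN and
# DBSCAN++ are not in scikit-learn and are omitted from the base pool.
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.cluster import KMeans, AgglomerativeClustering, OPTICS, HDBSCAN

def co_association(labels, n_samples, indices):
    """Accumulate pairwise co-occurrence evidence for the sampled points."""
    M = np.zeros((n_samples, n_samples))
    for lab in set(labels):
        if lab == -1:                 # skip noise points from density-based methods
            continue
        members = indices[labels == lab]
        M[np.ix_(members, members)] += 1.0
    return M

X, _ = make_blobs(n_samples=300, centers=4, random_state=0)  # toy data
n = len(X)
rng = np.random.default_rng(0)

# Equal-sized random subsets; each draw samples without replacement, and the
# draws overlap so most pairs of points co-occur in some subset. A disjoint
# equal split, closer to the ECES description, could instead be obtained with
# np.array_split(rng.permutation(n), n_subsets).
n_subsets, subset_size = 8, n // 2
subsets = [rng.choice(n, size=subset_size, replace=False) for _ in range(n_subsets)]

base_clusterers = [
    KMeans(n_clusters=4, n_init=10, random_state=0),
    AgglomerativeClustering(n_clusters=4),
    OPTICS(min_samples=5),
    HDBSCAN(min_cluster_size=10),
]

C = np.zeros((n, n))       # co-association votes
counts = np.zeros((n, n))  # number of times each pair was clustered together or apart
for idx in subsets:
    for model in base_clusterers:
        labels = model.fit_predict(X[idx])
        C += co_association(labels, n, idx)
        counts[np.ix_(idx, idx)] += 1.0   # one vote per (subset, clusterer) pair

# Normalise votes to a similarity matrix and derive a consensus partition.
S = np.divide(C, counts, out=np.zeros_like(C), where=counts > 0)
consensus = AgglomerativeClustering(
    n_clusters=4, metric="precomputed", linkage="average"
).fit_predict(1.0 - S)
print(np.bincount(consensus))   # consensus cluster sizes
```

The co-association (evidence-accumulation) consensus is one standard way to merge heterogeneous base partitions; other consensus functions, or the disjoint equal split noted in the comment, could be substituted without changing the overall structure of the sketch.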

Keywords: Clustering, component-based clustering, ensemble clustering
