Ensemble clustering combines the results of multiple clustering methods rather than relying on a single approach, capturing diverse patterns and producing a more reliable and accurate clustering solution. Because it reduces the effects of bias and noise in complex, high-dimensional data, it is well suited to clustering large biological and genomic datasets. Component-based ensemble clustering divides the data into subsets, applies several algorithms to each subset independently, and then aggregates the results to improve performance; analyzing each subset separately helps reveal varied patterns while limiting noise and bias. This paper proposes two novel clustering methods that integrate multiple algorithms, including Agglomerative Hierarchical Clustering (AHC), K-Means Clustering, Hierarchical Density-Based Spatial Clustering of Applications with Noise (HDBSCAN), Ordering Points to Identify the Clustering Structure (OPTICS), Improved Density-Based Spatial Clustering of Applications with Noise (IDBSCAN), and Density-Based Spatial Clustering of Applications with Noise Plus Plus (DBSCAN++). The second method, termed Ensemble Clustering with Each Subset (ECES), employs both ‘with-replacement’ and ‘without-replacement’ sampling to increase diversity, reduce redundancy, and improve generalization. The key distinction lies in the ensemble step of ECES, which divides each dataset into equal-sized subsets. This ensures fairness and comparability across subsets and keeps diversity within the ensemble controlled, reducing bias, redundancy, and overlap.
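The subset-then-aggregate idea, with equal-sized subsets drawn with or without replacement, can be illustrated by a minimal sketch under several assumptions. The abstract does not say how ECES aggregates its base results, so this sketch swaps in a standard co-association (evidence accumulation) consensus: it counts how often each pair of points lands in the same cluster and links pairs that agree in at least `threshold` of the runs that saw both. Only two toy base clusterers are implemented: single-linkage AHC and an eps-linkage that behaves like DBSCAN with `min_samples=1`. The paper's K-Means, HDBSCAN, OPTICS, IDBSCAN, and DBSCAN++ components are not reproduced. All function names (`equal_subsets`, `ensemble_cluster`, `single_link`, `eps_link`) and parameters (`rounds`, `threshold`, `eps`) are hypothetical.

```python
import math
import random
from itertools import combinations

Point = tuple[float, float]


def equal_subsets(n: int, n_subsets: int, replace: bool, rng: random.Random) -> list[list[int]]:
    """Draw n_subsets lists of row indices, all of equal size n // n_subsets."""
    size = n // n_subsets
    if replace:
        # With replacement: independent draws, so an index may repeat or be missed.
        return [rng.choices(range(n), k=size) for _ in range(n_subsets)]
    # Without replacement: shuffle, then cut into disjoint, equal-sized chunks.
    order = list(range(n))
    rng.shuffle(order)
    return [order[i * size:(i + 1) * size] for i in range(n_subsets)]


def components(n: int, edges: list[tuple[int, int]]) -> list[int]:
    """Union-find: label n nodes by connected component, in order of first appearance."""
    parent = list(range(n))

    def find(x: int) -> int:
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in edges:
        parent[find(a)] = find(b)
    roots = [find(i) for i in range(n)]
    relabel = {r: label for label, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


def single_link(points: list[Point], k: int = 2) -> list[int]:
    """Single-linkage AHC cut at k clusters: build an MST (Prim), drop its k-1 longest edges."""
    n = len(points)
    in_tree, best, parent = [False] * n, [math.inf] * n, [-1] * n
    best[0], mst = 0.0, []
    for _ in range(n):
        u = min((i for i in range(n) if not in_tree[i]), key=lambda i: best[i])
        in_tree[u] = True
        if parent[u] >= 0:
            mst.append((best[u], parent[u], u))
        for v in range(n):
            if not in_tree[v] and (d := math.dist(points[u], points[v])) < best[v]:
                best[v], parent[v] = d, u
    kept = sorted(mst)[: max(len(mst) - (k - 1), 0)]
    return components(n, [(a, b) for _, a, b in kept])


def eps_link(points: list[Point], eps: float = 2.0) -> list[int]:
    """DBSCAN-like stand-in with min_samples=1: points within eps are chained together."""
    pairs = combinations(range(len(points)), 2)
    return components(len(points), [(i, j) for i, j in pairs if math.dist(points[i], points[j]) <= eps])


def ensemble_cluster(data: list[Point], algos, n_subsets: int = 2, rounds: int = 20,
                     replace: bool = False, threshold: float = 0.5, seed: int = 0) -> list[int]:
    """Cluster equal-sized subsets with every base algorithm, then take a co-association consensus."""
    rng = random.Random(seed)
    n = len(data)
    together = [[0] * n for _ in range(n)]  # runs in which pair (i, j) shared a cluster
    seen = [[0] * n for _ in range(n)]      # runs in which pair (i, j) was clustered at all
    for _ in range(rounds):
        for subset in equal_subsets(n, n_subsets, replace, rng):
            idx = sorted(set(subset))  # repeated bootstrap draws add no extra pair evidence
            pts = [data[i] for i in idx]
            for algo in algos:
                labels = algo(pts)
                for a, b in combinations(range(len(idx)), 2):
                    seen[idx[a]][idx[b]] += 1
                    together[idx[a]][idx[b]] += int(labels[a] == labels[b])
    # Consensus: link pairs that shared a cluster in at least `threshold` of the runs that saw both.
    edges = [(i, j) for i, j in combinations(range(n), 2)
             if seen[i][j] and together[i][j] / seen[i][j] >= threshold]
    return components(n, edges)


if __name__ == "__main__":
    # Toy data: two tight, well-separated groups of 10 points each.
    data = [(0.1 * i, 0.0) for i in range(10)] + [(10.0 + 0.1 * i, 10.0) for i in range(10)]
    for replace in (False, True):
        mode = "with replacement" if replace else "without replacement"
        print(mode, ensemble_cluster(data, [single_link, eps_link], replace=replace))
```

Keeping every subset the same size means each base run sees the same amount of data, which is one plausible reading of the fairness and comparability the abstract describes. Other base algorithms would plug in as extra functions that take a list of points and return one label per point. The 0.5 consensus threshold is an illustrative choice, not a value from the paper.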