Abstract
Intelligent music recommendation and retrieval systems depend on genre categorization, yet class imbalance, overlapping genre features, decentralized data privacy, and limited deployment efficiency continue to challenge existing methods. We address these concerns with AAI-HarmoCNN-AttnNet, a privacy-conscious federated deep learning architecture for accurate and scalable music genre categorization. The proposed model captures fine-grained spectral cues and long-range temporal relationships using harmonic-sensitive convolutional layers and dual-path attention. Federated learning allows distributed clients to train the model collaboratively while keeping raw audio data local. A hybrid hyperparameter optimization technique combining Egret Swarm Optimization and Golden Jackal Optimization improves convergence stability and generalization. In extensive experiments on the NCASI benchmark dataset, AAI-HarmoCNN-AttnNet outperforms thirteen competitive baselines, including CRNN, Bi-GRU with attention, and recent self-supervised methods, achieving 99.1% classification accuracy, 98.9% precision, 98.8% recall, and a 97.4% Genre Diversity Sensitivity (GDS) score. Federated evaluations show robust convergence under non-IID client distributions, with tightly clustered client-wise accuracies above 99% and reduced inter-client variance. Ablation experiments confirm the complementary contributions of harmonic convolution and dual-path attention, while ROC analysis shows excellent discrimination with high true-positive rates and low false-positive rates. Resource profiling on edge devices shows low inference latency, a small model footprint, and balanced power efficiency, supporting real-time deployment. These results establish AAI-HarmoCNN-AttnNet as a robust, privacy-preserving, and deployment-ready solution for federated music genre classification in modern intelligent audio systems.
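As a rough, non-authoritative illustration of the two architectural ideas named above (harmonic-sensitive convolution and dual-path attention), the PyTorch sketch below shows one plausible realization. All class names, kernel sizes, and dimensions here are illustrative assumptions rather than the authors' published implementation: a tall frequency-axis kernel stands in for harmonic sensitivity, and the two attention paths run over time frames and frequency bins respectively.

```python
# Minimal sketch, assuming mel-spectrogram input of shape (batch, 1, n_mels, frames).
# HarmonicConv2d, DualPathAttention, and all hyperparameters are hypothetical.
import torch
import torch.nn as nn

class HarmonicConv2d(nn.Module):
    """Convolution with a tall frequency-axis kernel so a single filter can
    span harmonically related mel bins (a common harmonic-aware choice)."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=(7, 3), padding=(3, 1))
        self.norm = nn.BatchNorm2d(out_ch)

    def forward(self, x):
        return torch.relu(self.norm(self.conv(x)))

class DualPathAttention(nn.Module):
    """Two self-attention paths, one over time frames and one over frequency
    bins; the pooled outputs of both paths are fused by summation."""
    def __init__(self, dim, heads=4):
        super().__init__()
        self.time_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.freq_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x):                       # x: (B, C, F, T)
        xt = x.mean(dim=2).transpose(1, 2)      # (B, T, C) time path
        xf = x.mean(dim=3).transpose(1, 2)      # (B, F, C) frequency path
        ht, _ = self.time_attn(xt, xt, xt)
        hf, _ = self.freq_attn(xf, xf, xf)
        return ht.mean(dim=1) + hf.mean(dim=1)  # fused (B, C) embedding

class GenreClassifier(nn.Module):
    def __init__(self, n_genres=10, dim=64):
        super().__init__()
        self.backbone = nn.Sequential(
            HarmonicConv2d(1, 32), nn.MaxPool2d(2),
            HarmonicConv2d(32, dim), nn.MaxPool2d(2),
        )
        self.attn = DualPathAttention(dim)
        self.head = nn.Linear(dim, n_genres)

    def forward(self, x):
        return self.head(self.attn(self.backbone(x)))

# Usage: a batch of two 128-mel, 256-frame spectrograms -> (2, 10) genre logits.
logits = GenreClassifier()(torch.randn(2, 1, 128, 256))
```

In a federated setting, each client would train a local copy of such a model and share only parameter updates for server-side aggregation, consistent with the abstract's claim that raw audio never leaves the client.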