IDENTIFYING KEY FEATURES IN FMCG SUPPLY CHAIN NODE CLASSIFICATION USING RANDOM FOREST AND XGBOOST
Abstract
Efficient management of supply chain nodes is critical in the fast-moving consumer goods (FMCG) sector, where misclassifying products and resources leads to inefficiencies and increased costs. This study classifies FMCG supply chain nodes by product group and uses machine learning to identify the features that most influence that classification. The dataset comprises 40 nodes described by 33 engineered attributes derived from product metadata, plant and storage codes, and node identifiers. Random Forest and XGBoost classifiers were trained, and their performance was evaluated using cross-validation and confusion matrices. Both models achieved perfect accuracy, precision, recall, and F1-scores, indicating that the selected features are sufficient to separate the product groups in this dataset. Feature importance analysis showed that Subgroup Encoded was the strongest predictor, with location-specific variables (e.g., Plant_2114) and node-level attributes (Has_AT, Has_MA) also contributing. These findings underscore the value of feature importance analysis for uncovering hidden dependencies and improving explainability in supply chain operations. The study provides actionable insights for warehouse planning, product placement, and resource allocation, and highlights the novelty of applying explainable AI to FMCG supply chains. Future research should extend this work to larger datasets and temporal features to assess scalability and robustness.
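
The workflow summarized above (train a tree ensemble, score it with cross-validation, then rank features by importance) can be illustrated with a minimal sketch. It is not the study's code or data: the 40 rows are synthetic, only three of the 33 attributes are mimicked, the column spellings are assumed, and the subgroup code is constructed so that it determines the product group. Only scikit-learn's Random Forest is shown; XGBoost's `XGBClassifier`, which exposes a scikit-learn-compatible interface, could be substituted for the model.

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import StratifiedKFold, cross_val_score

# Synthetic stand-in for the node table (assumption: 4 product groups, 10 nodes each).
rng = np.random.default_rng(0)
n_nodes = 40
product_group = np.repeat(np.arange(4), n_nodes // 4)

# Hypothetical features: each product group owns two subgroup codes, so the
# subgroup code alone identifies the group; the other two columns are random flags.
subgroup_encoded = product_group * 2 + (np.arange(n_nodes) % 2)
plant_2114 = rng.integers(0, 2, size=n_nodes)
has_at = rng.integers(0, 2, size=n_nodes)

feature_names = ["Subgroup_Encoded", "Plant_2114", "Has_AT"]
X = np.column_stack([subgroup_encoded, plant_2114, has_at])
y = product_group

# Stratified 5-fold cross-validation keeps every product group in every fold.
model = RandomForestClassifier(n_estimators=200, random_state=0)
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(model, X, y, cv=cv, scoring="accuracy")

# Fit on all nodes to read impurity-based feature importances (they sum to 1).
model.fit(X, y)
importances = dict(zip(feature_names, model.feature_importances_))
ranked = sorted(importances.items(), key=lambda kv: kv[1], reverse=True)

print(f"Mean CV accuracy: {scores.mean():.3f}")
for name, value in ranked:
    print(f"{name}: {value:.3f}")
```

In this toy setup the subgroup code ranks first because it is built to encode the label. A feature that nearly determines the target can produce very high scores on a small dataset, which is one reason to confirm such results on larger data.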