Abstract: The vehicle routing problem with time windows (VRPTW) plays an important role in logistics. To better balance the computation time and solution accuracy of optimization algorithms for this problem, a neighborhood search algorithm based on deep reinforcement learning is proposed, which avoids the extensive manual experimentation otherwise needed to find the best search rules. First, the neighborhood search is modeled as a Markov decision process, and an encoder-decoder model capable of mining spatiotemporal information is designed to represent the stochastic policy with which an agent performs search actions. Second, the encoder-decoder model is trained with the proximal policy optimization algorithm, which improves training efficiency and makes the training process more stable. The dynamic graph attention network encoder comprises a positional encoding module and an attention module, which capture the spatial information of customer nodes in the delivery road network. The gated recurrent unit decoder analyzes, along the time dimension, the agent's history of destroy operations together with the current graph structure information, and outputs the set of nodes to be removed from the current feasible solution sequence. Extensive experiments validate the effectiveness of the proposed deep reinforcement learning method, which shows competitive advantages on test sets of different problem scales, the Solomon benchmark dataset, and real-world delivery tasks from JD Logistics.
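The destroy-and-rebuild search loop that the abstract models as a Markov decision process can be illustrated with a minimal sketch. This is not the authors' DGA-GRU implementation: the toy instance, the capacity value, the function names, and the greedy repair step are all assumptions made for illustration, and the learned destroy policy is replaced by uniform random removal.

```python
import math
import random

# Hypothetical toy instance: node 0 is the depot; each node is
# (x, y, ready_time, due_time, service_time, demand).
NODES = [
    (0, 0, 0, 100, 0, 0),
    (2, 1, 0, 30, 1, 1),
    (4, 3, 5, 40, 1, 1),
    (1, 5, 10, 60, 1, 1),
    (-3, 2, 0, 50, 1, 1),
    (-2, -4, 20, 80, 1, 1),
    (3, -3, 0, 70, 1, 1),
]
CAPACITY = 3  # assumed vehicle capacity


def dist(i, j):
    return math.hypot(NODES[i][0] - NODES[j][0], NODES[i][1] - NODES[j][1])


def route_feasible(route):
    """Check capacity and time windows for one route (depot not listed)."""
    if sum(NODES[c][5] for c in route) > CAPACITY:
        return False
    t, prev = 0.0, 0
    for c in route:
        t = max(t + dist(prev, c), NODES[c][2])  # wait if the vehicle is early
        if t > NODES[c][3]:
            return False
        t += NODES[c][4]
        prev = c
    return t + dist(prev, 0) <= NODES[0][3]  # back at the depot in time


def route_cost(route):
    path = [0] + route + [0]
    return sum(dist(a, b) for a, b in zip(path, path[1:]))


def total_cost(solution):
    return sum(route_cost(r) for r in solution)


def destroy(solution, k, rng):
    """Stand-in for the learned policy: remove k customers uniformly at random."""
    customers = [c for r in solution for c in r]
    removed = set(rng.sample(customers, k))
    partial = [[c for c in r if c not in removed] for r in solution]
    return [r for r in partial if r], sorted(removed)


def repair(solution, removed):
    """Greedy cheapest feasible insertion; open a new route if nothing fits."""
    solution = [list(r) for r in solution]
    for c in removed:
        best = None
        for ri, r in enumerate(solution):
            for pos in range(len(r) + 1):
                cand = r[:pos] + [c] + r[pos:]
                if route_feasible(cand):
                    delta = route_cost(cand) - route_cost(r)
                    if best is None or delta < best[0]:
                        best = (delta, ri, pos)
        if best is None:
            solution.append([c])
        else:
            solution[best[1]].insert(best[2], c)
    return solution


def neighborhood_search(iterations=200, k=2, seed=0):
    rng = random.Random(seed)
    current = [[c] for c in range(1, len(NODES))]  # one vehicle per customer
    for _ in range(iterations):
        partial, removed = destroy(current, k, rng)
        candidate = repair(partial, removed)
        if total_cost(candidate) <= total_cost(current):  # keep non-worsening moves
            current = candidate
    return current
```

In this sketch the state is the current solution, the action is the set of removed customers, and the change in total cost plays the role of a reward signal. In the approach the abstract describes, the `destroy` step would instead sample the removal set from the trained encoder-decoder policy.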
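Basic information:
DOI:
CLC number: U492.22; TP18
Citation:
[1] LUO J, HE Z H, FU H W, et al. DGA-GRU: a neighborhood search learning algorithm for vehicle routing optimization under delivery time constraints[J]. China Sciencepaper, 2025, 20(08):684-698. (in Chinese)
Funding:
Zhejiang Provincial Philosophy and Social Sciences Planning Project (25NDJC136YB)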