Affiliation: Dongguan University of Technology (东莞理工学院)
Source: Modern Manufacturing Engineering (《现代制造工程》), 2007, No. 9, pp. 17-21 and 61 (6 pages)
Abstract: The parallel machine scheduling problem is common in industry; in practice, the scheduling of bottleneck operations often falls into this class. A reinforcement learning (RL) algorithm, Q-learning, is applied to the dynamic parallel machine scheduling problem Qm|rj,sjk,Mj|∑wjfj with the objective of minimizing the mean weighted flow time of jobs, taking sequence-dependent setup times and machine-job eligibility constraints into account. To convert the scheduling problem into an RL problem, it is formulated as a semi-Markov decision process: a representation of the system state is defined; four heuristics, the Weighted Shortest Processing Time (WSPT) rule, Weng's algorithm, the Ranking Algorithm (RA), and the LFJ-RA (Least Flexible Job-Ranking Algorithm), are used to construct the actions; and a reward function equivalent to the scheduling objective is defined. Q-learning combined with linear gradient-descent function approximation learns, through simulation, to select optimal or near-optimal actions in different system states. Experimental results show that Q-learning outperforms all four heuristics (WSPT, RA, LFJ-RA, and Weng's algorithm) on every test problem.
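The core idea described in the abstract — Q-learning with linear function approximation choosing among dispatching heuristics at each decision point — can be sketched in miniature. The following is an illustrative toy, not the authors' implementation: it uses a single machine, only two actions (WSPT and a plain SPT stand-in for the other heuristics), hand-picked state features, and a reward equal to the negative weighted flow time of the dispatched job; all names and feature choices here are assumptions.

```python
import random

def wspt(queue):
    # Weighted Shortest Processing Time first: dispatch the job
    # with the largest weight-to-processing-time ratio.
    return max(queue, key=lambda j: j["w"] / j["p"])

def spt(queue):
    # Plain SPT, standing in for the paper's other heuristic actions.
    return min(queue, key=lambda j: j["p"])

ACTIONS = [wspt, spt]

def features(queue):
    # Illustrative state features: scaled queue length, scaled mean
    # remaining processing time, and a bias term.
    if not queue:
        return [0.0, 0.0, 1.0]
    mean_p = sum(j["p"] for j in queue) / len(queue)
    return [len(queue) / 10.0, mean_p / 10.0, 1.0]

def q_value(theta_a, phi):
    # Linear function approximation: Q(s, a) = theta_a . phi(s).
    return sum(w * x for w, x in zip(theta_a, phi))

def run_episode(jobs, theta, alpha=0.1, eps=0.1, rng=random):
    """One simulated episode; mutates theta (one weight vector per action).
    Jobs are dicts with release r, processing time p, weight w.
    Returns the total weighted flow time (the objective to minimize)."""
    pending = sorted(jobs, key=lambda j: j["r"])
    queue, t, obj = [], 0.0, 0.0
    while pending or queue:
        while pending and pending[0]["r"] <= t:
            queue.append(pending.pop(0))
        if not queue:                      # idle until the next release
            t = pending[0]["r"]
            continue
        phi = features(queue)
        if rng.random() < eps:             # epsilon-greedy exploration
            a = rng.randrange(len(ACTIONS))
        else:
            a = max(range(len(ACTIONS)), key=lambda i: q_value(theta[i], phi))
        job = ACTIONS[a](queue)
        queue.remove(job)
        t += job["p"]
        r = -job["w"] * (t - job["r"])     # reward: negative weighted flow time
        obj += -r
        if not pending and not queue:      # terminal state: no bootstrap term
            target = r
        else:
            phi2 = features(queue)
            target = r + max(q_value(th, phi2) for th in theta)
        td = target - q_value(theta[a], phi)
        # Gradient-descent update of the chosen action's weights.
        theta[a] = [w + alpha * td * x for w, x in zip(theta[a], phi)]
    return obj
```

Run over many randomized episodes, the weight vectors in `theta` come to rank the heuristics by estimated long-run reward in each state, which is the mechanism the abstract credits for beating any single fixed heuristic.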