机构地区: 清华大学机械工程学院工业工程系
出 处: 《计算机集成制造系统》 2007年第1期110-116,共7页
摘 要: 尝试运用增强学习方法来研究平行机调度问题,通过定义系统状态、行为和报酬函数,把调度问题转化为平均报酬型半马尔可夫决策过程,并使用结合函数泛化器的R-Learning算法来解决。提出排名算法,并利用它和两种常用的调度规则(最短期望加工时间规则和先进先出规则)来定义增强学习的行为。实验结果表明,R-Learning算法通过仿真实验学习较优的调度策略,在不同的决策状态下选择最优或次优的行为,对每个测试问题的效果都优于以上任何一条调度规则。 A Reinforcement Learning (RL) method, R-Learning, was used to study parallel machines scheduling problems which was aimed to minimize mean flow time of jobs. The scheduling problem was converted into Semi-Markov Decision Process(SMDP) by defining system state, actions and reward function. It was solved by R- Learning functions. A heuristic, Ranking Algorithm (RA) was proposed and defined as RL as well as two commonly used dispatching rules: Shortest Expected Processing Time (SEPT) and First In First Out (FIFO). Experiment results demonstrated that R-Learning could learn a near-optimal scheduling policy through simulation, i.e. to select optimal or sub-optimal actions at different states. The conclusion was that R-Learning was superior to the above heuristic rules in all test problems.