Affiliation: Guangzhou Railway Polytechnic (广州铁路职业技术学院)
Source: Computer Engineering and Design (《计算机工程与设计》), 2012, No. 4, pp. 1352-1356 (5 pages)
Abstract: This paper presents LF-CRSE (low latency and fault tolerance CRSE), a computing resource sharing environment that supports low-latency communication and fault tolerance. LF-CRSE assigns each node a functional role: client, task server, worker service provider, or worker. Together these roles form a scalable distributed network architecture. Task caching, task pre-fetching, and computation on the task server keep communication latency overhead low. At the application level, branch-and-bound task partitioning lets LF-CRSE offer a flexible programming model that covers both master-worker and divide-and-conquer patterns. The set of workers may change during program execution, for example when workers fail; heartbeat messages from the workers and subtask-oriented fault tolerance preserve correctness. LF-CRSE was deployed as an experimental platform and tested on a distributed travelling salesman problem (TSP) with data dependencies. The results show that the speedup rises steadily as workers are added, and that the system also performs well in the low-latency communication and fault-tolerance tests.
Keywords: distributed computing; computing resource sharing; low latency; fault tolerance; branch and bound
Field: [Automation and Computer Technology]
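
Note: the abstract names worker heartbeat messages and subtask-oriented fault tolerance but gives no implementation detail. The following is only a minimal sketch of how such a scheme might work, not the authors' design. The class `TaskServer`, its methods, the `timeout` parameter, and the explicit `now` timestamps are all hypothetical names chosen for illustration; time is passed in by the caller so the behavior is deterministic.

```python
from collections import deque


class TaskServer:
    """Toy task server (hypothetical): hands out subtasks and requeues
    those held by workers whose heartbeats have stopped."""

    def __init__(self, subtasks, timeout):
        self.pending = deque(subtasks)  # subtasks waiting for a worker
        self.assigned = {}              # worker id -> subtask it holds
        self.last_beat = {}             # worker id -> time of last heartbeat
        self.done = {}                  # subtask -> result
        self.timeout = timeout          # allowed silence before a worker is dropped

    def heartbeat(self, worker, now):
        """Record that the worker was alive at time `now`."""
        self.last_beat[worker] = now

    def fetch(self, worker, now):
        """Give the worker the next pending subtask, or None if none is left.
        Assumes a worker holds at most one subtask at a time."""
        self.heartbeat(worker, now)
        if not self.pending:
            return None
        task = self.pending.popleft()
        self.assigned[worker] = task
        return task

    def complete(self, worker, result, now):
        """Store a result; ignore it if the worker was already dropped."""
        self.heartbeat(worker, now)
        task = self.assigned.pop(worker, None)
        if task is not None:
            self.done[task] = result

    def reap(self, now):
        """Drop workers silent for longer than `timeout` and requeue
        the subtask each one held. Returns the dropped worker ids."""
        dead = [w for w, t in self.last_beat.items() if now - t > self.timeout]
        for w in dead:
            del self.last_beat[w]
            task = self.assigned.pop(w, None)
            if task is not None:
                self.pending.append(task)
        return dead
```

In this sketch, recovery is per subtask: when a worker goes silent, only the subtask it held returns to the queue, and results already stored in `done` are untouched. A caller would invoke `reap` periodically with the current time, and any surviving worker's next `fetch` picks up the requeued subtask.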