Optimization of Big Data Parallel Scheduling Based on Dynamic Clustering Scheduling Algorithm

Fang Liu, Wuhan University
Yanxiang He, Wuhan University
Jing He, Kennesaw State University
Xing Gao, Wuhan City College


In today’s data age, big data processing and analysis frameworks play an important role in handling ever-growing volumes of information. “Sharing Data” has been proposed to improve data processing performance through structured data scheduling. However, this approach incurs higher communication and buffering costs because of the extra data copies and buffering it requires. This paper therefore proposes a Dynamic Cluster Scheduling Algorithm (DCSA), based on the correlation between data items, for the parallel optimization of big data tasks in a big data analysis environment. First, a dynamic data queue is generated from the server’s request database. The priority and size of each data item are the factors considered when building the queue for data clustering association. Weights are then introduced and the dynamic data items are equalized, which provides the basis for multi-channel optimal scheduling. Second, according to the relevance of the data items, an optimized data placement mechanism aggregates related data in the same frame. After placement is complete, the dynamic data is scheduled uniformly to minimize migration cost, with the local characteristics of the data items as constraints. The scheduling scheme is adjusted through target iteration until multi-channel optimal scheduling is achieved. Experiments show that the proposed method enables dynamic data to achieve optimal scheduling.
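As a rough illustration of the first stage only (building a weighted queue and equalizing it across channels), the following is a minimal sketch and not the DCSA implementation itself. The names `DataItem`, `build_queue`, and `assign_channels`, the linear scoring formula, the default weights, and the greedy least-loaded balancing rule are all assumptions made for illustration; the abstract does not specify them.

```python
from dataclasses import dataclass


@dataclass
class DataItem:
    name: str
    priority: float  # assumed convention: higher means more urgent
    size: float      # size in arbitrary units


def build_queue(items, w_priority=0.7, w_size=0.3):
    """Order items by a weighted score of priority and size.

    The linear score and the default weights are illustrative
    assumptions, not values taken from the paper.
    """
    max_size = max(item.size for item in items)

    def score(item):
        # Favor high priority; penalize large items (size normalized to [0, 1]).
        return w_priority * item.priority - w_size * (item.size / max_size)

    return sorted(items, key=score, reverse=True)


def assign_channels(queue, n_channels):
    """Greedy equalization: each item goes to the least-loaded channel.

    This stands in for the paper's equalization step; the actual
    rule used by DCSA may differ.
    """
    channels = [[] for _ in range(n_channels)]
    loads = [0.0] * n_channels
    for item in queue:
        target = loads.index(min(loads))  # first channel with the smallest load
        channels[target].append(item)
        loads[target] += item.size
    return channels, loads
```

Calling `build_queue` on a non-empty list of items and passing the result to `assign_channels` yields one list of items per channel together with the total size assigned to each. The relevance-based placement and the iterative minimization of migration cost described above are not covered by this sketch.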