《Go 语言原本》

# 3.4 工作窃取调度理论

TODO: 暂无此节内容写作计划，请谨慎阅读

TODO: 讨论与 work tealing 相关的调度理论，可能的讨论的主题包括：

1. original work stealing scheduling
2. NUMA-aware scheduling
3. non-cooperative preemption
4. work stealing with latency
6. nested parallel scheduling
7. power-aware scheduling
8. distributed work stealing

# 基于工作窃取的多线程计算调度

Robert D. Blumofe, Charles E. Leiserson

Journal of the ACM, Vol. 46, No.5, Spectember 1999, pp. 720-748

TODO:

TODO:

## 4 随机化工作窃取算法

TODO:

1. 对于 $i = 1,2, …, k$，线程 $\Gamma_i$ 为 $\Gamma_{i-1}$ 的父线程。
2. 如果有 $k>1$，则对于 $i=1,2,…,k-1$，线程 $\Gamma_{i}$ 没有执行工作因为它尚未被 $\Gamma_{i-1}$ 创建。

TODO:

TODO:

## 7 结论

Cilk 系统目前运行在现代共享内存多处理器上，例如 Sun Enterprise，Silicon Graphics Origin，Intel Quad Pentium 和 DEC Alphaserver（早期版本的 Cilk 运行在 Thinking Machines CM-5 MPP，Intel Paragon MPP 和 IBM SP-2 上）。迄今为止，用 Cilk 编写的应用程序包括蛋白质折叠 [Pande et al. 1994]，图形渲染 [Stark 1998]，回溯搜索和 *􏱑Socrates 国际象棋程序 [Joerg and Kuszmaul, 1994]，它在 1995 年 ICCA 世界计算机国际象棋锦标赛中获得二等奖，该锦标赛在桑迪亚国家实验室的 1824-node Paragon上运行。我们最近的国际象棋程序 Cilkchess 赢得了 1996 年荷兰公开赛计算机国际象棋锦标赛。 Cilk 的团队编程在国际功能编程大会主办的 ICFP'98 编程竞赛中获得了一等奖（不败）。

## 参考文献

• ARORA, N. S., BLUMOFE, R. D., AND PLAXTON, C. G. 1998. Thread scheduling for multiprogrammed multiprocessors. In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’98) (Puerto Vallarta, Mexico, June 28–July 2). ACM, New York, pp. 119–129.
• ARVIND, NIKHIL, R. S., AND PINGALI, K. K. 1989. I-structures: Data structures for parallel computing. ACM Trans. Program. Lang. Syst. 11, 4 (Oct.), 598–632.
• BLELLOCH, G. E., GIBBONS, P. B., AND MATIAS, Y. 1995. Provably efficient scheduling for languages with fine-grained parallelism. In Proceedings of the 7th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’95) (Santa Barbara, Calif., July 17–19). ACM, New York, pp. 1–12.
• BLELLOCH, G. E., GIBBONS, P. B., MATIAS, Y., AND NARLIKAR, G. J. 1997. Space-efficient scheduling of parallelism with synchronization variables. In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’97) (Newport, R.I., June 22–25). ACM, New York, pp. 12–23.
• BLUMOFE, R. D. 1995. Executing multithreaded programs efficiently. Ph.D. thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology. Also avail- able as MIT Laboratory for Computer Science Technical Report MIT/LCS/TR-677.
• BLUMOFE, R. D., FRIGO, M., JOERG, C. F., LEISERSON, C. E., AND RANDALL, K. H. 1996a. An analysis of dag-consistent distributed shared-memory algorithms. In Proceedings of the 8th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’96) (Padua, Italy, June 24–26). ACM, New York, pp. 297–308.
• BLUMOFE, R. D., FRIGO, M., JOERG, C. F., LEISERSON, C. E., AND RANDALL, K. H. 1996b. Dag-consistent distributed shared memory. In Proceedings of the 10th International Parallel Process- ing Symposium (IPPS) (Honolulu, Hawaii, April). IEEE Computer Society Press, Los Alamitos, Calif., pp. 132–141.
• BLUMOFE, R. D., JOERG, C. F., KUSZMAUL, B. C., LEISERSON, C. E., RANDALL, K. H., AND ZHOU, Y. 1996c. Cilk: An efficient multithreaded runtime system. J. Paral. Dist. Comput. 37, 1 (Aug.), 55– 69.
• BLUMOFE, R. D., AND LEISERSON, C. E. 1998. Space-efficient scheduling of multithreaded computations. SIAM J. Comput. 27, 1 (Feb.), 202–229.
• BLUMOFE, R. D., AND LISIECKI, P. A. 1997. Adaptive and reliable parallel computing on networks of workstations. In Proceedings of the USENIX 1997 Annual Technical Conference on UNIX and Advanced Computing Systems (Anaheim, Calif., Jan.). USENIX Associates, Berkeley, Calif., pp. 133–147.
• BLUMOFE, R. D., AND PAPADOPOULOS, D. 1998. The performance of work stealing in multiprogrammed environments. Tech. Rep. TR-98-13 (May). Dept. Computer Sciences, The University of Texas at Austin, Austin, Tex.
• BRENT, R. P. 1974. The parallel evaluation of general arithmetic expressions. J. ACM 21, 2 (Apr.), 201–206.
• BURTON, F. W. 1988. Storage management in virtual tree machines. IEEE Trans. Comput. 37, 3 (Mar.), 321–328.
• BURTON, F. W. 1996. Guaranteeing good memory bounds for parallel programs. IEEE Trans. Softw. Eng. 22, 10 (Oct.), 762–773.
• BURTON, F. W., AND SLEEP, M. R. 1981. Executing functional programs on a virtual tree of processors. In Proceedings of the 1981 Conference on Functional Programming Languages and Computer Architecture (Portsmouth, N.H., Oct.). ACM, New York, N.Y., pp. 187–194.
• CHENG, G.-I. 1998. Algorithms for data-race detection in multithreaded programs. Master’s thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technol- ogy.
• CHENG, G.-I., FENG, M., LEISERSON, C. E., RANDALL, K. H., AND STARK, A. F. 1998. Detecting data races in Cilk programs that use locks. In Proceedings of the 10th ACM Symposium on Parallel Algorithms and Architectures (SPAA’98) (Puerto Vallarta, Mexico, June 28–July 2). ACM, New York, pp. 298–309.
• CULLER, D. E., AND ARVIND. 1988. Resource requirements of dataflow programs. In Proceedings of the 15th Annual International Symposium on Computer Architecture (ISCA) (Honolulu, Hawaii, May). IEEE Computer Society Press, Los Alamitos, Calif., pp. 141–150. Also available as MIT Laboratory for Computer Science, Computation Structures Group Memo 280.
• EAGER, D. L., ZAHORJAN, J., AND LAZOWSKA, E. D. 1989. Speedup versus efficiency in parallel systems. IEEE Trans. Comput. 38, 3 (Mar.), 408–423.
• FELDMANN, R., MYSLIWIETZ, P., AND MONIEN, B. 1993. Game tree search on a massively parallel system. Adv. Comput. Chess 7, 203–219.
• FENG, M., AND LEISERSON, C. E. 1997. Efficient detection of determinacy races in Cilk programs. In Proceedings of the 9th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’97) (Newport, R.I., June 22–25). ACM, New York, pp. 1–11.
• FINKEL, R., AND MANBER, U. 1987. DIB—A distributed implementation of backtracking. ACM Trans. Program. Lang. Syst. 9, 2 (Apr.), 235–256.
• FRIGO, M. 1998. The weakest reasonable memory model. Master’s thesis, Dept. Electrical Engi- neering and Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.
• FRIGO, M., LEISERSON, C. E., AND RANDALL, K. H. 1998. The implementation of the Cilk-5 multithreaded language. In Proceedings of the 1998 ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI’98) (Montreal, Canada, June 17–19). ACM, New York.
• FRIGO, M., AND LUCHANGCO, V. 1998. Computation-centric memory models. In Proceedings of the 10th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’98) (Puerto Vallarta, Mexico, June 28–July 2). ACM, New York, pp. 240–249.
• GRAHAM, R. L. 1966. Bounds for certain multiprocessing anomalies. Bell Syst. Tech. J. 45, 1563–1581.
• GRAHAM, R. L. 1969. Bounds on multiprocessing timing anomalies. SIAM J. Appl. Math. 17, 2 (Mar.), 416–429.
• HALBHERR, M., ZHOU, Y., AND JOERG, C. F. 1994. MIMD-style parallel programming with continuation-passing threads. In Proceedings of the 2nd International Workshop on Massive Parallel- ism: Hardware, Software, and Applications (Capri, Italy, Sept.). World Scientific, Singapore. (Also available as MIT Laboratory for Computer Science Computation Structures, Group Memo 355, March 1994. MIT, Cambridge, Mass.
• HALSTEAD, R. H., JR. 1984. Implementation of Multilisp: Lisp on a multiprocessor. In Conference Record of the 1984 ACM Symposium on LISP and Functional Programming (Austin, Tex., Aug.) ACM, New York, pp. 9–17.
• JOERG, C. F. 1996. The Cilk System for Parallel Multithreaded Computing. Ph.D. dissertation. Dept. Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.
• JOERG, C., AND KUSZMAUL, B. C. 1994. Massively parallel chess. In Proceedings of the 3rd DIMACS Parallel Implementation Challenge (Rutgers University, New Jersey, Oct. 1994).
• KARP, R. M., AND ZHANG, Y. 1993. Randomized parallel algorithms for backtrack search and branch-and-bound computation. J. ACM 40, 3 (July), 765–789.
• KUSZMAUL, B. C. 1994. Synchronized MIMD computing. Ph.D. thesis, Dept. Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Mass. Also available as MIT Laboratory for Computer Science Technical Report MIT/LCS/TR-645.
• LISIECKI, P. 1996. Macroscheduling in the Cilk network of workstations environment. Master’s thesis, Dept. Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.
• LIU, P., AIELLO, W., AND BHATT, S. 1993. An atomic model for message-passing. In Proceedings of the 5th Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’93) (Velen, Germany, June 30–July 2). ACM, New York, pp. 154–163.
• MOHR, E., KRANZ, D. A., AND HALSTEAD, R. H., JR. 1991. Lazy task creation: A technique for increasing the granularity of parallel programs. IEEE Trans. Parall. Dist. Syst. 2, 3 (July), 264–280. PANDE, V. S., JOERG, C. F., GROSBERG, A. Y., AND TANAKA, T. 1994. Enumerations of the
• Hamiltonian walks on a cubic sublattice. J. Phys. A 27. PAPADOPOULOS, D. P. 1998. Hood: A user-level thread library for multiprogramming multiprocessors. Master’s thesis, Dept. Computer Sciences, The University of Texas at Austin, Austin, Tex. RANADE, A. 1987. How to emulate shared memory. In Proceedings of the 28th Annual Symposium on Foundations of Computer Science (FOCS) (Los Angeles, Calif., Oct.). IEEE Computer Society
• Press, Los Alamitos, Calif., pp. 185–194. RANDALL, K. H. 1998. Cilk: Efficient multithreaded computing. Ph.D. dissertation. Dept. Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Mass. RUDOLPH, L., SLIVKIN-ALLALOUF, M., AND UPFAL, E. 1991. A simple load balancing scheme for task allocation in parallel machines. In Proceedings of the 3rd Annual ACM Symposium on Parallel Algorithms and Architectures (SPAA’91) (Hilton Head, S.C., July 21–24). ACM, New York, pp. 237–245.
• RUGGIERO, C. A., AND SARGEANT, J. 1987. Control of parallelism in the Manchester dataflow machine. In Functional Programming Languages and Computer Architecture, Number 274 in Lecture Notes in Computer Science. Springer-Verlag, New York, pp. 1–15.
• STARK, A. F. 1998. Debugging multithreaded programs that incorporate user-level locking. Master’s thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, Mass.
• VANDEVOORDE, M. T., AND ROBERTS, E. S. 1988. WorkCrews: An abstraction for controlling parallelism. International Journal of Parallel Programming 17, 4 (Aug.), 347–366.
• WU, I.-C., AND KUNG, H. T. 1991. Communication complexity for parallel divide-and-conquer. In Proceedings of the 32nd Annual Symposium on Foundations of Computer Science (FOCS) (San Juan, Puerto Rico, Oct. 1991). IEEE Computer Society Press, Los Alamitos, Calif., pp. 151–162.
• ZHANG, Y., AND ORTYNSKI, A. 1994. The efficiency of randomized parallel backtrack search. In Proceedings of the 6th IEEE Symposium on Parallel and Distributed Processing (Dallas, Texas, Oct. 1994). IEEE Computer Society Press, Los Alamitos, Calif.