3-D design exploration of CNN algorithms
Multi-dimensional algorithms are hard to implement efficiently on classical platforms. Pipelining can exploit instruction-level parallelism, but not when many data items must be processed simultaneously, and threads can optimize only within the restrictions the platform imposes. Tiled architectures add a dimension to the solution space: with a large local register store, data parallelism is handled, but only along one dimension. 3-D technologies ar