QuickVec C++

A modern C++ approach to explicit SIMD vectorization.



Summary

QuickVec C++ will be a modern C++ approach to explicit cross-platform SIMD vectorization. It will give developers access to the power of the hardware SIMD feature sets through a common interface.

Background

Speed

Efficiency and raw speed matter to many high-performance applications, so QuickVec C++ will be implemented with a focus on both. It will expose the power of the SIMD instructions available on each system's hardware without forcing the user to duplicate any work.

Familiar Development

It is important for developers to work with well-designed systems that follow the paradigms they know and understand. The QuickVec C++ library will attempt to match the patterns presented by the C++ standard library. This will ease the development process greatly.

Cross-Platform

When developing applications for multiple platforms, it is not only inconvenient for developers to use multiple SIMD intrinsic sets, but also inefficient and time-consuming. The QuickVec C++ library will maintain its performance while being fully cross-platform. This means that functions written using the library should run on any of the supported platforms with no extra effort.


Challenge

The major challenge of this project is bringing all of the targets together in one library. The target instruction sets include SSE (versions 1 through 4.2), AVX (AVX, AVX2, and AVX-512), and ARM Neon, and each instruction set and platform exposes its own set of intrinsics. A large part of the challenge will be determining at runtime which features a processor supports and, when no SIMD operations are available, falling back to a path with little to no overhead in comparison to a sequential implementation. The second challenge is getting results with little to no difference in speed from a similarly written implementation that uses SIMD intrinsics directly.
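The runtime detection described above could be sketched roughly as follows. This is only an illustration, not the library's design: the names `AddFn`, `add_scalar`, `add_avx`, `select_add`, and `add_floats` are hypothetical, and the detection path assumes GCC or Clang on x86, where the `__builtin_cpu_supports` builtin and the `target` function attribute are available. On any other compiler or architecture the sketch simply uses the scalar path.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel signature: out[i] = a[i] + b[i] for i in [0, n).
using AddFn = void (*)(const float*, const float*, float*, std::size_t);

// Portable fallback used when no SIMD extension is detected.
void add_scalar(const float* a, const float* b, float* out, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) out[i] = a[i] + b[i];
}

// Assumption: GCC or Clang targeting x86. Other toolchains take the #else branch.
#if (defined(__GNUC__) || defined(__clang__)) && (defined(__x86_64__) || defined(__i386__))
#include <immintrin.h>

// Built with AVX enabled for this one function only, so the rest of the
// program still runs on CPUs without AVX. Called only after the runtime check.
__attribute__((target("avx")))
void add_avx(const float* a, const float* b, float* out, std::size_t n) {
    std::size_t i = 0;
    for (; i + 8 <= n; i += 8) {  // 8 floats per 256-bit register
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(out + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; ++i) out[i] = a[i] + b[i];  // leftover elements
}

AddFn select_add() {
    __builtin_cpu_init();  // make sure the CPU feature data is populated
    return __builtin_cpu_supports("avx") ? add_avx : add_scalar;
}
#else
AddFn select_add() { return add_scalar; }
#endif

// Public entry point: the implementation is chosen once, on the first call.
void add_floats(const float* a, const float* b, float* out, std::size_t n) {
    assert(n == 0 || (a != nullptr && b != nullptr && out != nullptr));
    static const AddFn impl = select_add();
    impl(a, b, out, n);
}
```

Because the function pointer is resolved once, the per-call cost of the dispatch is a single indirect call, which is one plausible way to keep the fallback path close to a plain sequential implementation.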

Resources

I will be starting from a blank code base. For reference I will be using the intrinsics documentation and reference guides for SSE and AVX intrinsics found here, and for ARM Neon intrinsics found here.

Goals

  • Find and Implement Commonalities

    I plan to look through the intrinsic functions of the different sets and find commonalities. From there I plan to bundle as much as possible together using templates and template specialization. This will allow the user to specify arguments such as the element type (float or int), precision, and number of elements in a vector, and have the template instantiate the best match from whatever is available in the compiled feature set.

  • Match Performance

    This goal has three reference points. The first is the sequential version of an algorithm: when the library uses a sequential implementation because SIMD features are not available, it will introduce only minimal overhead. The second is other libraries similar to this one, such as Yeppp!, which is a C library for cross-platform SIMD utilization. The third is using the platform's intrinsics directly: the overhead of using SIMD via this library will be minimal.

  • Use Modern C++ Style

    By this I mean that I will follow the interface style of the C++ standard library and Boost. This includes making sure that move and value semantics are maintained for all types, and making vectors usable in range-based for loops.
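To make the first and third goals more concrete, here is a minimal sketch of what a templated vector type with a scalar fallback and one SIMD specialization might look like. Everything in it is an assumption for illustration: the namespace `qv_sketch`, the class name `Vec`, and its members are hypothetical, and the `__SSE__` macro check reflects how GCC and Clang report SSE support at compile time (other compilers use different macros).

```cpp
#include <array>
#include <cassert>
#include <cstddef>

#if defined(__SSE__)  // assumption: GCC/Clang-style feature macro
#include <xmmintrin.h>
#endif

namespace qv_sketch {  // hypothetical namespace, not the library's real one

// Generic fallback: N lanes of T, processed one lane at a time.
template <typename T, std::size_t N>
class Vec {
public:
    using value_type = T;
    using iterator = typename std::array<T, N>::iterator;
    using const_iterator = typename std::array<T, N>::const_iterator;

    Vec() : lanes_{} {}
    explicit Vec(T fill) { lanes_.fill(fill); }

    T& operator[](std::size_t i) { assert(i < N); return lanes_[i]; }
    const T& operator[](std::size_t i) const { assert(i < N); return lanes_[i]; }

    // begin()/end() are what make range-based for loops work.
    iterator begin() { return lanes_.begin(); }
    iterator end() { return lanes_.end(); }
    const_iterator begin() const { return lanes_.begin(); }
    const_iterator end() const { return lanes_.end(); }
    static constexpr std::size_t size() { return N; }

    friend Vec operator+(const Vec& a, const Vec& b) {
        Vec r;
        for (std::size_t i = 0; i < N; ++i) r.lanes_[i] = a.lanes_[i] + b.lanes_[i];
        return r;
    }

private:
    std::array<T, N> lanes_;
};

#if defined(__SSE__)
// Specialization selected for Vec<float, 4> when SSE is enabled at compile time.
// Same interface as the generic version, but addition uses one SSE instruction.
template <>
class Vec<float, 4> {
public:
    using value_type = float;
    using iterator = float*;
    using const_iterator = const float*;

    Vec() : lanes_{} {}
    explicit Vec(float fill) { _mm_store_ps(lanes_, _mm_set1_ps(fill)); }

    float& operator[](std::size_t i) { assert(i < 4); return lanes_[i]; }
    const float& operator[](std::size_t i) const { assert(i < 4); return lanes_[i]; }

    iterator begin() { return lanes_; }
    iterator end() { return lanes_ + 4; }
    const_iterator begin() const { return lanes_; }
    const_iterator end() const { return lanes_ + 4; }
    static constexpr std::size_t size() { return 4; }

    friend Vec operator+(const Vec& a, const Vec& b) {
        Vec r;
        _mm_store_ps(r.lanes_, _mm_add_ps(_mm_load_ps(a.lanes_), _mm_load_ps(b.lanes_)));
        return r;
    }

private:
    alignas(16) float lanes_[4];  // 16-byte alignment required by _mm_load_ps
};
#endif

}  // namespace qv_sketch
```

With a type like this, user code such as `qv_sketch::Vec<float, 4> c = a + b; for (float x : c) { ... }` would be written once and compile to the SSE path or the scalar path depending on the build flags. Storing the lanes in a plain aligned array is a simplification chosen here to keep the iterators trivial; a real implementation might hold the register type directly.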


Why use cross-platform C++?

C++ is the current standard language for high-performance cross-platform application development. It is also capable of achieving levels of abstraction that are not available in other languages. Finally, intrinsics are available in C or C++ on all of the target platforms, which gives the implementations a common base to build on.

Schedule

Week                  Task
April 3 - April 10    Find commonalities and design interfaces
April 10 - April 17   Implement sequential and SSE4
April 17 - April 24   Implement other SSE versions, AVX, and Neon
April 24 - May 1      Implement automatic detection of extensions
May 1 - May 8         Add iterators and other helpers
May 8 - May 11        Prepare for final presentation