Since submitting the project to the network, I've updated the library from version 1.0 to 1.2, which includes the following improvements:
- I simplified the API by removing the `upsideDown` boolean parameter. Now, to flip the image, you just pass in a negative pitch.
- You can now implement custom memory and file output by defining some macros.
- The library got ~40% faster on my test corpus, bringing it to around 500 MB/s on average and 300 MB/s in the worst case on my laptop.
- The maximum bit depth was increased from 15 to 16.
- The code now compiles (in C++ mode) using MSVC 2010, and should be more compatible with older compilers in general.
- Many small fixes and improvements to documentation.
- The code got shorter and simpler, which I consider to be a valuable feature of the library.
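
To illustrate the negative-pitch convention from the first item, here is a minimal sketch of how a signed pitch lets one row loop handle both top-down and bottom-up images. The helper names `row_ptr` and `copy_rows` are hypothetical and not part of the library. The sketch also assumes the common convention that, with a negative pitch, the caller passes a pointer to the last row in memory. Check the library's documentation for its actual expectations.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not the library's API): address of logical row y,
   where `first_row` is the first row to be processed and `pitch` is the
   signed distance in bytes between consecutive rows. */
static const uint8_t *row_ptr(const uint8_t *first_row, ptrdiff_t pitch, size_t y)
{
    return first_row + (ptrdiff_t)y * pitch;
}

/* Hypothetical helper: copies `height` rows of `row_bytes` bytes each into a
   tightly packed destination, walking the source with a signed pitch.
   A negative pitch walks the source backwards, so the output is flipped. */
static void copy_rows(uint8_t *dst, size_t row_bytes, size_t height,
                      const uint8_t *src_first_row, ptrdiff_t src_pitch)
{
    assert(dst != NULL && src_first_row != NULL);
    for (size_t y = 0; y < height; y++) {
        memcpy(dst + y * row_bytes, row_ptr(src_first_row, src_pitch, y), row_bytes);
    }
}
```

Under these assumptions, a top-down copy passes the buffer start and a positive pitch, while a flipped copy passes `pixels + (height - 1) * pitch` together with `-pitch`. No separate boolean is needed because the sign of the pitch already encodes the direction.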

With all these changes implemented, I consider the project to be functionally complete. I have no further features planned, apart from a likely ARM NEON version of the SIMD code at some point.