GPU Acceleration For Your C# App With ILGPU Tutorial

About

In this code snippet, we’ll see how to use GPU acceleration in C# using the ILGPU library.

ILGPU provides a fairly simple interface for running code on your GPU from C#. For more information, you can check out the official documentation here. In this post, I will show you how to do some image processing using GPU acceleration. If you want to see more examples (simpler or more complex ones), you can find them here.

I think this library is a very neat tool to add to your “toolbelt” in case you ever find yourself needing lots of parallel processing power. Even though I’ll probably forget the exact syntax and functions used, I will at least remember that the library exists and the working principle behind it. And I can always come back to this post to refresh my memory.

Install NuGet Packages:

First, let’s add the ILGPU and System.Drawing.Common NuGet packages to the project. Note that on .NET 6 and later, System.Drawing.Common is only supported on Windows.
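Before the full image example, here is a minimal sketch of the basic ILGPU workflow: create a context and an accelerator, copy data to the device, load a kernel, launch it, wait for it and read the result back. The class, method and kernel names (MinimalKernelSketch, AddConstant, AddConstantKernel) are made up for illustration, and the sketch assumes an ILGPU 1.x package and a standalone file with its own Main.

```csharp
using System;
using ILGPU;
using ILGPU.Runtime;

public static class MinimalKernelSketch
{
    // Kernel: executed once per index on the accelerator. Kernels must be static.
    static void AddConstantKernel(Index1D index, ArrayView<int> data, int constant)
    {
        data[index] = data[index] + constant;
    }

    public static int[] AddConstant(int[] input, int constant)
    {
        // The using declarations dispose everything when the method exits.
        using Context context = Context.CreateDefault();
        using Accelerator accelerator = context.GetPreferredDevice(preferCPU: false).CreateAccelerator(context);

        // Allocate a device buffer and copy the input array into it.
        using var buffer = accelerator.Allocate1D(input);

        // Load (compile) the kernel for this accelerator.
        var kernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<int>, int>(AddConstantKernel);

        // Launch one thread per element, then wait for the accelerator to finish.
        kernel((int)buffer.Length, buffer.View, constant);
        accelerator.Synchronize();

        // Copy the result back to the CPU.
        return buffer.GetAsArray1D();
    }

    public static void Main()
    {
        int[] result = AddConstant(new[] { 1, 2, 3 }, 10);
        Console.WriteLine(string.Join(", ", result)); // 11, 12, 13
    }
}
```

The image example that follows has the same shape. It just uses 2D buffers and a struct instead of an int array.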

Code:

using System;
using ILGPU;
using ILGPU.Runtime;
using System.Drawing;
using System.Drawing.Imaging;
using System.Runtime.InteropServices;

namespace ILGPU_Example
{
    internal class Program
    {
        static void Main(string[] args)
        {
            Console.WriteLine("Started.");


            try
            {
                GPU_Acceleration_Example();
            }
            catch (Exception ex)
            {
                Console.WriteLine($"An error occurred: {ex.Message}");
            }


            Console.WriteLine("Finished.");
        }

        public static void GPU_Acceleration_Example()
        {
            //Initialize ILGPU. The using declarations dispose the context and the accelerator when the method exits.
            using Context context = Context.CreateDefault();
            using Accelerator accelerator = context.GetPreferredDevice(preferCPU: false).CreateAccelerator(context);

            //Set source and destination paths.
            string imagePath = "C:\\Users\\DTPC\\Desktop\\example image.png";
            string newImagePath = $"C:\\Users\\DTPC\\Desktop\\example modified image {DateTime.Now:HH-mm-ss}.png";

            //Load image.
            Bitmap image = new Bitmap(imagePath);

            //Get 2D Pixel array from Bitmap object.
            Image.Pixel[,] pixles = Image.ToPixelArray(image);

            //Allocate buffers on the accelerator for the data that will be sent to and read back from the GPU.
            //The long form allocates empty buffers first and then copies the data into them:
            //var bitmap = accelerator.Allocate2DDenseX<Image.Pixel>(new Index2D(pixles.GetLength(0), pixles.GetLength(1)));
            //var newBitmap = accelerator.Allocate2DDenseX<Image.Pixel>(new Index2D(pixles.GetLength(0), pixles.GetLength(1)));
            //bitmap.CopyFromCPU(pixles);
            //newBitmap.CopyFromCPU(pixles);

            //A more compact way of doing the same thing as above: allocate and copy in one call.
            var bitmap = accelerator.Allocate2DDenseX<Image.Pixel>(pixles);
            var newBitmap = accelerator.Allocate2DDenseX<Image.Pixel>(pixles);

            //Load/precompile the kernel.
            var loadedKernel = accelerator.LoadAutoGroupedStreamKernel //LoadKernel();
            <
                Index2D,
                ArrayView2D<Image.Pixel, Stride2D.DenseX>,
                ArrayView2D<Image.Pixel, Stride2D.DenseX>,
                Image.PixelOperation,
                float,
                float,
                float
            >(ImageProcessingAcceleratedKernel);


            /////////////////// Set the operation and values to be applied to the pixels /////////////////// 

            //Invert the colors. The three values below are ignored by Invert; they are only used by AddOrSubtract and Multiply.
            Image.PixelOperation pixelOperation = Image.PixelOperation.Invert;
            float rValue = 0;
            float gValue = 0;
            float bValue = 0;

            /////////////////////////////////////////////////////////////////////////////////////////////////

            //Finish compiling and tell the accelerator to start computing the kernel.
            loadedKernel(newBitmap.Extent.ToIntIndex(), bitmap, newBitmap, pixelOperation, rValue, gValue, bValue);

            //Wait for the accelerator to finish its work. In this case, that means waiting for the kernel to finish.
            accelerator.Synchronize();

            //Get the computed result.
            Image.Pixel[,] newPixles = newBitmap.GetAsArray2D();

            //Convert the 2D Pixel array back into a Bitmap object.
            image = Image.ToBitmap(newPixles);

            //Finally save the modified image.
            image.Save(newImagePath);
        }

        static void ImageProcessingAcceleratedKernel(Index2D index, ArrayView2D<Image.Pixel, Stride2D.DenseX> bitmap, ArrayView2D<Image.Pixel, Stride2D.DenseX> newBitmap, Image.PixelOperation pixelOperation, float rValue, float gValue, float bValue)
        {
            //Get current Pixel.
            Image.Pixel currentPixel = bitmap[index];

            (int newR, int newG, int newB) computedResult = Image.doPixelOperation(pixelOperation, currentPixel, rValue, gValue, bValue);

            //To keep the values within the valid range: Set them to 255 if they exceed 255 and set them to 0 if they are below 0.
            computedResult.newR = Math.Max(0, Math.Min(255, computedResult.newR));
            computedResult.newG = Math.Max(0, Math.Min(255, computedResult.newG));
            computedResult.newB = Math.Max(0, Math.Min(255, computedResult.newB));

            //Indexing with the Index2D is the same as writing newBitmap[index.X, index.Y].
            newBitmap[index] = new Image.Pixel(computedResult.newR, computedResult.newG, computedResult.newB);
        }
    }

    public static class Image
    {
        #region Models ///////////////////////////

        //In the struct example https://github.com/EECSB/ILGPU/tree/master/Samples/SimpleStructures the struct is marked as internal.
        //To make internal work, you have to add the following file with its contents: https://github.com/EECSB/ILGPU/blob/master/Samples/SimpleStructures/AssemblyAttributes.cs
        //Since I put the struct into the Image class, I had to make it public. Because it's now public, I don't need to add the above-mentioned attribute.
        public readonly struct Pixel
        {
            public Pixel(Pixel pixel)
            {
                R = pixel.R;
                G = pixel.G;
                B = pixel.B;
            }

            public Pixel(int r, int g, int b)
            {
                R = r;
                G = g;
                B = b;
            }

            public int R { get; }
            public int G { get; }
            public int B { get; }
        }

        public enum PixelOperation
        {
            AddOrSubtract,
            Multiply,
            Invert
        }

        #endregion ///////////////////////////////



        #region Pixel Operations /////////////////

        public static (int newR, int newG, int newB) doPixelOperation(PixelOperation pixelOperation, Pixel currentPixel, float rValue, float gValue, float bValue)
        {
            switch (pixelOperation)
            {
                case PixelOperation.AddOrSubtract: return AddOrSubtract(currentPixel, rValue, gValue, bValue);
                case PixelOperation.Multiply: return Multiply(currentPixel, rValue, gValue, bValue);
                case PixelOperation.Invert: return Invert(currentPixel);

                default: return (0, 0, 0);
            }
        }



        public static (int newR, int newG, int newB) AddOrSubtract(Pixel currentPixel, float addSubR, float addSubG, float addSubB)
        {
            int newR = (int)addSubR + currentPixel.R;
            int newG = (int)addSubG + currentPixel.G;
            int newB = (int)addSubB + currentPixel.B;

            return (newR, newG, newB);
        }

        public static (int newR, int newG, int newB) Invert(Pixel currentPixel)
        {
            int newR = 255 - currentPixel.R;
            int newG = 255 - currentPixel.G;
            int newB = 255 - currentPixel.B;

            return (newR, newG, newB);
        }

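        public static (int newR, int newG, int newB) Multiply(Pixel currentPixel, float multiplierR, float multiplierG, float multiplierB)
        {
            //Multiply first and cast afterwards. Writing (int)multiplierR * currentPixel.R would truncate the multiplier before multiplying (1.5 would become 1).
            int newR = (int)(multiplierR * currentPixel.R);
            int newG = (int)(multiplierG * currentPixel.G);
            int newB = (int)(multiplierB * currentPixel.B);

            return (newR, newG, newB);
        }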

        #endregion ///////////////////////////////



        #region Conversion ///////////////////////

        public static Bitmap ToBitmap(Pixel[,] imagePixels)
        {
            int width = imagePixels.GetLength(0);
            int height = imagePixels.GetLength(1);

            //Create a new Bitmap with the specified width and height.
            Bitmap bitmap = new Bitmap(width, height, PixelFormat.Format24bppRgb);

            //Lock the bits of the image to set pixel data.
            BitmapData bitmapData = bitmap.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.WriteOnly, PixelFormat.Format24bppRgb);

            try
            {
                //Calculate the stride(number of bytes allocated for a single scanline).
                int stride = bitmapData.Stride;

                //Iterate over each pixel.
                for (int y = 0; y < height; y++)
                {
                    for (int x = 0; x < width; x++)
                    {
                        //Calculate the index of the current pixel in the byte array.
                        int index = y * stride + x * 3; //3 bytes per pixel for Format24bppRgb

                        //Get the Pixel at the current position.
                        Pixel pixel = imagePixels[x, y];

                        //Set the R, G and B values directly into the memory.
                        //Learn more about the Marshal class in this post I made: https://eecs.blog/c-com-objects-interop-with-pinvoke-and-type-marshalling/#:~:text=Working%20With%20Native%20Memory
                        Marshal.WriteByte(bitmapData.Scan0, index + 2, (byte)pixel.R);
                        Marshal.WriteByte(bitmapData.Scan0, index + 1, (byte)pixel.G);
                        Marshal.WriteByte(bitmapData.Scan0, index, (byte)pixel.B);
                    }
                }
            }
            finally
            {
                //Unlock the bits of the image when done.
                bitmap.UnlockBits(bitmapData);
            }

            return bitmap;
        }

        public static Pixel[,] ToPixelArray(Bitmap image)
        {
            int width = image.Width;
            int height = image.Height;

            //Lock the bits of the image before accessing pixel data.
            BitmapData bitmapData = image.LockBits(new Rectangle(0, 0, width, height), ImageLockMode.ReadOnly, PixelFormat.Format24bppRgb);

            try
            {
                //Create a 2D Pixel array to store Pixels.
                Pixel[,] imagePixels = new Pixel[width, height];

                //Calculate the stride(number of bytes allocated for a single scanline).
                int stride = bitmapData.Stride;

                //Iterate over each pixel.
                for (int y = 0; y < height; y++)
                {
                    for (int x = 0; x < width; x++)
                    {
                        //Calculate the index of the current pixel in the byte array.
                        int index = y * stride + x * 3; //3 bytes per pixel for Format24bppRgb

                        //Get the R, G and B values directly from the memory.
                        //You can learn more about the Marshal class in this post I made: https://eecs.blog/c-com-objects-interop-with-pinvoke-and-type-marshalling/#:~:text=Working%20With%20Native%20Memory
                        byte red = Marshal.ReadByte(bitmapData.Scan0, index + 2);
                        byte green = Marshal.ReadByte(bitmapData.Scan0, index + 1);
                        byte blue = Marshal.ReadByte(bitmapData.Scan0, index);

                        //Create a new Pixel from the R, G and B values and store it in the Pixel array at the corresponding [x,y] position.
                        imagePixels[x, y] = new Pixel(red, green, blue);
                    }
                }

                return imagePixels;
            }
            finally
            {
                //Unlock the bits of the image when done.
                image.UnlockBits(bitmapData);
            }
        }

        #endregion ///////////////////////////////
    }
}
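The per-channel arithmetic used by the kernel can be checked on the CPU without involving ILGPU at all. The sketch below is not part of the program above, and the names PixelMath, ClampToByte, Scale and ScaleTruncatedFirst are hypothetical. It clamps values to the 0 to 255 range the same way the kernel does, and it shows why the placement of the (int) cast matters when multiplying: in C#, a cast binds tighter than multiplication, so (int)multiplier * channel truncates the multiplier before the multiplication happens.

```csharp
using System;

public static class PixelMath
{
    // Clamp a channel value into the valid 0..255 range (same expression as in the kernel).
    public static int ClampToByte(int value) => Math.Max(0, Math.Min(255, value));

    // Multiply in floating point first, then truncate to int and clamp.
    public static int Scale(int channel, float multiplier) => ClampToByte((int)(multiplier * channel));

    // The cast applies to the multiplier only, so 1.5 is truncated to 1 before multiplying.
    public static int ScaleTruncatedFirst(int channel, float multiplier) => ClampToByte((int)multiplier * channel);

    // Invert a single channel.
    public static int Invert(int channel) => 255 - channel;

    public static void Main()
    {
        Console.WriteLine(Scale(100, 1.5f));               // 150
        Console.WriteLine(ScaleTruncatedFirst(100, 1.5f)); // 100
        Console.WriteLine(Scale(200, 1.5f));               // 255 (300 clamped to 255)
        Console.WriteLine(Invert(40));                     // 215
    }
}
```

A small CPU check like this is also handy for comparing a few pixels against the GPU result when something looks off in the output image.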

AssemblyAttributes.cs

Adding this AssemblyAttributes.cs file to the root directory of your project is not required for the code above to work. However, if you use the internal access modifier for the types your kernels work with, like in this example here, you will have to add it for your code to work.

using System;
using System.Runtime.CompilerServices;

[assembly: CLSCompliant(true)]

// Mark all internals to be visible to the ILGPU runtime
[assembly: InternalsVisibleTo(ILGPU.Context.RuntimeAssemblyName)]
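To illustrate when the attribute comes into play, here is a minimal sketch of a kernel that works with an internal struct. GrayPixel, InternalStructSketch and InvertKernel are hypothetical names that are not part of the example above. The attribute is repeated at the top only so that the sketch is self-contained as a standalone file, and the whole thing assumes an ILGPU 1.x package.

```csharp
using System;
using System.Runtime.CompilerServices;
using ILGPU;
using ILGPU.Runtime;

// Same attribute as in AssemblyAttributes.cs: makes internals visible to the ILGPU runtime.
[assembly: InternalsVisibleTo(ILGPU.Context.RuntimeAssemblyName)]

// Hypothetical internal struct used as the kernel's element type.
internal readonly struct GrayPixel
{
    public GrayPixel(int value) { Value = value; }
    public int Value { get; }
}

internal static class InternalStructSketch
{
    // Kernel: inverts one gray value per index.
    static void InvertKernel(Index1D index, ArrayView<GrayPixel> pixels)
    {
        pixels[index] = new GrayPixel(255 - pixels[index].Value);
    }

    internal static GrayPixel[] Invert(GrayPixel[] input)
    {
        using Context context = Context.CreateDefault();
        using Accelerator accelerator = context.GetPreferredDevice(preferCPU: false).CreateAccelerator(context);

        // Allocate a device buffer and copy the input array into it.
        using var buffer = accelerator.Allocate1D(input);

        var kernel = accelerator.LoadAutoGroupedStreamKernel<Index1D, ArrayView<GrayPixel>>(InvertKernel);
        kernel((int)buffer.Length, buffer.View);
        accelerator.Synchronize();

        return buffer.GetAsArray1D();
    }

    static void Main()
    {
        GrayPixel[] result = Invert(new[] { new GrayPixel(0), new GrayPixel(200) });
        Console.WriteLine($"{result[0].Value}, {result[1].Value}");
    }
}
```

If you make the struct public instead, as I did with Pixel in the Image class, the attribute should not be needed.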