I haven't written any CUDA code, but I just took a Computer Architecture course and talked with my professor after class about GPGPU because it fascinated me too. The problem is that C code (or any code, for that matter) written for a CPU is designed to run in either a linear (if this then that...) or circular fashion (loops). It's often very branch-heavy and leans on the cache constantly as data moves into and out of registers. GPUs don't have caches (that's a lie, but for all intents and purposes it's true when comparing them to CPUs) and operate on massively parallel data loads. If you've ever programmed with SIMD, it's similar to that. The GPU is good at doing the same floating point operations over and over on large parallel (i.e., not interdependent) data sets. It's terrible at making decisions (branches).

This is why GPUs use memory that has enormous bandwidth but isn't very latency sensitive. The data the GPU reads is assumed to be laid out contiguously in memory so it can be pulled out in sequence and effectively streamed through the GPU. Once the GPU has performed its calculations on a data set, the results go straight back to memory. If a second operation then needs to be done on the same data set, the process is repeated and the entire set is streamed through the GPU again.
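To make that concrete, here's roughly what the "same operation over a big independent data set" model looks like in CUDA. This is just a sketch pieced together from standard examples (I haven't compiled it, and the names and sizes are my own): each thread grabs one element, does the same arithmetic, and the whole array streams through in one pass.

```cuda
#include <cuda_runtime.h>

// Each thread handles exactly one element -- no loops, no data-dependent branching.
__global__ void scale(float *data, float factor, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] = data[i] * factor;
}

int main(void)
{
    const int n = 1 << 20;
    float *d_data;
    cudaMalloc(&d_data, n * sizeof(float));
    // ... copy the input in with cudaMemcpy ...

    // Launch enough 256-thread blocks to cover every element;
    // the whole array streams through the GPU in a single pass.
    scale<<<(n + 255) / 256, 256>>>(d_data, 2.0f, n);
    cudaDeviceSynchronize();

    // ... copy the results back out, then repeat the pattern for the next pass ...
    cudaFree(d_data);
    return 0;
}
```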
Now, like I said, I don't have any experience actually programming these things, nor any experience working with images, but my understanding of image processing is that you're doing the exact same operation on each individual pixel. GPGPU seems perfect for your application, but it will certainly require a few steps backwards before you can realize the massive performance benefits.
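If my understanding is right, a per-pixel operation maps onto a kernel almost directly. Again, just a sketch and not tested code (the 8-bit grayscale layout, names, and block sizes are assumptions on my part), but it shows the shape of it: one thread per pixel, identical work everywhere.

```cuda
// Hypothetical per-pixel kernel: invert an 8-bit grayscale image stored row by row.
// One thread per pixel; every pixel gets the identical operation.
__global__ void invert(unsigned char *img, int width, int height)
{
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < width && y < height)
        img[y * width + x] = 255 - img[y * width + x];
}

// Launched with a 2D grid of 16x16-thread blocks covering the whole image:
// dim3 block(16, 16);
// dim3 grid((width + 15) / 16, (height + 15) / 16);
// invert<<<grid, block>>>(d_img, width, height);
```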