NVIDIA, a provider of programmable GPUs, has introduced a new version of its CUDA Toolkit for developing parallel applications on its GPUs.

The CUDA 4.0 Toolkit is designed to make parallel programming easier and to help developers port their applications to GPUs, the company said.

The new offering features GPUDirect 2.0 technology, which supports peer-to-peer communication among GPUs within a single server or workstation, simplifying multi-GPU programming and improving application performance.
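As a rough sketch rather than NVIDIA's own sample code, peer-to-peer transfers between two GPUs in the same system can be set up with the CUDA runtime API roughly as follows; the device IDs and buffer size are illustrative assumptions:

    // Hypothetical sketch: copy a buffer directly from GPU 0 to GPU 1
    // using CUDA 4.0 peer-to-peer support (device IDs are illustrative).
    #include <cuda_runtime.h>

    int main(void)
    {
        const size_t bytes = 1 << 20;               // 1 MB test buffer
        int canAccess = 0;
        cudaDeviceCanAccessPeer(&canAccess, 1, 0);  // can GPU 1 reach GPU 0's memory?

        float *buf0, *buf1;
        cudaSetDevice(0);
        cudaMalloc(&buf0, bytes);

        cudaSetDevice(1);
        cudaMalloc(&buf1, bytes);
        if (canAccess)
            cudaDeviceEnablePeerAccess(0, 0);       // let GPU 1 access GPU 0 directly

        // Direct GPU-to-GPU copy; without peer access the runtime
        // stages the transfer through host memory instead.
        cudaMemcpyPeer(buf1, 1, buf0, 0, bytes);

        cudaFree(buf1);
        cudaSetDevice(0);
        cudaFree(buf0);
        return 0;
    }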

The Unified Virtual Addressing (UVA) feature provides a single, merged address space covering main system memory and the memories of all GPUs in the system, simplifying memory management in parallel programs.
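A minimal sketch of what UVA allows, assuming a 64-bit system and a UVA-capable GPU: the runtime can infer the direction of a copy from the pointer values alone, so the programmer no longer has to spell it out (the buffer size here is made up):

    // Hypothetical UVA sketch: let the runtime infer copy direction
    // from the unified address space (CUDA 4.0, UVA-capable GPU assumed).
    #include <cuda_runtime.h>

    int main(void)
    {
        const size_t bytes = 1 << 20;
        float *host_buf, *dev_buf;

        cudaHostAlloc(&host_buf, bytes, cudaHostAllocDefault);  // pinned host memory
        cudaMalloc(&dev_buf, bytes);

        // With UVA, cudaMemcpyDefault lets the runtime deduce host-to-device
        // or device-to-host from the addresses themselves.
        cudaMemcpy(dev_buf, host_buf, bytes, cudaMemcpyDefault);
        cudaMemcpy(host_buf, dev_buf, bytes, cudaMemcpyDefault);

        cudaFree(dev_buf);
        cudaFreeHost(host_buf);
        return 0;
    }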

The Thrust C++ template performance primitives library provides a collection of open-source C++ parallel algorithms and data structures that ease GPU programming for C++ developers.
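For illustration only (the data and sizes are invented), typical Thrust usage looks like filling a vector on the host and sorting it on the GPU with a single algorithm call:

    // Hypothetical Thrust sketch: copy random data to the GPU and sort it there.
    #include <thrust/host_vector.h>
    #include <thrust/device_vector.h>
    #include <thrust/sort.h>
    #include <thrust/copy.h>
    #include <cstdlib>

    int main(void)
    {
        thrust::host_vector<int> h(1 << 20);
        for (size_t i = 0; i < h.size(); ++i)
            h[i] = rand();

        thrust::device_vector<int> d = h;             // copy host data to the GPU
        thrust::sort(d.begin(), d.end());             // parallel sort on the device
        thrust::copy(d.begin(), d.end(), h.begin());  // results back to the host
        return 0;
    }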

The CUDA 4.0 Toolkit also includes: MPI integration with CUDA applications; multi-thread sharing of GPUs; multi-GPU sharing by a single CPU thread (sketched below); and a new NPP image and computer vision library.
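As a hedged illustration of that multi-GPU item, CUDA 4.0 lets one CPU thread drive several GPUs simply by switching the current device; the kernel, device count, and sizes below are assumptions made for the example:

    // Hypothetical sketch: a single host thread launching work on two GPUs
    // by switching the current device (the kernel is a dummy).
    #include <cuda_runtime.h>

    __global__ void scale(float *data, int n, float factor)
    {
        int i = blockIdx.x * blockDim.x + threadIdx.x;
        if (i < n) data[i] *= factor;
    }

    int main(void)
    {
        const int n = 1 << 20;
        float *buf[2];

        for (int dev = 0; dev < 2; ++dev) {
            cudaSetDevice(dev);                        // same thread, different GPU
            cudaMalloc(&buf[dev], n * sizeof(float));
            cudaMemset(buf[dev], 0, n * sizeof(float));
            scale<<<(n + 255) / 256, 256>>>(buf[dev], n, 2.0f);
        }

        for (int dev = 0; dev < 2; ++dev) {
            cudaSetDevice(dev);
            cudaDeviceSynchronize();                   // wait for each GPU's work
            cudaFree(buf[dev]);
        }
        return 0;
    }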