Resolve CUDA error: no kernel image is available for execution on the device

1. Computer configuration: GPU 3080 (compute capability 8.6), CUDA 11.1, cuDNN 8.2.0, conda 4.9.2, Python 3.8.5. 2. Description of the problem: First, on the PyTorch website, use the pip command that matches the computer configuration: pip3 install torch==1.8.1+cu111 torchvision==0.9.1+cu111 torchaudio==0.8.1 -f https://downl...
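The "no kernel image" error generally indicates that the loaded binaries contain no code built for the GPU's architecture, so a reasonable first check is the compute capability the device actually reports. The following is a minimal diagnostic sketch, not the fix from the post; the helper names `query_compute_capability` and `print_compute_capability` are hypothetical, and device 0 is assumed to be the GPU in question.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: writes the compute capability of `device` into
// `major` and `minor`, and returns the CUDA status code.
cudaError_t query_compute_capability(int device, int *major, int *minor) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) return err;
    *major = prop.major;
    *minor = prop.minor;
    return cudaSuccess;
}

// Prints the capability so it can be compared with the architectures
// that the installed binaries were built for.
void print_compute_capability(int device) {
    int major = 0, minor = 0;
    cudaError_t err = query_compute_capability(device, &major, &minor);
    if (err != cudaSuccess) {
        printf("CUDA error: %s\n", cudaGetErrorString(err));
        return;
    }
    printf("device %d: compute capability %d.%d\n", device, major, minor);
}
```

Calling `print_compute_capability(0)` from `main` prints the value for the first GPU; for the card described above, the post gives 8.6.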

Posted by nuxy on Tue, 25 May 2021 01:39:52 +0930

Common implementation of the CUDA parallel reduction problem and a nested recursive implementation (a small problem solution with additional nested recursion)

Serial reduction problem: problem analysis. Let's start by analyzing a simple reduction problem. Assume we have a simple array: size=8; int array[size]={1, 2, 3, 4, 5, 6, 7, 8}; Our goal is to calculate the sum of the array, that is, array[0]+array[1]+...+array[7], which can generally ...
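For comparison with the serial sum, a parallel version can be sketched as a shared-memory tree reduction. This is one common pattern and not necessarily the implementation the post goes on to describe; the names `block_sum` and `sum_on_gpu` are hypothetical, the block size of 256 is an assumption, and error checking is left out for brevity.

```cuda
#include <vector>
#include <cuda_runtime.h>

// Each block loads up to blockDim.x elements into shared memory and
// halves the number of active threads every step until one sum remains.
// Assumes blockDim.x is a power of two.
__global__ void block_sum(const int *in, int *partial, int n) {
    extern __shared__ int sdata[];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    sdata[tid] = (i < n) ? in[i] : 0;   // pad the last block with zeros
    __syncthreads();
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s) sdata[tid] += sdata[tid + s];
        __syncthreads();
    }
    if (tid == 0) partial[blockIdx.x] = sdata[0];
}

// Hypothetical host wrapper: returns the sum of h_in[0..n-1], n > 0.
// The per-block partial sums are added up on the host.
int sum_on_gpu(const int *h_in, int n) {
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;
    int *d_in = nullptr, *d_partial = nullptr;
    cudaMalloc(&d_in, n * sizeof(int));
    cudaMalloc(&d_partial, blocks * sizeof(int));
    cudaMemcpy(d_in, h_in, n * sizeof(int), cudaMemcpyHostToDevice);

    block_sum<<<blocks, threads, threads * sizeof(int)>>>(d_in, d_partial, n);

    std::vector<int> h_partial(blocks);
    cudaMemcpy(h_partial.data(), d_partial, blocks * sizeof(int),
               cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_partial);

    int total = 0;
    for (int p : h_partial) total += p;
    return total;
}
```

For the eight-element array above, `sum_on_gpu(array, 8)` uses a single block and returns the same value as the serial loop.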

Posted by osnewbie2004 on Thu, 03 Jun 2021 01:59:08 +0930

CUDA textures

CUDA arrays and device memory are allocated from the same physical memory pool, but CUDA arrays use a layout optimized for 2D and 3D locality. The graphics driver uses this layout to store textures, so that the hardware can operate on 2D or 3D blocks of elements instead of 1D addressing. For application...

Posted by mikeissa on Fri, 11 Jun 2021 03:29:42 +0930

Deploying a localization, mapping, and object detection environment on an embedded Jetson TX2 (including configuration of RTAB-Map, Object Detection API, RealSense, IMU, ROS, CUDA, TensorFlow, etc.)

This week, to wrap up the project, we reproduced last year's localization, mapping, and object detection environment deployment on a brand-new TX2 development board. In fact, each part has already been covered piecemeal in earlier posts on this blog. Here's a new blog ...

Posted by jeopardy_68 on Sat, 10 Jul 2021 04:14:20 +0930

Spatial coordinate system transformation implemented with CUDA Thrust

Series catalogue: the series on CUDA-based parallel computing of space target orbits has four sections; the contents of the first and second sections are as follows: 1. Orbit calculation requirements and mission...

Posted by ccravens on Sun, 02 Jan 2022 03:17:57 +1030

Me and computer vision - [CUDA] - [CUDA multi-stream under CPU multithreading]

First, the problem lies in multithreading on the CPU. When you call the same CUDA kernel function from multiple threads, you will find that the efficiency is very low. After verification, no matter how many threads you have, CUDA always puts the kernel functions from the threads into...

Posted by Mouse on Mon, 03 Jan 2022 22:41:40 +1030

CUDA C programming: coalesced global memory access

Using shared memory can also help avoid uncoalesced global memory access. Matrix transpose is a typical example: reads are naturally coalesced, but writes are strided. Strided access is the worst access pattern for global memory because it wastes bus...
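The transpose case can be sketched as follows: a tile is read from global memory row by row, staged in shared memory, and written back row by row, so both the reads and the writes touch consecutive addresses. This is a minimal sketch of the general technique rather than the code from the post; the names `transpose_tiled` and `transpose_on_gpu` and the tile size of 32 are assumptions, and error checking is omitted.

```cuda
#include <cuda_runtime.h>

#define TILE_DIM 32

// Transposes the row-major rows x cols matrix `in` into the
// cols x rows matrix `out`, staging each tile in shared memory.
__global__ void transpose_tiled(const float *in, float *out,
                                int rows, int cols) {
    // One extra column of padding avoids shared memory bank conflicts.
    __shared__ float tile[TILE_DIM][TILE_DIM + 1];

    int x = blockIdx.x * TILE_DIM + threadIdx.x;  // column of `in`
    int y = blockIdx.y * TILE_DIM + threadIdx.y;  // row of `in`
    if (x < cols && y < rows)
        tile[threadIdx.y][threadIdx.x] = in[y * cols + x];
    __syncthreads();

    // Swap the block indices so that consecutive threads write
    // consecutive addresses of `out`.
    x = blockIdx.y * TILE_DIM + threadIdx.x;      // column of `out`
    y = blockIdx.x * TILE_DIM + threadIdx.y;      // row of `out`
    if (x < rows && y < cols)
        out[y * rows + x] = tile[threadIdx.x][threadIdx.y];
}

// Hypothetical host wrapper around the kernel.
void transpose_on_gpu(const float *h_in, float *h_out, int rows, int cols) {
    size_t bytes = static_cast<size_t>(rows) * cols * sizeof(float);
    float *d_in = nullptr, *d_out = nullptr;
    cudaMalloc(&d_in, bytes);
    cudaMalloc(&d_out, bytes);
    cudaMemcpy(d_in, h_in, bytes, cudaMemcpyHostToDevice);

    dim3 block(TILE_DIM, TILE_DIM);
    dim3 grid((cols + TILE_DIM - 1) / TILE_DIM,
              (rows + TILE_DIM - 1) / TILE_DIM);
    transpose_tiled<<<grid, block>>>(d_in, d_out, rows, cols);

    cudaMemcpy(h_out, d_out, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d_in);
    cudaFree(d_out);
}
```

The strided access moves from global memory into shared memory, where it is far cheaper, and the padding column keeps the column-wise reads of the tile from landing in the same bank.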

Posted by richardk1 on Wed, 05 Jan 2022 06:08:07 +1030

The CUDA kernel function does not execute or report errors

The CUDA kernel function does not execute and does not report errors. Recently, I ran into a problem when using CUDA: sometimes the kernel function neither executes nor reports an error. Sometimes the program runs and the result is correct; sometimes it is not executed and no error is ...
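Kernel launches are asynchronous and return no status, so a failed launch stays silent unless the error state is queried explicitly. The excerpt is cut off before the cause is given, so the following is only a general diagnostic sketch, not the fix from the post; the names `fill_ones` and `launch_checked` are hypothetical.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Trivial kernel used only to demonstrate the checks.
__global__ void fill_ones(int *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = 1;
}

// Launches the kernel with `threads` threads per block and returns the
// first error observed, or cudaSuccess if the kernel ran to completion.
cudaError_t launch_checked(int *d_data, int n, int threads) {
    int blocks = (n + threads - 1) / threads;
    fill_ones<<<blocks, threads>>>(d_data, n);

    // Errors detected at launch time, e.g. an invalid configuration.
    cudaError_t err = cudaGetLastError();
    if (err != cudaSuccess) return err;

    // Errors that occur while the kernel is running surface here.
    return cudaDeviceSynchronize();
}
```

Passing the returned code to `cudaGetErrorString` gives a readable message. A block size above the device limit, for example, makes the launch fail without any output unless a check like this is in place.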

Posted by bluebutterflyofyourmind on Wed, 09 Feb 2022 18:31:38 +1030

Optimizing GPU merge sort with shared memory

Shared memory optimization for CUDA merge sort. The previous GPU merge sort did not use shared memory to optimize the program. After careful analysis, it was found that the previous merges required multiple reads of local memory, and that some optimization might be possible if the data were loaded into shared memory. The me...

Posted by mudasir on Tue, 08 Mar 2022 03:18:44 +1030

CUDA C programming: partitioning computation across multiple GPUs

Allocating memory on multiple devices. Before assigning computing tasks from the host to the devices, you need to determine how many GPUs are available in the current system: int ngpus; cudaGetDeviceCount(&ngpus); printf("CUDA-capable devices: %i\n",ngpus); Once the number of GPUs has been determined, i...
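A typical next step after counting the devices is to select each one in turn and allocate a buffer on it. The excerpt is cut off before that point, so the following is only a sketch under that assumption; the helper names `alloc_on_all_gpus` and `free_on_all_gpus` are hypothetical.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Allocates `bytes` on every visible device and returns one device
// pointer per GPU. An entry is nullptr if its allocation failed.
std::vector<float *> alloc_on_all_gpus(size_t bytes) {
    int ngpus = 0;
    if (cudaGetDeviceCount(&ngpus) != cudaSuccess) ngpus = 0;
    printf("CUDA-capable devices: %i\n", ngpus);

    std::vector<float *> buffers(ngpus, nullptr);
    for (int i = 0; i < ngpus; ++i) {
        cudaSetDevice(i);  // later runtime calls target device i
        if (cudaMalloc(&buffers[i], bytes) != cudaSuccess)
            buffers[i] = nullptr;
    }
    return buffers;
}

// Frees each buffer on the device it was allocated on.
void free_on_all_gpus(std::vector<float *> &buffers) {
    for (int i = 0; i < static_cast<int>(buffers.size()); ++i) {
        cudaSetDevice(i);
        cudaFree(buffers[i]);
        buffers[i] = nullptr;
    }
}
```

Because `cudaSetDevice` changes the current device for the calling host thread, each allocation and each free has to follow the matching call for its GPU.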

Posted by blackcow on Sat, 12 Mar 2022 23:10:15 +1030