Person:
- Hi, I want to run a deep learning-based object detection model on my computer (no GPU) to detect objects in a video. I’d like to use Python, OpenCV, and the YOLOv4 model. While processing the video, please show a progress bar, and save the processed video to a new file. Please give instructions for setting up the environment along with the Python code.
ChatGPT:
- To run deep learning-based object detection on a video without a GPU using Python, OpenCV, and the YOLOv4 model, follow these steps:
- Set up the environment:
First, make sure you have Python installed on your computer. Then, install the required libraries using pip (install either opencv-python or opencv-python-headless, not both; the headless build is intended for machines without a display):
pip install opencv-python numpy tqdm
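Optionally, you can run a short check to confirm the packages import correctly. Note that OpenCV’s DNN module only supports YOLOv4 (its Mish activation) in relatively recent versions, roughly 4.4 and later, so it is worth confirming which version was installed:
import cv2
import numpy as np
import tqdm

# YOLOv4 needs a reasonably recent OpenCV (about 4.4+) for its Mish activation
print("OpenCV:", cv2.__version__)
print("NumPy:", np.__version__)
print("tqdm:", tqdm.__version__)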
- Download the YOLOv4 model files:
Download the pre-trained YOLOv4 model files: ‘yolov4.cfg’ (in the repository’s cfg/ folder) and ‘yolov4.weights’ (linked from the README/releases page) from the official darknet repository: https://github.com/AlexeyAB/darknet
- Download the COCO dataset class names:
Download the ‘coco.names’ file, which contains the class names for the COCO dataset, from this link: https://github.com/AlexeyAB/darknet/blob/master/cfg/coco.names
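If you prefer to fetch these files from a script instead of the browser, here is a minimal sketch using only the standard library. The raw-file URLs are assumptions based on the repository layout above, and the large ‘yolov4.weights’ file (roughly 250 MB) is only linked from the repository’s releases page, so download it there if these paths change:
import urllib.request

# Assumed raw-file URLs, derived from the AlexeyAB/darknet repository layout
files = {
    'yolov4.cfg': 'https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/yolov4.cfg',
    'coco.names': 'https://raw.githubusercontent.com/AlexeyAB/darknet/master/cfg/coco.names',
}

for filename, url in files.items():
    print(f'Downloading {filename} ...')
    urllib.request.urlretrieve(url, filename)

# 'yolov4.weights' is distributed through the repository's releases page;
# download it manually (or with the same urlretrieve call) and keep it
# next to this script.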
- Run object detection on the video with OpenCV and YOLOv4, and display a progress bar:
Here’s a sample Python script that runs object detection on a video using OpenCV and YOLOv4, shows a progress bar with the tqdm library, and writes the processed frames to a new file:
import cv2
import numpy as np
from tqdm import tqdm
# Load the YOLOv4 model
model_config = 'path/to/yolov4.cfg'
model_weights = 'path/to/yolov4.weights'
net = cv2.dnn.readNet(model_config, model_weights)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
# Load class names
with open('path/to/coco.names', 'r') as f:
    class_names = [line.strip() for line in f.readlines()]
# Load the input video
input_video = 'path/to/input_video.mp4'
cap = cv2.VideoCapture(input_video)
# Get video properties
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
# Create the output video file
output_video = 'path/to/output_video.mp4'
fourcc = cv2.VideoWriter_fourcc(*'mp4v')
out = cv2.VideoWriter(output_video, fourcc, fps, (width, height))
# Process the video
conf_threshold = 0.5
nms_threshold = 0.4

# The output layer names do not change between frames, so look them up once;
# getUnconnectedOutLayersNames() also avoids the indexing differences between
# OpenCV versions when using getUnconnectedOutLayers()
output_layer_names = net.getUnconnectedOutLayersNames()

for _ in tqdm(range(total_frames), desc="Processing video"):
    ret, frame = cap.read()
    if not ret:
        break

    blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (416, 416), swapRB=True, crop=False)
    net.setInput(blob)
    layer_outputs = net.forward(output_layer_names)
    boxes = []
    confidences = []
    class_ids = []

    # Collect raw detections above the confidence threshold
    for output in layer_outputs:
        for detection in output:
            scores = detection[5:]
            class_id = np.argmax(scores)
            confidence = scores[class_id]
            if confidence > conf_threshold:
                # YOLO returns normalized center/size coordinates; scale them to pixels
                box = detection[0:4] * np.array([width, height, width, height])
                centerX, centerY, w, h = box.astype("int")
                x = int(centerX - (w / 2))
                y = int(centerY - (h / 2))
                boxes.append([x,
Person:
- continue the python code
ChatGPT:
- Here’s the continuation of the Python code to process the video, detect objects, and save the output:
                              y, int(w), int(h)])
                confidences.append(float(confidence))
                class_ids.append(class_id)

    # Apply non-maximum suppression to drop overlapping boxes
    indices = cv2.dnn.NMSBoxes(boxes, confidences, conf_threshold, nms_threshold)
    if len(indices) > 0:
        for i in indices.flatten():
            x, y, w, h = boxes[i]
            label = f"{class_names[class_ids[i]]}: {confidences[i]:.2f}"
            cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
            cv2.putText(frame, label, (x, y - 5), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)

    # Save the processed frame to the output video
    out.write(frame)
# Release resources
cap.release()
out.release()
cv2.destroyAllWindows()
This code will process the input video frame by frame, detect objects using YOLOv4, draw bounding boxes around the detected objects, display class names along with their confidence scores, and save the processed video to the specified output path. The progress bar will be shown using the tqdm library.
Remember to replace ‘path/to/input_video.mp4’, ‘path/to/output_video.mp4’, ‘path/to/yolov4.cfg’, ‘path/to/yolov4.weights’, and ‘path/to/coco.names’ with the appropriate paths on your system.
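For example, if all of the files sit in the same folder as the script, the paths (illustrative file names only) could be as simple as:
model_config = 'yolov4.cfg'
model_weights = 'yolov4.weights'
input_video = 'my_video.mp4'            # your own video file
output_video = 'my_video_detected.mp4'  # file the script will create
# 'coco.names' is opened directly in the open(...) call near the top of the script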
Keep in mind that running deep learning-based object detection on a CPU will be significantly slower than using a GPU. If you need to process videos in real-time or large video files, you might want to consider using a GPU or optimizing the code for better performance.
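If CPU speed becomes a problem, two common adjustments are switching to the much smaller YOLOv4-tiny model (its cfg file is in the same repository, with the weights on the releases page) and reducing the network input size; both trade some accuracy for speed. As a sketch, you would replace the corresponding lines in the script above with something like:
# Option 1: load the lighter YOLOv4-tiny model instead of full YOLOv4
model_config = 'path/to/yolov4-tiny.cfg'
model_weights = 'path/to/yolov4-tiny.weights'
net = cv2.dnn.readNet(model_config, model_weights)
net.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
net.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

# Option 2: shrink the network input from 416x416 to 320x320 (it must stay a
# multiple of 32); smaller inputs run faster but miss more small objects
blob = cv2.dnn.blobFromImage(frame, 1 / 255.0, (320, 320), swapRB=True, crop=False)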