Human Pose Detector App: Comprehensive Notes
This document provides a detailed yet easy-to-understand overview of the Human Pose Detector App, which uses computer vision to detect human body landmarks and measure body dimensions, saving the results to an Excel file. The app uses a depth camera (Intel RealSense D435) and various Python libraries to process video frames and calculate measurements.
Project Overview
The Human Pose Detector App is a GUI-based application that:
Captures video using an Intel RealSense depth camera.
Detects human body landmarks using MediaPipe’s Pose solution.
Calculates body measurements (e.g., full height, chest diameter, bicep diameter) from 3D coordinates.
Displays live video with overlaid landmarks.
Saves measurements along with a user-provided name to an Excel file.
The app uses PyQt5 for the GUI, pyrealsense2 for camera access, MediaPipe for pose detection, OpenCV for image processing, and openpyxl for Excel file handling.
Libraries Used
1. PyQt5
Purpose in Project: Provides the graphical user interface (GUI) for the app, including buttons, labels, and input fields.
Role in Project
Creates a window (MainWindow) to display live video feed.
Handles user interactions (e.g., clicking the “Capture Measurements” button).
Updates the GUI with new frames and measurement results.
Other Features
Cross-platform GUI development.
Supports widgets, layouts, signals, and slots for event-driven programming.
Can create complex applications like dialogs, menus, and toolbars.
Code Example
self.label = QLabel()
self.label.setAlignment(Qt.AlignCenter)
self.capture_btn = QPushButton("Capture Measurements")
self.capture_btn.clicked.connect(self.capture)
Summary: This code sets up a label to display video frames and a button to trigger measurement capture, connecting the button click to the capture method.
2. pyrealsense2
Purpose in Project: Interfaces with the Intel RealSense depth camera to capture color and depth frames.
Role in Project
Initializes the camera pipeline to stream color (RGB) and depth data.
Aligns depth and color frames for accurate 3D point calculations.
Deprojects pixel coordinates to 3D world coordinates using depth data.
Other Features
Supports multiple RealSense camera models.
Provides access to infrared streams, motion sensors, and camera calibration.
Enables advanced features like point cloud generation and tracking.
Code Example
self.pipeline = rs.pipeline()
config = rs.config()
config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
self.pipeline.start(config)
Summary: This code configures the RealSense camera to capture depth and color streams at 640×480 resolution and 30 FPS, then starts the pipeline.
3. OpenCV (cv2)
Purpose in Project: Handles image processing and color conversion.
Role in Project
Converts BGR (the format requested from the RealSense color stream) to RGB for MediaPipe pose processing.
Provides utilities for handling image arrays.
Other Features
Extensive image and video processing capabilities.
Supports feature detection, object tracking, and machine learning.
Used in real-time applications like facial recognition and augmented reality.
Code Example
results = self.pose.process(cv2.cvtColor(color_image, cv2.COLOR_BGR2RGB))
Summary: This code converts the color frame from BGR to RGB format, as required by MediaPipe for pose detection.
4. MediaPipe (mediapipe)
Purpose in Project: Detects human body landmarks in video frames.
Role in Project
Processes RGB frames to identify 33 pose landmarks (e.g., shoulders, elbows, hips).
Draws landmarks and connections on the video feed for visualization.
Other Features
Provides solutions for face detection, hand tracking, and object detection.
Optimized for real-time performance on various devices.
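Supports 3D landmark estimation. Multi-person detection is available in the newer Pose Landmarker task, while the Pose solution used here tracks a single person.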
Code Example
self.mp_pose = mp.solutions.pose
self.pose = self.mp_pose.Pose()
results = self.pose.process(cv2.cvtColor(color_image, cv2.COLOR_BGR2RGB))
Summary: This code initializes the MediaPipe Pose model and processes a frame to detect landmarks, which are later used for measurements.
5. NumPy (numpy)
Purpose in Project: Handles numerical computations and array operations.
Role in Project
Converts RealSense frame data to arrays for processing.
Performs calculations for 3D coordinates and distances.
Other Features
Supports large, multi-dimensional arrays and matrices.
Provides mathematical functions for linear algebra, statistics, and more.
Widely used in scientific computing and machine learning.
Code Example
color_image = np.asanyarray(color_frame.get_data())
Summary: This code converts a RealSense color frame into a NumPy array for further processing.
6. openpyxl
Purpose in Project: Manages Excel file creation and writing.
Role in Project
Creates or updates an Excel file (measurements.xlsx) with user names and measurements.
Organizes data in a tabular format with headers.
Other Features
Supports reading, writing, and modifying Excel files (.xlsx).
Allows formatting, charts, and formulas in spreadsheets.
Compatible with Microsoft Excel and other spreadsheet software.
Code Example
ws.append([name] + measurements)
wb.save(filename)
Summary: This code appends a row with the user’s name and measurements to the Excel worksheet and saves the file.
7. os
Purpose in Project: Checks for the existence of the Excel file.
Role in Project
Determines if measurements.xlsx exists to decide whether to create a new workbook or load an existing one.
Other Features
Provides functions for file and directory operations.
Supports path manipulation and system interactions.
Code Example
if not os.path.exists(filename):
    wb = Workbook()  # no file yet, so start a new workbook
Summary: This code checks if the Excel file exists; if not, it creates a new workbook.
8. math
Purpose in Project: Performs mathematical calculations.
Role in Project
Calculates Euclidean distances between 3D points for body measurements.
Other Features
Provides basic mathematical functions like square root, trigonometry, and logarithms.
Lightweight and built into Python’s standard library.
Code Example
def euclidean_distance_3d(p1, p2):
    # Square root of the summed squared differences along x, y, and z
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))
Summary: This function calculates the 3D Euclidean distance between two points, used for body measurements like height and chest diameter.
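As a quick sanity check of the distance function, the sketch below evaluates it on points whose separation is known in advance. The shoulder coordinates are invented for illustration and are not measurements from the app; `math.dist` (Python 3.8+) is shown only as an equivalent standard-library call.

```python
import math

def euclidean_distance_3d(p1, p2):
    """Straight-line distance between two 3D points given as (x, y, z)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))

# Hypothetical shoulder positions in metres (illustrative values only).
left_shoulder = (0.20, 0.00, 1.50)
right_shoulder = (-0.20, 0.00, 1.50)

shoulder_width = euclidean_distance_3d(left_shoulder, right_shoulder)

# math.dist computes the same quantity for two equal-length sequences.
same_value = math.dist(left_shoulder, right_shoulder)
```

Both calls should agree up to floating-point rounding, so either can back the measurement code.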
Folder Structure and Functioning
The project consists of four Python files, each with a specific role, organized in a single directory:
camera_thread.py
Contains the CameraThread class, which runs in a separate thread.
Manages RealSense camera streams, processes frames with MediaPipe, and calculates measurements.
Emits signals for updated frames and measurements to the GUI.
excel_writer.py
Defines the save_to_excel function to write measurements to an Excel file.
Handles file creation or appending data to an existing file.
main.py
Implements the MainWindow class, the main GUI application.
Initializes the camera thread, updates the video feed, and handles user input.
Coordinates saving measurements to Excel.
utils.py
Contains the euclidean_distance_3d function for calculating 3D distances.
Provides reusable utility functions for the project.
Functioning
main.py starts the application, creating a GUI and launching the camera thread.
camera_thread.py continuously captures and processes frames, sending video to the GUI and measurements when triggered.
utils.py supports calculations in camera_thread.py.
excel_writer.py is called from main.py to save measurements when the user captures data.
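The flow above (frames sent continuously, measurements sent only after a capture request) can be sketched without a camera or Qt. The example below is a standard-library analogue, not the app's code: `CameraWorkerSketch`, `measure`, and `run_demo` are hypothetical names, a `queue.Queue` stands in for the signals the real thread emits, and the "frames" are plain strings.

```python
import queue
import threading

def measure(frame):
    # Placeholder for the real landmark and depth computation.
    return [float(len(frame))]

class CameraWorkerSketch(threading.Thread):
    """Stdlib stand-in for the camera thread; queue items play the role of signals."""

    def __init__(self, frames, events):
        super().__init__(daemon=True)
        self.frames = frames                    # any iterable of fake frames
        self.events = events                    # queue read by the "GUI" side
        self.capture_flag = threading.Event()   # set when the user asks for a capture

    def request_capture(self):
        # Called from the GUI side, e.g. when the capture button is clicked.
        self.capture_flag.set()

    def run(self):
        for frame in self.frames:
            self.events.put(("frame", frame))   # every frame goes to the GUI
            if self.capture_flag.is_set():
                self.capture_flag.clear()       # one result per request
                self.events.put(("measurements", measure(frame)))

def run_demo():
    events = queue.Queue()
    worker = CameraWorkerSketch(["frame-1", "frame-2"], events)
    worker.request_capture()   # requested before start so the demo is deterministic
    worker.start()
    worker.join()
    received = []
    while not events.empty():
        received.append(events.get())
    return received
```

Clearing the flag right after it is seen means one button click yields exactly one set of measurements, which matches the "when triggered" behavior described above.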
Data Conversion to Excel
The process of saving measurements to an Excel file is handled by excel_writer.py:
Input Data
The user’s name (from the GUI input field in main.py).
A list of measurements (max reach, chest diameter, bicep diameter, thigh to feet, full height) from camera_thread.py.
Excel File Handling
If measurements.xlsx does not exist, a new workbook is created with a “Data” sheet and headers (Name, Max Reach, etc.).
If the file exists, it is loaded, and the “Data” sheet is accessed.
Data Writing
A new row is appended with the user’s name followed by the measurements.
The workbook is saved, preserving existing data.
Code Example
import os
from openpyxl import Workbook, load_workbook

def save_to_excel(name, measurements, filename="measurements.xlsx"):
    headers = ['Name', 'Max Reach', 'Chest Diameter', 'Bicep Diameter', 'Thigh to Feet', 'Full Height']
    if not os.path.exists(filename):
        # First run: create the workbook and write the header row
        wb = Workbook()
        ws = wb.active
        ws.title = 'Data'
        ws.append(headers)
    else:
        # Later runs: reopen the file and reuse the existing sheet
        wb = load_workbook(filename)
        ws = wb['Data']
    ws.append([name] + measurements)
    wb.save(filename)
Summary: This function checks for an existing Excel file, creates one if needed, appends a row with the name and measurements, and saves the file.
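Because the row is appended exactly as it is received, a measurement list of the wrong length would leave values under the wrong headers. A small check before writing can catch that. The sketch below is one possible guard: `build_row` is a hypothetical helper, and rounding to three decimals is an assumption rather than something the original code does.

```python
HEADERS = ['Name', 'Max Reach', 'Chest Diameter', 'Bicep Diameter',
           'Thigh to Feet', 'Full Height']

def build_row(name, measurements, headers=HEADERS):
    """Return a spreadsheet row, or raise if it would not line up with the headers."""
    expected = len(headers) - 1  # every column except 'Name'
    if len(measurements) != expected:
        raise ValueError(
            f"expected {expected} measurements, got {len(measurements)}"
        )
    # Rounding is an illustrative choice to keep the sheet readable.
    return [name] + [round(float(m), 3) for m in measurements]
```

The returned list could then be passed to `ws.append` in place of `[name] + measurements`.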
Role of pyrealsense2 SDK and Depth Camera
The pyrealsense2 library is critical for interacting with the Intel RealSense depth camera (e.g., D435), which provides both color (RGB) and depth data.
Key Features Used
Pipeline: Manages camera streams (color and depth).
Alignment: Aligns depth frames to color frames to ensure pixel correspondence.
Deprojection: Converts 2D pixel coordinates with depth values to 3D world coordinates.
Intrinsics: Provides camera parameters (e.g., focal length) for accurate 3D calculations.
Functioning in the Project
Initialization
Configures the camera to stream depth (640×480, 16-bit, 30 FPS) and color (640×480, BGR, 30 FPS).
Starts the pipeline and creates an align object that maps depth frames onto the color stream.
Frame Capture
Captures synchronized color and depth frames.
3D Coordinate Calculation
Uses depth data and camera intrinsics to convert landmark pixels to 3D points.
Calculates distances between points for measurements.
Visualization
The color frame, with landmarks drawn by MediaPipe, is displayed in the GUI.
Code Example
frames = self.pipeline.wait_for_frames()
aligned = self.align.process(frames)
color_frame = aligned.get_color_frame()
depth_frame = aligned.get_depth_frame()
point = rs.rs2_deproject_pixel_to_point(intr, [x, y], depth)
Summary: This code captures aligned color and depth frames, then converts a pixel (x, y) with its depth value to a 3D point using camera intrinsics.
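To make the deprojection step concrete, here is a minimal sketch of the pinhole-camera arithmetic for a stream without lens distortion. The `Intrinsics` tuple and `deproject_pixel` function are hypothetical stand-ins, not part of pyrealsense2, and the focal lengths and principal point are invented values; the real `rs2_deproject_pixel_to_point` also accounts for the camera's distortion model.

```python
from typing import NamedTuple

class Intrinsics(NamedTuple):
    """Illustrative subset of camera intrinsics (all values in pixels)."""
    fx: float   # focal length along x
    fy: float   # focal length along y
    ppx: float  # principal point x
    ppy: float  # principal point y

def deproject_pixel(intr, pixel, depth):
    """Convert a pixel (u, v) and its depth in metres to a 3D point [X, Y, Z]."""
    u, v = pixel
    x = (u - intr.ppx) / intr.fx   # normalized image-plane coordinate
    y = (v - intr.ppy) / intr.fy
    return [depth * x, depth * y, depth]

# Assumed intrinsics for a 640x480 stream (not read from a real camera).
intr = Intrinsics(fx=600.0, fy=600.0, ppx=320.0, ppy=240.0)
center_point = deproject_pixel(intr, (320, 240), 2.0)
offset_point = deproject_pixel(intr, (620, 240), 2.0)
```

A pixel at the principal point maps onto the optical axis, and pixels farther from it map to points farther to the side at the same depth.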
Summary of Key Code Segments
1. Camera Thread Initialization (camera_thread.py)
def __init__(self):
    super().__init__()
    self.running = True
    self.capture_flag = False
    # Configure and start the RealSense depth and color streams
    self.pipeline = rs.pipeline()
    config = rs.config()
    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)
    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)
    self.pipeline.start(config)
    # Map depth frames onto the color stream so pixels correspond
    self.align = rs.align(rs.stream.color)
    self.mp_pose = mp.solutions.pose
    self.pose = self.mp_pose.Pose()
Summary: Initializes the camera thread, sets up the RealSense pipeline for depth and color streams, aligns frames, and prepares the MediaPipe Pose model.
2. Measurement Calculation (camera_thread.py)
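# Skip the capture when no person was detected (pose_landmarks is None)
if self.capture_flag and results.pose_landmarks:
    landmarks = results.pose_landmarks.landmark
    intr = aligned.get_profile().as_video_stream_profile().get_intrinsics()
    coords = []
    # Nose, shoulders, elbows, wrists, hips, and foot-index landmarks
    for idx in [0, 11, 12, 13, 14, 15, 16, 23, 24, 31, 32]:
        lm = landmarks[idx]
        # Normalized landmark coordinates to pixel coordinates
        x, y = int(lm.x * 640), int(lm.y * 480)
        depth = depth_frame.get_distance(x, y)
        point = rs.rs2_deproject_pixel_to_point(intr, [x, y], depth)
        coords.append(point)
    # Full height: nose to the midpoint of the two feet
    height = euclidean_distance_3d(coords[0], [(coords[9][i] + coords[10][i]) / 2 for i in range(3)])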
Summary: When triggered, this code processes pose landmarks, converts their pixel coordinates to 3D points using depth data, and calculates the height as the distance from the nose (landmark 0) to the midpoint of the two foot-index landmarks (31 and 32).
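Two details of this step are easy to get wrong: normalized landmark coordinates can fall outside the 0 to 1 range when a body part leaves the frame, and a depth reading of 0 means the sensor has no data for that pixel. The sketch below shows one way to guard against both. `landmark_to_pixel`, `midpoint`, and `full_height` are hypothetical helpers, not part of the original code, and the sample points are invented.

```python
import math

def landmark_to_pixel(norm_x, norm_y, width=640, height=480):
    """Map normalized landmark coordinates to a pixel that lies inside the frame."""
    x = min(max(int(norm_x * width), 0), width - 1)
    y = min(max(int(norm_y * height), 0), height - 1)
    return x, y

def midpoint(p1, p2):
    """Component-wise average of two 3D points."""
    return [(a + b) / 2 for a, b in zip(p1, p2)]

def full_height(nose, left_foot, right_foot):
    """Distance from the nose to the midpoint of the feet.

    Returns None when any point has no valid depth (z <= 0), since a
    deprojected point with zero depth collapses to the origin.
    """
    if any(p[2] <= 0 for p in (nose, left_foot, right_foot)):
        return None
    return math.dist(nose, midpoint(left_foot, right_foot))

# Invented camera-space points in metres (y grows downward in the image).
example_height = full_height((0.0, 0.0, 2.0), (0.1, 1.5, 2.0), (-0.1, 1.5, 2.0))
```

Returning None instead of a number lets the caller ask the user to retry rather than saving a measurement built from missing depth.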
3. GUI Frame Update (main.py)
def update_frame(self, frame):
    # Wrap the BGR frame in a QImage (Format_BGR888 requires Qt 5.14 or newer)
    img = QImage(frame, frame.shape[1], frame.shape[0], QImage.Format_BGR888)
    self.label.setPixmap(QPixmap.fromImage(img))
Summary: Converts a NumPy array (video frame) to a QImage and displays it in the GUI label, updating the live video feed.
4. Saving Results (main.py)
def show_results(self, measurements):
    name = self.name_input.text().strip()
    if name:
        # Save only when a name was entered, then show the values
        save_to_excel(name, measurements)
        self.result_label.setText(f"Measurements for {name}: {measurements}")
    else:
        self.result_label.setText("Please enter a name!")
Summary: Handles measurement results by saving them to Excel with the user’s name (if provided) and updating the GUI with the results or an error message.
Conclusion
The Human Pose Detector App combines pose detection with depth sensing to estimate human body dimensions. The pyrealsense2 library supplies aligned color and depth frames, MediaPipe supplies the pose landmarks, and PyQt5 provides the user interface. Splitting the code into camera_thread.py, excel_writer.py, main.py, and utils.py keeps capture, storage, GUI, and math concerns separate, which makes each part easier to maintain and extend. Measurements are appended to an Excel file for storage and later analysis.
