Human Pose Detector App: Comprehensive Notes

This document provides a detailed yet easy-to-understand overview of the Human Pose Detector App, which uses computer vision to detect human body landmarks and measure body dimensions, saving the results to an Excel file. The app uses a depth camera (Intel RealSense D435) and various Python libraries to process video frames and calculate measurements.

Project Overview

The Human Pose Detector App is a GUI-based application that:

Captures video using an Intel RealSense depth camera.

Detects human body landmarks using MediaPipe’s Pose solution.

Calculates body measurements (e.g., height, chest diameter, bicep diameter) using 3D coordinates.

Displays live video with overlaid landmarks.

Saves measurements along with a user-provided name to an Excel file.

The app uses PyQt5 for the GUI, pyrealsense2 for camera access, MediaPipe for pose detection, OpenCV for image processing, and openpyxl for Excel file handling.

Libraries Used

1. PyQt5

Purpose in Project: Provides the graphical user interface (GUI) for the app, including buttons, labels, and input fields.

Role in Project

Creates a window (MainWindow) to display live video feed.

Handles user interactions (e.g., clicking the “Capture Measurements” button).

Updates the GUI with new frames and measurement results.

Other Features

Cross-platform GUI development.

Supports widgets, layouts, signals, and slots for event-driven programming.

Can create complex applications like dialogs, menus, and toolbars.

Code Example

self.label = QLabel()

self.label.setAlignment(Qt.AlignCenter)

self.capture_btn = QPushButton("Capture Measurements")

self.capture_btn.clicked.connect(self.capture)

Summary: This code sets up a label to display video frames and a button to trigger measurement capture, connecting the button click to the capture method.

2. pyrealsense2

Purpose in Project: Interfaces with the Intel RealSense depth camera to capture color and depth frames.

Role in Project

Initializes the camera pipeline to stream color (RGB) and depth data.

Aligns depth and color frames for accurate 3D point calculations.

Deprojects pixel coordinates to 3D world coordinates using depth data.

Other Features

Supports multiple RealSense camera models.

Provides access to infrared streams, motion sensors, and camera calibration.

Enables advanced features like point cloud generation and tracking.

Code Example

self.pipeline = rs.pipeline()

config = rs.config()

config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)

config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

self.pipeline.start(config)

Summary: This code configures the RealSense camera to capture depth and color streams at 640×480 resolution and 30 FPS, then starts the pipeline.

3. OpenCV (cv2)

Purpose in Project: Handles image processing and color conversion.

Role in Project

Converts BGR (RealSense format) to RGB for MediaPipe pose processing.

Provides utilities for handling image arrays.

Other Features

Extensive image and video processing capabilities.

Supports feature detection, object tracking, and machine learning.

Used in real-time applications like facial recognition and augmented reality.

Code Example

results = self.pose.process(cv2.cvtColor(color_image, cv2.COLOR_BGR2RGB))

Summary: This code converts the color frame from BGR to RGB format, as required by MediaPipe for pose detection.

4. MediaPipe (mediapipe)

Purpose in Project: Detects human body landmarks in video frames.

Role in Project

Processes RGB frames to identify 33 pose landmarks (e.g., shoulders, elbows, hips).

Draws landmarks and connections on the video feed for visualization.

Other Features

Provides solutions for face detection, hand tracking, and object detection.

Optimized for real-time performance on various devices.

Supports 3D landmark estimation. Note that the Pose solution used here (mp.solutions.pose) tracks a single person per frame.

Code Example

self.mp_pose = mp.solutions.pose

self.pose = self.mp_pose.Pose()

results = self.pose.process(cv2.cvtColor(color_image, cv2.COLOR_BGR2RGB))

Summary: This code initializes the MediaPipe Pose model and processes a frame to detect landmarks, which are later used for measurements.
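
MediaPipe reports each landmark's x and y as fractions of the image width and height, so they have to be scaled to pixel coordinates before a depth value can be looked up. The following is a minimal sketch of that step, assuming the 640×480 stream used in this project. The helper name landmark_to_pixel is hypothetical, and clamping to the frame edge is only one possible policy for landmarks that fall slightly outside the image; it is not necessarily what the app does.

```python
def landmark_to_pixel(norm_x, norm_y, width=640, height=480):
    """Scale normalized landmark coordinates to integer pixel coordinates.

    Hypothetical helper: values outside [0, 1] are clamped to the nearest
    valid pixel so that a later depth lookup stays inside the frame.
    """
    x = int(norm_x * width)
    y = int(norm_y * height)
    x = min(max(x, 0), width - 1)
    y = min(max(y, 0), height - 1)
    return x, y


print(landmark_to_pixel(0.5, 0.5))    # (320, 240)
print(landmark_to_pixel(1.02, -0.1))  # (639, 0)
```

An alternative design is to skip out-of-frame landmarks entirely instead of clamping them, since a clamped pixel no longer lies on the body part it was meant to represent.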

5. NumPy (numpy)

Purpose in Project: Handles numerical computations and array operations.

Role in Project

Converts RealSense frame data to arrays for processing.

Performs calculations for 3D coordinates and distances.

Other Features

Supports large, multi-dimensional arrays and matrices.

Provides mathematical functions for linear algebra, statistics, and more.

Widely used in scientific computing and machine learning.

Code Example

color_image = np.asanyarray(color_frame.get_data())

Summary: This code converts a RealSense color frame into a NumPy array for further processing.

6. openpyxl

Purpose in Project: Manages Excel file creation and writing.

Role in Project

Creates or updates an Excel file (measurements.xlsx) with user names and measurements.

Organizes data in a tabular format with headers.

Other Features

Supports reading, writing, and modifying Excel files (.xlsx).

Allows formatting, charts, and formulas in spreadsheets.

Compatible with Microsoft Excel and other spreadsheet software.

Code Example

ws.append([name] + measurements)

wb.save(filename)

Summary: This code appends a row with the user’s name and measurements to the Excel worksheet and saves the file.

7. os

Purpose in Project: Checks for the existence of the Excel file.

Role in Project

Determines if measurements.xlsx exists to decide whether to create a new workbook or load an existing one.

Other Features

Provides functions for file and directory operations.

Supports path manipulation and system interactions.

Code Example

if not os.path.exists(filename):

    wb = Workbook()

Summary: This code checks if the Excel file exists; if not, it creates a new workbook.

8. math

Purpose in Project: Performs mathematical calculations.

Role in Project

Calculates Euclidean distances between 3D points for body measurements.

Other Features

Provides basic mathematical functions like square root, trigonometry, and logarithms.

Lightweight and built into Python’s standard library.

Code Example

def euclidean_distance_3d(p1, p2):

    return math.sqrt(sum([(a - b) ** 2 for a, b in zip(p1, p2)]))

Summary: This function calculates the 3D Euclidean distance between two points, used for body measurements like height and chest diameter.
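
As a quick sanity check, the sketch below runs the distance function on two points that form a 3-4-5 triangle (scaled by 0.1) and compares the result with math.dist from the standard library (Python 3.8+). The sample points are made up for illustration and are not real measurements.

```python
import math


def euclidean_distance_3d(p1, p2):
    """Straight-line distance between two 3D points given as (x, y, z)."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p1, p2)))


# Illustrative points in metres (not real measurements).
shoulder = (0.0, 0.0, 1.0)
wrist = (0.3, 0.4, 1.0)

d = euclidean_distance_3d(shoulder, wrist)
print(round(d, 3))  # 0.5

# math.dist computes the same quantity.
assert math.isclose(d, math.dist(shoulder, wrist))
```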

Folder Structure and Functioning

The project consists of four Python files, each with a specific role, organized in a single directory:

camera_thread.py

Contains the CameraThread class, which runs in a separate thread.

Manages RealSense camera streams, processes frames with MediaPipe, and calculates measurements.

Emits signals for updated frames and measurements to the GUI.

excel_writer.py

Defines the save_to_excel function to write measurements to an Excel file.

Handles file creation or appending data to an existing file.

main.py

Implements the MainWindow class, the main GUI application.

Initializes the camera thread, updates the video feed, and handles user input.

Coordinates saving measurements to Excel.

utils.py

Contains the euclidean_distance_3d function for calculating 3D distances.

Provides reusable utility functions for the project.

Functioning

main.py starts the application, creating a GUI and launching the camera thread.

camera_thread.py continuously captures and processes frames, sending video to the GUI and measurements when triggered.

utils.py supports calculations in camera_thread.py.

excel_writer.py is called from main.py to save measurements when the user captures data.
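
The flow above (frames sent continuously, measurements sent only when triggered) can be sketched without Qt or a camera. In this plain-Python stand-in, two callbacks play the role of the signals, and CaptureLoop is a hypothetical class, not the app's CameraThread. Resetting the flag after one result is an assumption, since the notes do not show how the flag is cleared.

```python
class CaptureLoop:
    """Plain-Python stand-in for the frame loop inside the camera thread."""

    def __init__(self, on_frame, on_measurements):
        self.on_frame = on_frame                # called for every frame
        self.on_measurements = on_measurements  # called only after a capture request
        self.capture_flag = False

    def request_capture(self):
        """What the 'Capture Measurements' button would trigger."""
        self.capture_flag = True

    def step(self, frame, measure):
        """Process one frame; `measure` turns a frame into a list of values."""
        self.on_frame(frame)
        if self.capture_flag:
            self.on_measurements(measure(frame))
            # Assumption: one set of measurements per button press.
            self.capture_flag = False


shown, saved = [], []
loop = CaptureLoop(shown.append, saved.append)

loop.step("frame-1", lambda f: [1.70])
loop.request_capture()
loop.step("frame-2", lambda f: [1.71])
loop.step("frame-3", lambda f: [1.72])

print(shown)  # ['frame-1', 'frame-2', 'frame-3']
print(saved)  # [[1.71]]
```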

Data Conversion to Excel

The process of saving measurements to an Excel file is handled by excel_writer.py:

Input Data

The user’s name (from the GUI input field in main.py).

A list of measurements (max reach, chest diameter, bicep diameter, thigh to feet, full height) from camera_thread.py.

Excel File Handling

If measurements.xlsx does not exist, a new workbook is created with a “Data” sheet and headers (Name, Max Reach, etc.).

If the file exists, it is loaded, and the “Data” sheet is accessed.

Data Writing

A new row is appended with the user’s name followed by the measurements.

The workbook is saved, preserving existing data.

Code Example

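import os

from openpyxl import Workbook, load_workbook

def save_to_excel(name, measurements, filename="measurements.xlsx"):

    headers = ['Name', 'Max Reach', 'Chest Diameter', 'Bicep Diameter', 'Thigh to Feet', 'Full Height']

    if not os.path.exists(filename):

        # First run: create the workbook with a "Data" sheet and a header row.

        wb = Workbook()

        ws = wb.active

        ws.title = 'Data'

        ws.append(headers)

    else:

        # Later runs: reopen the file and reuse the existing sheet.

        wb = load_workbook(filename)

        ws = wb['Data']

    ws.append([name] + list(measurements))

    wb.save(filename)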

Summary: This function checks for an existing Excel file, creates one if needed, appends a row with the name and measurements, and saves the file.
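
Because each row must line up with the header row, it can help to validate the data before it is appended. The sketch below is one way to do that, using only the header names listed above. The helper name build_row and the rounding to three decimals are assumptions made for illustration; they are not part of excel_writer.py.

```python
HEADERS = ["Name", "Max Reach", "Chest Diameter", "Bicep Diameter",
           "Thigh to Feet", "Full Height"]


def build_row(name, measurements, digits=3):
    """Return a row matching HEADERS, or raise ValueError if it cannot."""
    name = name.strip()
    if not name:
        raise ValueError("name must not be empty")
    expected = len(HEADERS) - 1  # every column except "Name"
    if len(measurements) != expected:
        raise ValueError(f"expected {expected} measurements, got {len(measurements)}")
    return [name] + [round(float(m), digits) for m in measurements]


row = build_row("  Alex ", [2.10456, 0.35, 0.11, 0.95, 1.7321])
print(row)  # ['Alex', 2.105, 0.35, 0.11, 0.95, 1.732]
```

The returned list could then be handed to ws.append in place of building the row inline.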

Role of pyrealsense2 SDK and Depth Camera

The pyrealsense2 library is critical for interacting with the Intel RealSense depth camera (e.g., D435), which provides both color (RGB) and depth data.

Key Features Used

Pipeline: Manages camera streams (color and depth).

Alignment: Aligns depth frames to color frames to ensure pixel correspondence.

Deprojection: Converts 2D pixel coordinates with depth values to 3D world coordinates.

Intrinsics: Provides camera parameters (e.g., focal length) for accurate 3D calculations.

Functioning in the Project

Initialization

Configures the camera to stream depth (640×480, 16-bit, 30 FPS) and color (640×480, BGR, 30 FPS).

Starts the pipeline and aligns frames.

Frame Capture

Captures synchronized color and depth frames.

3D Coordinate Calculation

Uses depth data and camera intrinsics to convert landmark pixels to 3D points.

Calculates distances between points for measurements.

Visualization

The color frame, with landmarks drawn by MediaPipe, is displayed in the GUI.

Code Example

frames = self.pipeline.wait_for_frames()

aligned = self.align.process(frames)

color_frame = aligned.get_color_frame()

depth_frame = aligned.get_depth_frame()

point = rs.rs2_deproject_pixel_to_point(intr, [x, y], depth)

Summary: This code captures aligned color and depth frames, then converts a pixel (x, y) with its depth value to a 3D point using camera intrinsics.
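
To show what deprojection does, here is a simplified pure-Python version of the pinhole-camera arithmetic, assuming no lens distortion. PinholeIntrinsics is a hypothetical stand-in that carries only the fx, fy, ppx, and ppy fields of the SDK's intrinsics object, and the numbers are illustrative rather than real D435 calibration values. In the app itself, rs.rs2_deproject_pixel_to_point performs this step.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class PinholeIntrinsics:
    """Hypothetical stand-in for the intrinsics fields used below."""
    fx: float   # focal length along x, in pixels
    fy: float   # focal length along y, in pixels
    ppx: float  # principal point x, in pixels
    ppy: float  # principal point y, in pixels


def deproject_pixel(
    intr: PinholeIntrinsics, pixel: Sequence[float], depth_m: float
) -> Optional[Tuple[float, float, float]]:
    """Convert a pixel plus its depth (metres) to a 3D point in camera space.

    Returns None when the depth is zero or negative, which is treated here
    as "no valid depth at this pixel".
    """
    if depth_m <= 0:
        return None
    x = (pixel[0] - intr.ppx) / intr.fx
    y = (pixel[1] - intr.ppy) / intr.fy
    return (x * depth_m, y * depth_m, depth_m)


intr = PinholeIntrinsics(fx=600.0, fy=600.0, ppx=320.0, ppy=240.0)
print(deproject_pixel(intr, (320, 240), 2.0))  # (0.0, 0.0, 2.0)
print(deproject_pixel(intr, (620, 240), 2.0))  # (1.0, 0.0, 2.0)
print(deproject_pixel(intr, (100, 100), 0.0))  # None
```

Rejecting zero depth matters because a pixel with no depth reading would otherwise deproject to the camera origin and distort any distance computed from it.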

Summary of Key Code Segments

1. Camera Thread Initialization (camera_thread.py)

def __init__(self):

    super().__init__()

    self.running = True

    self.capture_flag = False

    self.pipeline = rs.pipeline()

    config = rs.config()

    config.enable_stream(rs.stream.depth, 640, 480, rs.format.z16, 30)

    config.enable_stream(rs.stream.color, 640, 480, rs.format.bgr8, 30)

    self.pipeline.start(config)

    self.align = rs.align(rs.stream.color)

    self.mp_pose = mp.solutions.pose

    self.pose = self.mp_pose.Pose()

Summary: Initializes the camera thread, sets up the RealSense pipeline for depth and color streams, aligns frames, and prepares the MediaPipe Pose model.

2. Measurement Calculation (camera_thread.py)

if self.capture_flag:

    landmarks = results.pose_landmarks.landmark

    intr = aligned.get_profile().as_video_stream_profile().get_intrinsics()

    coords = []

    for idx in [0, 11, 12, 13, 14, 15, 16, 23, 24, 31, 32]:

        lm = landmarks[idx]

        # Landmark x and y are normalized, so scale them to pixel coordinates.

        x, y = int(lm.x * 640), int(lm.y * 480)

        depth = depth_frame.get_distance(x, y)

        point = rs.rs2_deproject_pixel_to_point(intr, [x, y], depth)

        coords.append(point)

    # coords[0] is landmark 0; coords[9] and coords[10] are landmarks 31 and 32.

    height = euclidean_distance_3d(coords[0], [(coords[9][i] + coords[10][i]) / 2 for i in range(3)])

Summary: When triggered, this code converts the selected pose landmarks from pixel coordinates to 3D points using depth data, then estimates height as the distance from the nose (landmark 0) to the midpoint of the two foot-index landmarks (31 and 32).
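
The positional indices in coords are easier to follow when mapped back to landmark names. The sketch below does that for the eleven indices used in the loop and repeats the height calculation by name. The names follow MediaPipe's Pose landmark numbering; the helper functions and the sample coordinates are illustrative assumptions, not code from camera_thread.py.

```python
import math

# Landmark indices in the order the loop visits them.
LANDMARK_IDS = [0, 11, 12, 13, 14, 15, 16, 23, 24, 31, 32]
LANDMARK_NAMES = {
    0: "nose",
    11: "left_shoulder", 12: "right_shoulder",
    13: "left_elbow", 14: "right_elbow",
    15: "left_wrist", 16: "right_wrist",
    23: "left_hip", 24: "right_hip",
    31: "left_foot_index", 32: "right_foot_index",
}


def by_name(coords):
    """Map landmark names to 3D points, given coords in LANDMARK_IDS order."""
    return {LANDMARK_NAMES[i]: p for i, p in zip(LANDMARK_IDS, coords)}


def midpoint(p, q):
    """Point halfway between two 3D points."""
    return [(a + b) / 2 for a, b in zip(p, q)]


def estimate_height(coords):
    """Distance from the nose to the midpoint of the two foot-index landmarks."""
    pts = by_name(coords)
    feet = midpoint(pts["left_foot_index"], pts["right_foot_index"])
    return math.dist(pts["nose"], feet)


# Made-up 3D points in metres: only the nose and feet matter for this check.
sample = [[0.0, 0.0, 2.0] for _ in LANDMARK_IDS]
sample[0] = [0.0, -0.8, 2.0]   # nose
sample[9] = [-0.1, 0.8, 2.0]   # left foot index
sample[10] = [0.1, 0.8, 2.0]   # right foot index

print(round(estimate_height(sample), 2))  # 1.6
```

Since the top landmark is the nose and not the crown of the head, this distance is likely to come out somewhat shorter than a person's full standing height.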

3. GUI Frame Update (main.py)

def update_frame(self, frame):

    img = QImage(frame, frame.shape[1], frame.shape[0], QImage.Format_BGR888)

    self.label.setPixmap(QPixmap.fromImage(img))

Summary: Converts a NumPy array (video frame) to a QImage and displays it in the GUI label, updating the live video feed.

4. Saving Results (main.py)

def show_results(self, measurements):

    name = self.name_input.text().strip()

    if name:

        save_to_excel(name, measurements)

        self.result_label.setText(f"Measurements for {name}: {measurements}")

    else:

        self.result_label.setText("Please enter a name!")

Summary: Handles measurement results by saving them to Excel with the user’s name (if provided) and updating the GUI with the results or an error message.
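
The label text above prints the raw Python list. A more readable message could pair each value with its column name and unit, as in the sketch below. format_measurements is a hypothetical helper, the labels are taken from the Excel headers, and the metre unit assumes the values come straight from RealSense distances without conversion.

```python
LABELS = ["Max Reach", "Chest Diameter", "Bicep Diameter",
          "Thigh to Feet", "Full Height"]


def format_measurements(name, measurements):
    """Build a one-line summary such as 'Measurements for Alex: Max Reach: 2.10 m, ...'."""
    parts = [f"{label}: {value:.2f} m" for label, value in zip(LABELS, measurements)]
    return f"Measurements for {name}: " + ", ".join(parts)


text = format_measurements("Alex", [2.1, 0.35, 0.11, 0.95, 1.73])
print(text)
```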

Conclusion

The Human Pose Detector App combines pose estimation with depth sensing to estimate human body dimensions. The pyrealsense2 library provides the 3D data capture, MediaPipe provides pose detection, and PyQt5 provides the user interface. The modular structure (camera_thread.py, excel_writer.py, main.py, utils.py) keeps concerns separated, which makes the code easier to maintain and extend. Measurements are appended to an Excel file for storage and later analysis.