3D Reconstruction via Camera-Lidar (2D) Fusion for Mobile Robots: A Gaussian Splatting Approach
Ajay Kumar Sandula, *Shriram Damodaran, *Suhas Nagaraj, Debasish Ghose, Pradipta Biswas
IEEE International Conference on Robotics and Automation (ICRA) 2025
* denotes equal contribution
Our work presents a novel 3D reconstruction-based SLAM (Simultaneous Localization and Mapping) approach leveraging multimodal sensory fusion from a camera and 2D Lidar, enhanced by the Gaussian Splatting technique. Traditional monocular SLAM methods face challenges in dynamic environments, where pure vision-based localization struggles with accuracy. By integrating Lidar-based localization for precise positioning and Gaussian Splatting for robust environmental mapping, our approach achieves higher accuracy, reduced computational load, and improved FPS compared to existing SLAM methods. Experimental results in real-world and simulated environments demonstrate significant performance improvements in mapping accuracy and scalability for mobile robots.
This project involves developing an Automated Guided Vehicle (AGV) using ROS2 and OpenCV on a Turtlebot3 Waffle platform. The AGV is designed to autonomously follow lanes, detect stop signs, and identify as well as avoid dynamic obstacles. The implementation is tested both in simulation using Gazebo and on actual hardware, demonstrating the vehicle's capability to navigate complex environments safely and efficiently.
This project involves developing a ROS package that enables a TurtleBot to autonomously navigate through a maze using ArUco markers for localization. As the robot traverses the maze, it detects and reports objects present in the environment, integrating perception and navigation capabilities to achieve efficient maze traversal.
This project enables a robot to autonomously navigate its environment by integrating SLAM (Simultaneous Localization and Mapping) with ROS2 action-client nodes. The robot identifies specific objects, such as batteries, using camera input to determine navigation waypoints. By referencing the TF (transform) tree, the robot accurately assesses its current position and the locations of these waypoints, facilitating precise movement within its surroundings.
This project focuses on the kinematic analysis and simulation of the ABB IRB 1600 industrial robot, specifically applied to a CNC tending process. It includes a detailed study of the robot's forward and inverse kinematics, as well as the development of a Gazebo simulation to visualize and test the robot's movements within a virtual environment. The ABB IRB 1600 is known for its high performance in material handling and machine tending applications, offering up to 50% shorter cycle times compared to competing robots.
This project enhances autonomous robot navigation by integrating Natural Language Processing (NLP) to interpret and execute human language commands. Utilizing a fine-tuned T5-Small model, the system translates natural language instructions into structured navigation commands, enabling a TurtleBot3 robot to perform complex, multi-step tasks. The approach employs LoRA-based adaptation for efficient model fine-tuning and is validated in a ROS2-based simulation environment using Gazebo and RViz. A synthetic dataset comprising over 24,000 navigation instructions supports the training process.
This project implements autonomous navigation and obstacle avoidance for a TurtleBot3 robot using Deep Q-Networks (DQN) within a ROS2 and Gazebo simulation environment. The robot learns to navigate complex environments by training a neural network to predict optimal actions based on its current state, enabling it to reach designated goals while effectively avoiding obstacles.
7. MULTI⦿VIZ: An Open-Source Tool for Multi-Robot Coordination
MULTI⦿VIZ is a graphical user interface (GUI) designed to enhance multi-robot coordination and visualization. Built using the ROS2 framework, it provides real-time monitoring, teleoperation, and control of multiple robots with features such as voice commands, emergency stop functionality, and sensor data visualization (LiDAR, odometry, and camera feeds). The tool addresses challenges in managing multi-robot systems by offering an intuitive and scalable interface for research and practical deployment. Developed as an open-source project, MULTI⦿VIZ aims to promote collaboration and accessibility within the robotics community. This project is still under development.
This project focuses on implementing the Real-Time Rapidly-exploring Random Tree Star (RT-RRT*) algorithm to enhance mobile robot navigation in dynamic environments. RT-RRT* is an extension of the RRT* algorithm, designed to efficiently update paths in response to changes, enabling robots to adapt their trajectories in real-time as obstacles move or new information becomes available. This approach is particularly beneficial for applications requiring continuous path optimization in unpredictable settings.
This project focuses on implementing the A* algorithm for path planning in a differential drive (non-holonomic) robot. The A* algorithm is utilized to compute an optimal path from a start to a goal position, considering the robot's kinematic constraints. This approach ensures that the planned path is feasible for a robot with differential drive dynamics, enabling efficient navigation in environments with obstacles.
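One way to respect the kinematic constraints mentioned above is to generate A* successors by integrating the differential drive equations for a pair of wheel speeds. The sketch below assumes this approach; the function name, the RPM-based action format, and the wheel dimensions used in the usage note are illustrative assumptions, not values taken from the project.

```python
import math


def diff_drive_step(x, y, theta, rpm_left, rpm_right,
                    wheel_radius, wheel_base, duration, dt=0.01):
    """Integrate differential drive kinematics with forward Euler steps.

    Returns the new pose (x, y, theta) and the distance travelled,
    which a planner could use as the edge cost.
    """
    # Convert wheel speeds from RPM to rad/s.
    w_left = rpm_left * 2.0 * math.pi / 60.0
    w_right = rpm_right * 2.0 * math.pi / 60.0
    # Body-frame linear and angular velocity.
    v = wheel_radius * (w_left + w_right) / 2.0
    omega = wheel_radius * (w_right - w_left) / wheel_base
    cost = 0.0
    for _ in range(int(round(duration / dt))):
        dx = v * math.cos(theta) * dt
        dy = v * math.sin(theta) * dt
        x += dx
        y += dy
        theta += omega * dt
        cost += math.hypot(dx, dy)
    return x, y, theta, cost
```

For example, `diff_drive_step(0.0, 0.0, 0.0, 60.0, 60.0, 0.033, 0.287, 1.0)` drives straight ahead, while opposite wheel speeds rotate the robot in place. A planner would call this once per candidate wheel-speed pair and discard successors that collide with obstacles.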
This project applies Dijkstra's algorithm to navigate a point robot through a defined environment with static obstacles. Dijkstra's algorithm expands nodes in order of increasing path cost from the starting point, which (for non-negative edge costs) guarantees that the first path found to the target location is the shortest one, so the robot reaches its destination efficiently while avoiding obstacles.
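A minimal sketch of Dijkstra's algorithm for a point robot follows. It assumes a 4-connected occupancy grid with unit step cost, where cells equal to 1 are obstacles; the grid representation and function name are assumptions for illustration, not the project's actual map format.

```python
import heapq


def dijkstra_grid(grid, start, goal):
    """Return the shortest 4-connected path from start to goal, or None."""
    rows, cols = len(grid), len(grid[0])
    dist = {start: 0}
    parent = {start: None}
    frontier = [(0, start)]  # min-heap ordered by path cost
    while frontier:
        cost, cell = heapq.heappop(frontier)
        if cell == goal:
            break
        if cost > dist[cell]:
            continue  # stale heap entry, a cheaper route was already found
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] != 0:
                continue  # obstacle cell
            new_cost = cost + 1
            if new_cost < dist.get(nxt, float("inf")):
                dist[nxt] = new_cost
                parent[nxt] = cell
                heapq.heappush(frontier, (new_cost, nxt))
    if goal not in parent:
        return None
    # Walk parent links back from the goal to recover the path.
    path, node = [], goal
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]
```

The function returns the list of grid cells from start to goal, or `None` when obstacles make the goal unreachable.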
This project involves implementing the A* algorithm for mobile robot navigation. The A* algorithm is a widely used pathfinding and graph traversal technique that calculates the shortest path from a start point to a goal point, considering both the cost to reach a node and an estimated cost to the goal. In this context, the algorithm is adapted to account for the robot's kinematic constraints and environmental obstacles, ensuring efficient and feasible path planning for the mobile robot.
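The combination of cost-so-far and estimated cost-to-go described above can be sketched as follows. This version assumes a 4-connected grid with unit step cost and a Manhattan-distance heuristic, and it leaves out the kinematic constraints; all names are illustrative rather than taken from the project.

```python
import heapq


def manhattan(a, b):
    """Admissible heuristic for 4-connected grids with unit step cost."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def astar_grid(grid, start, goal):
    """Return a shortest path on the grid (cells equal to 1 are obstacles)."""
    rows, cols = len(grid), len(grid[0])
    g = {start: 0}            # cost to reach each discovered cell
    parent = {start: None}
    closed = set()
    # Heap entries are (f, g, cell) with f = g + heuristic.
    open_heap = [(manhattan(start, goal), 0, start)]
    while open_heap:
        _, cost, cell = heapq.heappop(open_heap)
        if cell in closed:
            continue
        closed.add(cell)
        if cell == goal:
            path = []
            while cell is not None:
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nxt = (r + dr, c + dc)
            if not (0 <= nxt[0] < rows and 0 <= nxt[1] < cols):
                continue
            if grid[nxt[0]][nxt[1]] != 0 or nxt in closed:
                continue
            new_cost = cost + 1
            if new_cost < g.get(nxt, float("inf")):
                g[nxt] = new_cost
                parent[nxt] = cell
                f = new_cost + manhattan(nxt, goal)
                heapq.heappush(open_heap, (f, new_cost, nxt))
    return None
```

Because the Manhattan heuristic never overestimates the remaining cost on this kind of grid, the returned path has the same length as the one Dijkstra's algorithm would find, usually after expanding fewer cells.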
This project addresses the classic 8-puzzle problem by implementing a solution using the Breadth-First Search (BFS) algorithm. The 8-puzzle consists of a 3x3 grid with numbered tiles and one empty space; the objective is to rearrange the tiles from a given initial configuration to a specified goal configuration by sliding tiles into the empty space. Because BFS explores the state space level by level, the first solution it reaches uses the fewest possible moves, so the result is optimal in move count.
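The level-by-level search can be sketched as below. States are assumed to be 9-tuples read row by row with 0 for the empty space, and the goal configuration shown is only one common choice; both are assumptions for illustration.

```python
from collections import deque

GOAL = (1, 2, 3, 4, 5, 6, 7, 8, 0)  # assumed goal layout, 0 is the blank


def neighbors(state):
    """Yield every state reachable by sliding one tile into the blank."""
    blank = state.index(0)
    r, c = divmod(blank, 3)
    for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
        nr, nc = r + dr, c + dc
        if 0 <= nr < 3 and 0 <= nc < 3:
            swap = nr * 3 + nc
            tiles = list(state)
            tiles[blank], tiles[swap] = tiles[swap], tiles[blank]
            yield tuple(tiles)


def solve_bfs(start, goal=GOAL):
    """Return the shortest list of states from start to goal, or None."""
    parent = {start: None}
    queue = deque([start])
    while queue:
        state = queue.popleft()
        if state == goal:
            path = []
            while state is not None:
                path.append(state)
                state = parent[state]
            return path[::-1]
        for nxt in neighbors(state):
            if nxt not in parent:  # visit each configuration once
                parent[nxt] = state
                queue.append(nxt)
    return None
```

The number of moves is `len(path) - 1`. For unsolvable start configurations the search exhausts the reachable half of the state space and returns `None`.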
Here, two path planning algorithms—Bidirectional A* and Potential Field—are implemented and compared.
Bidirectional A*: This algorithm performs simultaneous searches from both the start and goal nodes, aiming to meet in the middle. This approach can significantly reduce search time compared to unidirectional A*, especially in large search spaces. However, its performance can be influenced by the placement of obstacles and the heuristic used.
Potential Field Method: This approach treats the robot as a point under the influence of an artificial potential field, where the goal exerts an attractive force and obstacles exert repulsive forces. The robot navigates by following the resultant force vector. While this method offers smooth and continuous paths, it can suffer from issues like local minima, where the robot gets trapped away from the goal.
Comparison: Bidirectional A* is generally more reliable in finding the shortest path, as it systematically explores the search space based on a heuristic. In contrast, the Potential Field method provides smoother paths but may not always find the shortest route and can get stuck in local minima. The choice between these algorithms depends on the specific requirements of the application, such as the need for path optimality versus smoothness and computational resources.
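To make the attractive and repulsive forces concrete, here is a minimal sketch of one potential field update. It assumes a quadratic attractive potential and the common inverse-distance repulsive potential with a finite influence radius; the gains, radius, and step size are illustrative defaults, not the project's parameters.

```python
import math


def potential_field_step(pos, goal, obstacles,
                         k_att=1.0, k_rep=1.0, influence=2.0, step=0.1):
    """Move one fixed-length step along the resultant force direction."""
    # Attractive force pulls straight toward the goal.
    fx = k_att * (goal[0] - pos[0])
    fy = k_att * (goal[1] - pos[1])
    # Each obstacle inside its influence radius pushes the robot away.
    for ox, oy in obstacles:
        dx, dy = pos[0] - ox, pos[1] - oy
        d = math.hypot(dx, dy)
        if 0.0 < d < influence:
            mag = k_rep * (1.0 / d - 1.0 / influence) / (d * d)
            fx += mag * dx / d
            fy += mag * dy / d
    norm = math.hypot(fx, fy)
    if norm == 0.0:
        return pos  # zero net force: at the goal or stuck in a local minimum
    return (pos[0] + step * fx / norm, pos[1] + step * fy / norm)
```

With no obstacles the robot steps directly toward the goal. Placing an obstacle on the straight line between robot and goal can make the repulsive term dominate and push the robot backward, which is the kind of situation where local minima and oscillation arise.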
This project focuses on developing an image processing pipeline to stitch multiple images into a seamless panoramic photograph. By identifying overlapping regions and aligning key features across adjacent images, the algorithm merges them to create a wide-angle composite view. This technique is particularly useful for capturing expansive scenes that exceed the field of view of a single photograph, enabling the creation of high-resolution panoramic images.
This project focuses on tracking an object's trajectory within a video sequence. By employing computer vision techniques, the system identifies the object of interest in each frame and maps its movement throughout the video. This involves detecting the object's position, applying masking to isolate it from the background, and utilizing curve fitting to model its trajectory accurately. Such tracking is essential for applications like motion analysis, surveillance, and human-computer interaction, where understanding the path of moving objects is crucial.
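The curve-fitting step can be illustrated with a least-squares parabola fit to the detected object centers. The quadratic model is an assumption (it suits a thrown object under gravity); the project's actual curve model is not stated. The sketch solves the 3x3 normal equations with Cramer's rule so it needs no external libraries.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_parabola(points):
    """Least-squares fit of y = a*x^2 + b*x + c to (x, y) points."""
    s = [sum(x ** k for x, _ in points) for k in range(5)]      # sum of x^k
    t = [sum(y * x ** k for x, y in points) for k in range(3)]  # sum of y*x^k
    # Normal equations for the unknowns (a, b, c).
    m = [[s[4], s[3], s[2]],
         [s[3], s[2], s[1]],
         [s[2], s[1], s[0]]]
    rhs = [t[2], t[1], t[0]]
    d = det3(m)
    if d == 0:
        raise ValueError("need at least three points with distinct x values")
    coeffs = []
    for col in range(3):
        mc = [row[:] for row in m]
        for r in range(3):
            mc[r][col] = rhs[r]  # replace one column with the right-hand side
        coeffs.append(det3(mc) / d)
    return tuple(coeffs)
```

Here `points` would hold the per-frame pixel coordinates of the object's center obtained from the masking step, and the fitted coefficients can then be used to extrapolate where the object will be at a given x position.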
This project focuses on developing a video processing pipeline to detect the four corners of a document within a video stream. The approach involves several key steps:
Edge Detection: Applying edge detection algorithms to identify the boundaries of the document within each video frame.
Hough Transform: Utilizing the Hough Transform to detect straight lines corresponding to the document's edges.
Corner Detection: Identifying the intersection points of these lines to determine the precise locations of the document's corners.
This method enables accurate real-time detection of document corners, facilitating tasks such as perspective correction and document scanning from video inputs.
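The corner detection step above amounts to intersecting pairs of detected lines. The sketch below assumes lines are given in the normal form commonly returned by Hough transforms, x*cos(theta) + y*sin(theta) = rho; the function name and tolerance are illustrative assumptions.

```python
import math


def intersect_hough_lines(line_a, line_b, eps=1e-9):
    """Intersect two lines given as (rho, theta) pairs.

    Returns the (x, y) intersection point, or None if the lines are
    (nearly) parallel.
    """
    rho1, theta1 = line_a
    rho2, theta2 = line_b
    c1, s1 = math.cos(theta1), math.sin(theta1)
    c2, s2 = math.cos(theta2), math.sin(theta2)
    det = c1 * s2 - s1 * c2  # equals sin(theta2 - theta1)
    if abs(det) < eps:
        return None
    # Solve the 2x2 linear system with Cramer's rule.
    x = (rho1 * s2 - rho2 * s1) / det
    y = (rho2 * c1 - rho1 * c2) / det
    return x, y
```

For a document, one would typically keep the four strongest lines, intersect each roughly horizontal line with each roughly vertical one, and discard intersections that fall outside the frame.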
This project involves generating disparity and depth maps from stereo image pairs. By analyzing the differences between two images captured from slightly different viewpoints, the system calculates the disparity for each pixel, which corresponds to the horizontal shift between the images. This disparity information is then used to compute depth maps, providing a three-dimensional representation of the scene. Such techniques are fundamental in applications like 3D reconstruction, robotic vision, and autonomous navigation, where understanding the spatial arrangement of objects is crucial.
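For a rectified stereo pair, depth follows from disparity through Z = f * B / d, where f is the focal length in pixels, B is the baseline between the cameras, and d is the disparity in pixels. The sketch below applies this relation to a disparity map stored as nested lists; the function name and the camera values in the usage note are assumptions for illustration.

```python
def depth_map(disparity, focal_length_px, baseline_m):
    """Convert a disparity map (pixels) into a depth map (metres).

    Pixels with zero or negative disparity have no valid match and are
    assigned infinite depth.
    """
    scale = focal_length_px * baseline_m
    return [
        [scale / d if d > 0 else float("inf") for d in row]
        for row in disparity
    ]
```

With a hypothetical focal length of 700 px and a baseline of 0.1 m, a disparity of 35 px corresponds to a depth of about 2 m, and halving the disparity doubles the depth.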
18. Limerick Poetry Checker
This project evaluates whether a given poem follows the AABBA rhyme scheme of a traditional limerick.
Key Features:
Rhyme Scheme Analysis: Uses the CMU Pronouncing Dictionary (cmudict) to verify end-rhymes.
Advanced Tokenization: Handles apostrophes and irregular word endings for more accurate rhyme detection.
Python-based Implementation: Built using NLTK, with a structured dataset for testing and validation.
The system provides an automated way to check poetic structure, useful for linguistic research and poetry validation.
Developed as part of CMSC723: NLP under Dr. Jordan Boyd-Graber & Dr. Naomi Feldman. Due to course restrictions, code cannot be shared.
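As an independent illustration of the rhyme test (not the course code), two words can be treated as rhyming when their pronunciations match from the last stressed vowel onward. The sketch uses a small hand-written dictionary in the ARPAbet style of cmudict as a stand-in for the real resource; the dictionary contents, function names, and the rule that the A and B groups must differ are all assumptions.

```python
import re

# Tiny stand-in for the CMU Pronouncing Dictionary (illustrative only).
PRON = {
    "cat": ["K", "AE1", "T"],
    "hat": ["HH", "AE1", "T"],
    "bat": ["B", "AE1", "T"],
    "dog": ["D", "AO1", "G"],
    "log": ["L", "AO1", "G"],
}


def rhyme_part(phones):
    """Phonemes from the last stressed vowel (stress 1 or 2) to the end."""
    for i in range(len(phones) - 1, -1, -1):
        if phones[i][-1] in "12":
            return tuple(phones[i:])
    return tuple(phones)


def rhymes(a, b, pron=PRON):
    """True if both words are known and share the same rhyme part."""
    if a not in pron or b not in pron:
        return False
    return rhyme_part(pron[a]) == rhyme_part(pron[b])


def last_word(line):
    """Last word of a line, lowercased, keeping apostrophes."""
    words = re.findall(r"[a-z']+", line.lower())
    return words[-1] if words else ""


def is_limerick(lines, pron=PRON):
    """Check the AABBA end-rhyme pattern over five lines."""
    if len(lines) != 5:
        return False
    a1, a2, b1, b2, a3 = (last_word(line) for line in lines)
    return (rhymes(a1, a2, pron) and rhymes(a1, a3, pron)
            and rhymes(b1, b2, pron) and not rhymes(a1, b1, pron))
```

A fuller checker would load pronunciations from cmudict, try every listed pronunciation for each word, and decide how to treat words missing from the dictionary.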
19. Bigram Language Model
This project builds a statistical bigram language model that predicts word probabilities based on preceding words. Trained on the Brown corpus, it employs various smoothing techniques to improve prediction accuracy and handle unseen word pairs.
Key Features:
Bigram Probability Calculation: Computes word likelihoods using frequency-based probabilities.
Multiple Smoothing Techniques:
Maximum Likelihood Estimation (MLE): Uses raw frequency counts to compute probabilities, but assigns zero probability to unseen bigrams, making it unsuitable for sparse data.
Laplace (Add-One) Smoothing: Adds one to all bigram counts, preventing zero probabilities but often overestimating the likelihood of rare and unseen bigrams.
Dirichlet Smoothing: Extends Laplace smoothing by introducing a hyperparameter to balance adjustments, allowing finer control over probability distribution.
Jelinek-Mercer Smoothing: Uses a linear interpolation between bigram and unigram probabilities, ensuring that unseen bigrams still receive probability mass based on overall word frequency.
Kneser-Ney Smoothing: A backoff model that not only adjusts probabilities based on frequency but also considers the diversity of contexts in which a word appears, making it particularly effective for handling rare words.
Perplexity Evaluation: Assesses model performance on test corpora (Treebank, Gutenberg).
Text Generation: Generates sample text based on trained bigram probabilities.
This model enhances NLP applications by improving word prediction and language structure analysis.
Developed as part of CMSC723: NLP under Dr. Jordan Boyd-Graber & Dr. Naomi Feldman. Due to course restrictions, code cannot be shared.
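As an independent illustration (not the course code), three of the estimators listed above can be written in a few lines. The class name, the interpolation weight, and the toy corpus in the usage note are assumptions; Dirichlet and Kneser-Ney smoothing are omitted for brevity.

```python
from collections import Counter


class BigramModel:
    """Bigram estimates from a token list: MLE, Laplace, Jelinek-Mercer."""

    def __init__(self, tokens):
        self.unigrams = Counter(tokens)
        self.bigrams = Counter(zip(tokens, tokens[1:]))
        # How often each word appears as the first element of a bigram.
        self.context = Counter(tokens[:-1])
        self.total = len(tokens)
        self.vocab_size = len(self.unigrams)

    def mle(self, prev, word):
        """Raw relative frequency; zero for unseen bigrams."""
        count = self.context[prev]
        return self.bigrams[(prev, word)] / count if count else 0.0

    def laplace(self, prev, word):
        """Add-one smoothing over the vocabulary."""
        return ((self.bigrams[(prev, word)] + 1)
                / (self.context[prev] + self.vocab_size))

    def jelinek_mercer(self, prev, word, lam=0.7):
        """Linear interpolation of bigram and unigram estimates."""
        unigram = self.unigrams[word] / self.total
        return lam * self.mle(prev, word) + (1.0 - lam) * unigram
```

Trained on the tokens of "the cat sat on the mat", the unseen bigram ("the", "sat") gets probability 0 under MLE, 1/7 under Laplace smoothing, and 0.05 under Jelinek-Mercer interpolation with the default weight.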
20. Speech Vowel Classifier
This project develops a logistic regression-based vowel classifier using MFCC features extracted from speech audio. Built with PyTorch, it classifies American English vowels from the Hillenbrand dataset.
Key Features:
MFCC Feature Extraction: Mid-frame Mel-Frequency Cepstral Coefficients (MFCCs) capture speech characteristics.
Logistic Regression Model: A simple PyTorch-based model trained on vowel pairs (e.g., ih vs. eh).
Dataset Handling: Processes Hillenbrand corpus with structured train/test splits.
Evaluation: Tracks accuracy across multiple epochs, optimizing via stochastic gradient descent.
The classifier achieves high accuracy on vowel differentiation tasks, offering insights for speech recognition and linguistic analysis.
Developed as part of CMSC723: NLP under Dr. Jordan Boyd-Graber & Dr. Naomi Feldman. Due to course restrictions, code cannot be shared.
21. Feature-Enhanced Buzzer for QA
This project improves a QA "Buzzer" system by incorporating advanced feature engineering techniques to enhance confidence estimation and decision-making. A logistic regression model now leverages:
Length-Based Features: Guess and question snippet lengths help infer certainty.
Guess History Tracking: Identifies repeated guesses and their stability.
Search Overlap: Measures word overlap to assess relevance.
Dictionary Validation: Evaluates word legitimacy using NLTK.
These enhancements raised accuracy from ~0.73 to ~0.77 and the buzz ratio from ~0.32 to ~0.3577. The system supports easy integration of new features for further improvements.
Developed as part of CMSC723: NLP under Dr. Jordan Boyd-Graber. Due to course restrictions, code cannot be shared.
22. Deep Averaging Networks using PyTorch
This project implements Deep Averaging Networks (DANs) in PyTorch for text classification, specifically in quizbowl question answering. DANs create text representations by averaging word embeddings and passing them through a feedforward neural network for classification.
Key Features:
Text Representation: Averages word embeddings to create a fixed-size input vector, enabling robust handling of varying-length texts.
Feedforward Neural Network: Utilizes fully connected layers with ReLU activations and dropout to improve generalization.
Efficient Training & Evaluation: Implements gradient clipping and tracks performance using accuracy, precision, recall, and loss visualization.
Dataset Flexibility: Supports multiple input formats (JSON, CSV, gzipped JSON) with customizable vocabulary handling.
This lightweight yet powerful model is ideal for NLP applications requiring efficient text classification.
Developed as part of CMSC723: NLP under Dr. Jordan Boyd-Graber & Dr. Naomi Feldman. Due to course restrictions, code cannot be shared.
23. Shift Reduce Dependency Parser
This project implements a transition-based dependency parser using the shift-reduce technique for syntactic analysis. It constructs dependency trees by incrementally processing words through a stack and buffer system, predicting grammatical relationships between words.
Key Features:
Shift-Reduce Parsing: Uses Shift, Left-Arc, and Right-Arc transitions to build dependency structures.
Oracle-Guided Learning: Trains a Maximum Entropy (MaxEnt) classifier on annotated NLTK dependency treebank data.
Feature-Based Prediction: Utilizes POS tags, word indices, stack/buffer sizes, and transition history for decision-making.
Efficient Training & Evaluation: Achieves 91.43% classification accuracy for transition prediction.
This parser provides a lightweight yet powerful approach to syntactic parsing, useful for NLP applications such as information extraction and machine translation.
Developed as part of CMSC723: NLP under Dr. Jordan Boyd-Graber & Dr. Naomi Feldman. Due to course restrictions, code cannot be shared.
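As an independent illustration (not the course code), the stack and buffer mechanics can be shown by applying a given transition sequence. The sketch assumes the arc-standard convention, in which Left-Arc and Right-Arc operate on the top two stack items; in a full parser a trained classifier would choose each transition, whereas here the sequence is supplied by hand.

```python
def apply_transitions(words, transitions):
    """Apply arc-standard transitions and return (head, dependent) arcs.

    Words are indexed from 1; index 0 is an artificial ROOT node.
    """
    stack = [0]
    buffer = list(range(1, len(words) + 1))
    arcs = []
    for action in transitions:
        if action == "SHIFT":
            # Move the next buffer word onto the stack.
            stack.append(buffer.pop(0))
        elif action == "LEFT-ARC":
            # Second-from-top becomes a dependent of the top item.
            dependent = stack.pop(-2)
            arcs.append((stack[-1], dependent))
        elif action == "RIGHT-ARC":
            # Top item becomes a dependent of the item beneath it.
            dependent = stack.pop()
            arcs.append((stack[-1], dependent))
        else:
            raise ValueError("unknown transition: " + action)
    return arcs
```

For the sentence "She eats fish", the sequence SHIFT, SHIFT, LEFT-ARC, SHIFT, RIGHT-ARC, RIGHT-ARC attaches "She" and "fish" to "eats" and then attaches "eats" to ROOT.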
This project fine-tunes DistilBERT using Low-Rank Adaptation (LoRA) to efficiently predict whether an answer in a quizbowl setting is correct. LoRA freezes the pre-trained weights and trains small low-rank update matrices added to selected transformer layers, enabling memory-efficient adaptation while preserving the pre-trained model's knowledge.
Key Features:
LoRA Fine-tuning: Enhances DistilBERT with minimal trainable parameters, improving efficiency.
Gaussian Process Regressor (GPR): Assists in predicting model confidence for answer correctness.
Feature Engineering: Integrates textual and statistical features for better decision-making.
Efficient Training & Evaluation: Achieves high accuracy with reduced computational overhead.
This model improves buzzer decision-making in quizbowl competitions and showcases lightweight transformer adaptation for classification tasks.
Developed as part of CMSC723: NLP under Dr. Jordan Boyd-Graber. Due to course restrictions, code cannot be shared.
This project examines and compares two nonlinear control strategies—Feedback Linearization and Adaptive Sliding Mode Control—applied to a quadrotor helicopter.
Feedback Linearization: This method aims to transform the nonlinear dynamics of the quadrotor into an equivalent linear system by canceling out the nonlinearities through precise modeling. While it can achieve accurate control under ideal conditions, its performance is highly sensitive to modeling inaccuracies and external disturbances, as it relies on exact system parameters.
Adaptive Sliding Mode Control: This approach introduces a robust control law that forces the system's state to reach and maintain a predefined sliding surface, ensuring desired dynamics. The adaptive component adjusts to uncertainties and disturbances in real-time, enhancing resilience against model inaccuracies and external perturbations. This method is particularly effective in handling the underactuated nature of quadrotors and provides robust performance even in the presence of sensor noise and unmodeled dynamics.
Comparison: The project involves implementing both control strategies and evaluating their performance through simulations. The analysis focuses on aspects such as trajectory tracking accuracy, sensitivity to disturbances, and robustness against modeling uncertainties. Preliminary findings suggest that while feedback linearization offers precise control in well-modeled scenarios, adaptive sliding mode control demonstrates superior robustness and adaptability in more uncertain and dynamic environments.
This project involves designing and implementing Linear Quadratic Regulator (LQR) and Linear Quadratic Gaussian (LQG) controllers for a cart system with two suspended loads. The system consists of a cart moving along a one-dimensional track with two pendulums attached, each representing a suspended load.
Linear Quadratic Regulator (LQR): This optimal control strategy aims to regulate the system's state by minimizing a cost function that balances state deviations and control efforts. The LQR controller is designed based on a linearized model of the system, providing state feedback gains that ensure desired performance and stability.
Linear Quadratic Gaussian (LQG) Controller: Building upon the LQR framework, the LQG controller incorporates a Kalman filter to estimate the system's state in the presence of noise and uncertainties. This combination allows for optimal control even when not all state variables are directly measurable, enhancing the system's robustness to disturbances and modeling inaccuracies.
The project includes MATLAB simulations demonstrating the effectiveness of both controllers in achieving precise positioning of the cart while suppressing oscillations of the suspended loads. Comparative analyses highlight the trade-offs between control performance and complexity, providing insights into the practical implementation of LQR and LQG controllers in systems with multiple suspended loads.
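For reference, the standard continuous-time formulation behind the two controllers can be written as follows. The symbols are assumptions introduced here, since the project description does not fix a notation: x is the state of the linearized cart and pendulum model, u the force on the cart, y the measured output, A, B, C the linearized system matrices, Q and R the weighting matrices, and L the Kalman gain.

```latex
% LQR: minimize a quadratic cost with state feedback
J = \int_0^\infty \left( x^\top Q x + u^\top R u \right) dt,
\qquad u = -K x, \qquad K = R^{-1} B^\top P

% P solves the algebraic Riccati equation
A^\top P + P A - P B R^{-1} B^\top P + Q = 0

% LQG: replace the true state with the Kalman filter estimate
\dot{\hat{x}} = A \hat{x} + B u + L \left( y - C \hat{x} \right),
\qquad u = -K \hat{x}
```

Larger entries in Q penalize cart position error and pendulum swing more heavily, while a larger R penalizes control effort, which is the trade-off between state deviations and control effort described above.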