
Hi, I'm Saurav Karki, a Computer Science graduate and aspiring Machine Learning Engineer. I specialize in AI and Machine Learning, with additional experience in full-stack development using the MERN stack. I enjoy building intelligent systems that solve real-world problems.
History / CV
AI Trainee at Innovate Tech
Architecting core ComfyUI graphs for video and animation generation, integrating Gemini 3.1 Pro Preview for storyboard and keyframe execution. Developed a Python voice synthesis microservice (Omni-TTS) and refactored it for scalability; it features Whisper word-level transcript matching, concurrency pooling, and AWS S3 uploads. Investigating LTX-2 open-source video architectures.
AI Intern at Innovate Tech
Built class-based RAG pipelines using the Qdrant vector database and hybrid retrievers (dense + BM25) for automated CV parsing and scoring. Configured automated STT/TTS data pipelines on n8n (AssemblyAI, ElevenLabs), and built a Streamlit/FastAPI medical summarization app using MedGemma (27B) and Qwen-Coder. Researched JAX-based LLM fine-tuning.
AI Trainee at Bajra Technologies
Conducted machine learning research and engineering. Built full-stack integrations using Python, React, and Angular. Completed a final capstone project: an English-to-Nepali Text-to-Speech (TTS) converter using fine-tuned neural models for improved synthesis.
Core Member at AI Learners Community (Herald College)
Collaborated on group projects and technical study sessions. Co-organized workshops, hackathons, and guest seminars to build AI/ML understanding and hands-on development experience among students.
BSc (Hons) Computer Science at Herald College Kathmandu
Acquired a deep understanding of data structures, core algorithms, database systems, software engineering principles, and basic statistical computing. Completed coursework with a focus on computational logic and Python data analysis.
Featured Projects
English-to-Nepali Speech Converter
A research-focused Text-to-Speech (TTS) engine trained on custom datasets. Fine-tuned with Hugging Face Transformers and VITS models. Built a Streamlit interface enabling real-time synthesis and download of spoken Nepali text.
An intelligent, context-aware chatbot for Casemellow, an e-commerce retailer of mobile phone covers. Implemented Retrieval-Augmented Generation (RAG) using the Pinecone vector database and LangChain to answer user queries with precise store context.
A PyTorch translation model built from scratch. Utilized an encoder-decoder architecture with Gated Recurrent Units (GRU) and SentencePiece tokenization, evaluating translations with BLEU scores.
Movie Review Sentiment Classifier
Compared Recurrent Neural Network (RNN) and LSTM architectures using Word2Vec embeddings to analyze and categorize emotional sentiment in movie reviews.
Facial Emotion Classification
A Convolutional Neural Network (CNN) built by fine-tuning ResNet50 to detect facial emotions (happy, sad, neutral, surprise, fear, disgust, angry) from live camera feeds.
Technical Skills
ML & Deep Learning
PyTorch, TensorFlow, Scikit-Learn, Hugging Face, LangChain, LangGraph, NLP, Computer Vision
Languages
Python, JavaScript (ES6+), SQL (PostgreSQL, MySQL), HTML5/CSS3
Web & Databases
Next.js, React, FastAPI, Node.js, Express.js, PostgreSQL, MongoDB, Pinecone
Tools & Infra
Docker, Git, Conda, Linux / Bash, Jupyter Notebook, Postman