Moncef Benaicha

I’m Moncef, an AI specialist focused on NLP, speech recognition, and agentic AI systems. I work at the intersection of software engineering and applied machine learning, turning advances in AI research into practical, production-ready systems for real-world use.

Currently, I’m a Senior AI Engineer at KI Performance, where I take end-to-end ownership of AI projects for major European enterprises, from scoping and solution design to production rollout and delivery. In this role, I build and deploy agentic systems using both commercial LLMs and domain-adapted open-weight models. I also develop smaller fine-tuned models to improve task performance while keeping behavior predictable and controllable.

Before moving into machine learning, I worked as a software engineer focused on backend development in Python and C++. I built and maintained microservices using REST APIs and gRPC for large-scale, mission-critical systems across finance, logistics, and automotive. That background continues to shape how I design AI systems today: with clean interfaces, strong engineering fundamentals, and production constraints in mind.

Jun. 2024 – Present
Senior AI Engineer · KI Performance GmbH · Full-time
As a Senior AI Engineer at KI Performance GmbH, I deliver production-grade AI platforms and applications for major European enterprises across Automotive, Energy, Electronics, Semiconductors, and HR. Across 15+ engagements, I’ve owned the full lifecycle from discovery to deployment, aligning stakeholders in pre-sales, evaluating existing tech stacks, and defining cloud integration strategies. My work includes architecting and shipping multi-agent systems with LLMs/SLMs and STT/TTS for workflow automation and conversational services, plus recommendation pipelines that pair fine-tuned embedding retrieval with LLM re-ranking and response generation. I present architecture alternatives with clear trade-offs (latency, cost, quality, security) and drive delivery by mentoring junior engineers and coordinating closely with frontend and infrastructure teams.
- Multi-agent systems
- LangChain
- LangGraph
- HF Transformers
- MLflow
- Azure
- HF Accelerate
- vLLM
- NVIDIA Triton
- Semantic search
Dec. 2023 – Apr. 2024
Machine Learning Engineer · TEQ Capital · Full-time
At TEQ Capital, I built key components of an LLM-assisted analytics stack for financial teams, spanning ingestion, retrieval, and multimodal document understanding. I implemented Dagster-based Python pipelines to ingest and normalize documents, audio, web content, and external provider data via REST/GraphQL/WebSockets. I then designed a question graph and integrated GPT-4 Turbo for financial question answering, reducing analysts' workload by roughly 80%. To improve reliability, I added a retrieval-augmented generation (RAG) layer with re-ranking to ground answers in the latest financial data, and built a document understanding pipeline that interprets not only text but also charts and tables for higher-quality responses.
- Python
- Dagster
- GraphQL
- WebSocket
- LLMs
- RAG
- Re-ranking
- AWS
Nov. 2021 – Jun. 2023
ML Research Assistant - ASR & NLP · Fraunhofer IAIS · Part-time
At Fraunhofer IAIS, I worked on research at the intersection of ASR and NLP, with a focus on multilingual and speech-based information extraction. I evaluated language representation models (BERT, XLM-R, XLM-V) for Named Entity Recognition (NER), emphasizing cross-lingual transfer performance. To enable rapid iteration on speech systems, I built and maintained a data pipeline for ASR training and experimentation. Building on this foundation, I developed multiple approaches to Spoken Named Entity Recognition, both cascading (ASR → NER) and end-to-end, leveraging Wav2Vec2 XLS-R and Whisper, and investigated transfer-learning strategies across low- and high-resource languages to improve multilingual generalization.
- BERT
- XLM-R
- XLM-V
- NER
- Cross-lingual Transfer Learning
- ASR
- Speech recognition
- Wav2Vec2 XLS-R
- Whisper
- DeepSpeed
- 4-bit quantization
- QLoRA
Nov. 2020 – Mar. 2023
Software Engineer · Taliox · Part-time
At Taliox, I delivered Python backend products from design through deployment, building web applications and REST APIs with Django, FastAPI, and Flask. I automated recurring tasks, data collection, and data pre- and post-processing with Python and shell scripts, and I built browser automation and web-scraping bots using Selenium and BeautifulSoup (BS4) to support operational needs. To ensure repeatable releases and developer-friendly setups, I containerized applications with Docker, creating consistent cross-platform development environments and streamlining deployments.
- Python
- Django
- FastAPI
- Flask
- REST API
- gRPC
- WebSocket
- Selenium
- BeautifulSoup (BS4)
- Docker
Dec. 2019 – Nov. 2020
ML Research Assistant - ASR & NLP · RWTH Human Language Technology Department - ASR Group · Part-time
At RWTH’s Human Language Technology Department (ASR Group), I investigated how different feature extraction strategies influence ASR acoustic model performance. I also replicated key experiments from the wav2vec line of research to validate published findings, strengthening my foundation in speech representation learning and rigorous experimental evaluation.
- ASR
- Acoustic modeling
- Feature extraction
- Python
- C++

Spoken NER (Cross-Lingual Transfer Learning)
Spoken Named Entity Recognition (NER) experiments for the paper “Leveraging Cross-Lingual Transfer Learning in Spoken Named Entity Recognition Systems”, using Wav2Vec2-XLS-R models and transfer learning across English, German, and Dutch on Common Voice-derived data.
- Python
- PyTorch
- HF Transformers
- Wav2Vec2-XLS-R
- HF Accelerate
