Moncef Benaicha

Senior AI Engineer

NLP - ASR - LLM Agents

About

I’m Moncef, specializing in NLP, speech (ASR), and conversational and agentic AI systems. I work at the intersection of software engineering and applied machine learning, turning advances in AI research into practical, production-ready systems for real-world use.

Currently, I’m a Senior AI Engineer at KI Performance, where I take end-to-end ownership of AI projects for major European enterprises, from scoping and solution design to production rollout and handover. In this role, I build and deploy conversational systems and agentic applications using both off-the-shelf LLMs and domain-adapted open-weight models. I also develop fine-tuned smaller models to improve task performance while keeping behavior predictable and controllable.

Before moving into machine learning, I worked as a software engineer focused on backend development in Python and C++. I built and maintained microservices using REST APIs and gRPC for large-scale, mission-critical systems across finance, logistics, and automotive. That background continues to shape how I design AI systems today: with clean interfaces, strong engineering fundamentals, and production constraints in mind.

Experience

  1. Jun. 2024 – Present

    Senior AI Engineer · KI Performance GmbH · Full-time

    As a Senior AI Engineer at KI Performance GmbH, I deliver production-grade AI platforms and applications for major European enterprises across Automotive, Energy, Electronics, Semiconductors, and HR. Across 15+ engagements, I’ve owned the full lifecycle from discovery to deployment, aligning stakeholders in pre-sales, evaluating existing tech stacks, and defining cloud integration strategies. My work includes architecting and shipping multi-agent systems with LLMs/SLMs and STT/TTS for workflow automation and conversational services, plus recommendation pipelines that pair fine-tuned embedding retrieval with LLM re-ranking and response generation. I present architecture alternatives with clear trade-offs (latency, cost, quality, security) and drive delivery by mentoring junior engineers and coordinating closely with frontend and infrastructure teams.

    • Multi-agent systems
    • LangChain
    • LangGraph
    • HF Transformers
    • MLflow
    • Azure
    • HF Accelerate
    • vLLM
    • NVIDIA Triton
    • Semantic search
  2. Dec. 2023 – Apr. 2024

    Machine Learning Engineer · TEQ Capital · Full-time

    At TEQ Capital, I built key components of an LLM-assisted analytics stack for financial teams, spanning ingestion, retrieval, and multimodal document understanding. I implemented Dagster-based Python pipelines to ingest and normalize documents, audio, web content, and external provider data via REST, GraphQL, and WebSockets. I then designed a question graph and integrated GPT-4 Turbo for financial question answering, reducing analyst workload by ~80%. To improve reliability, I added a retrieval-augmented generation (RAG) layer with re-ranking to ground answers in the latest financial data, and built a document understanding pipeline that interprets charts and tables as well as text for higher-quality responses.

    • Python
    • Dagster
    • GraphQL
    • WebSocket
    • LLMs
    • RAG
    • Re-ranking
    • AWS
  3. Nov. 2021 – Jun. 2023

    ML Research Assistant - ASR & NLP · Fraunhofer IAIS · Part-time

    At Fraunhofer IAIS, I worked on research at the intersection of ASR and NLP, with a focus on multilingual and speech-based information extraction. I evaluated language representation models (BERT, XLM-R, XLM-V) for Named Entity Recognition (NER), emphasizing cross-lingual transfer performance. To enable rapid iteration on speech systems, I built and maintained a data pipeline for ASR training and experimentation. Building on this foundation, I developed multiple approaches to Spoken Named Entity Recognition, both cascaded (ASR → NER) and end-to-end, using Wav2Vec2 XLS-R and Whisper, and investigated transfer-learning strategies between low- and high-resource languages to improve multilingual generalization.

    • BERT
    • XLM-R
    • XLM-V
    • NER
    • Cross-lingual Transfer Learning
    • ASR
    • Speech recognition
    • Wav2Vec2 XLS-R
    • Whisper
    • DeepSpeed
    • 4-bit quantization
    • QLoRA
  4. Nov. 2020 – Mar. 2023

    Software Engineer · Taliox · Part-time

    At Taliox, I delivered Python backend products from design through deployment, building web applications and REST APIs with Django, FastAPI, and Flask. I automated recurring workflows with Python and shell scripts, covering data collection and data pre- and post-processing, and I built browser automation bots using Selenium and BeautifulSoup (BS4) to support operational needs. To keep releases repeatable and setups developer-friendly, I containerized applications with Docker, which gave consistent cross-platform development environments and streamlined deployments.

    • Python
    • Django
    • FastAPI
    • Flask
    • REST API
    • gRPC
    • WebSocket
    • Selenium
    • BeautifulSoup (BS4)
    • Docker
  5. Dec. 2019 – Nov. 2020

    ML Research Assistant - ASR & NLP · RWTH Human Language Technology Department - ASR Group · Part-time

    At RWTH’s Human Language Technology Department (ASR Group), I investigated how different feature extraction strategies influence ASR acoustic model performance. I also replicated key experiments from the wav2vec line of research to validate published findings, strengthening my foundation in speech representation learning and rigorous experimental evaluation.

    • ASR
    • Acoustic modeling
    • Feature extraction
    • Python
    • C++

Projects

  • Spoken NER (Cross-Lingual Transfer Learning)

    Spoken Named Entity Recognition (NER) experiments for the paper “Leveraging Cross-Lingual Transfer Learning in Spoken Named Entity Recognition Systems”, using Wav2Vec2-XLS-R models and transfer learning across English, German, and Dutch on Common Voice-derived data.

    • Python
    • PyTorch
    • HF Transformers
    • Wav2Vec2-XLS-R
    • HF Accelerate