Solutions

We provide machine learning solutions for several multimodal application domains:

ML for video analysis

We provide solutions that apply machine learning to all modalities of video (sound, image, and accompanying text) to support several video analysis applications:
  • Video summarization
  • Video classification and retrieval
  • Multimodal content-based video / movie recommendation and profiling
  • Human recognition and human activity recognition

Music information retrieval

We have years of experience building models that analyze musical content for various subdomains, such as:
  • Musical genre classification
  • Recognition of music mood and emotion
  • Content-based music retrieval and recommendation

Speech analytics

We combine audio and text information from speech signals to recognize what people say (and how they say it) when interacting with each other, for applications such as:
  • Health and Wellbeing: cognitive decline detection, mental health monitoring, depression estimation
  • Mood and emotion classification
  • Speaking style and public speaking quality analysis

Audio (non-speech) recognition

  • Mouse vocalization analytics
  • Audio event detection
  • Urban soundscape analysis
  • Audio context classification

Environment & Bioacoustics

  • Urban soundscape quality assessment and monitoring
  • Wildlife and bioacoustic signal analysis
  • Environmental sound event detection

Defence & Security

  • Acoustic threat detection using few-shot and self-supervised learning
  • Adversarial robustness evaluation of audio AI systems
  • Frugal AI for resource-constrained defence applications

Image retrieval and classification

  • Semantic image classification and segmentation
  • Photography aesthetics recognition

Other multimodal data analytics