System Architecture Overview
System Design Diagram:
Below is a diagram of the system architecture for the CueMeet and Meeting Bots infrastructure and its inner workings.
The CueMeet.ai system is designed to automate and manage meeting bots across various platforms using a robust, cloud-native architecture on Amazon Web Services (AWS). This system orchestrates bot execution, data processing, and error handling to provide a seamless meeting automation experience.
System Overview:
1. Control Backend (NestJS with BullMQ):
- Purpose: Orchestrates the lifecycle of meeting bot jobs and directly manages ECS tasks.
- Technology: Built using NestJS with BullMQ for task queuing and scheduling.
- Functionality:
- Exposes APIs for managing meeting bots (no dedicated frontend).
- Schedules and manages bot jobs.
- Uses the AWS SDK to directly interact with AWS ECS.
- Stores meeting bot configuration and status in a PostgreSQL database.
- Utilizes BullMQ for task queuing and processing.
- Includes cron jobs for:
- `syncTaskStatus`: Periodically checks the status of running ECS tasks and updates the database accordingly.
- `initiateScheduledBot`: Triggers scheduled meeting bot tasks.
- Manages the initiation and re-initiation of ECS tasks.
- ECS Client Service:
- Manages the ECS task lifecycle (run, stop, list, describe); a minimal sketch of these calls follows below.
- Handles error scenarios and retries.
- Stores and retrieves ECS task information.
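For illustration, here is a minimal sketch of the ECS Client Service's core calls. The real service is written in NestJS against the AWS SDK for JavaScript; Python with boto3 is used here only to show the equivalent `run_task`, `describe_tasks`, and `stop_task` calls, and the cluster, task definition, subnet, security group, and environment-variable names are placeholders.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Hypothetical names -- substitute your own cluster, task definition, and networking.
CLUSTER = "cuemeet-bots"
TASK_DEFINITION = "meeting-bot:1"


def run_bot_task(meeting_url: str, presigned_upload_url: str) -> str:
    """Launch one Fargate task for a meeting bot and return its task ARN."""
    response = ecs.run_task(
        cluster=CLUSTER,
        launchType="FARGATE",
        taskDefinition=TASK_DEFINITION,
        count=1,
        networkConfiguration={
            "awsvpcConfiguration": {
                "subnets": ["subnet-0123456789abcdef0"],
                "securityGroups": ["sg-0123456789abcdef0"],
                "assignPublicIp": "ENABLED",
            }
        },
        overrides={
            "containerOverrides": [
                {
                    "name": "meeting-bot",
                    "environment": [
                        {"name": "MEETING_URL", "value": meeting_url},
                        {"name": "UPLOAD_URL", "value": presigned_upload_url},
                    ],
                }
            ]
        },
    )
    return response["tasks"][0]["taskArn"]


def sync_task_status(task_arn: str) -> str:
    """What a syncTaskStatus-style cron might do: read the task's lastStatus."""
    described = ecs.describe_tasks(cluster=CLUSTER, tasks=[task_arn])
    return described["tasks"][0]["lastStatus"]  # e.g. PROVISIONING, RUNNING, STOPPED


def stop_bot_task(task_arn: str, reason: str = "meeting ended") -> None:
    """Stop a running bot task."""
    ecs.stop_task(cluster=CLUSTER, task=task_arn, reason=reason)
```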
2. AWS Fargate (Container Execution):
- Purpose: Provides a serverless compute platform for running Docker containers.
- Functionality:
- Dynamically spins up containers based on requests from the Control Backend.
- Executes the MeetingBots, which are containerized Python applications.
- Scales automatically based on demand.
- Amazon ECR (Elastic Container Registry): Stores the Docker images required for the MeetingBots, ensuring efficient deployment and version control.
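For context, the task definition that ties Fargate to the bot image stored in ECR could be registered roughly as below. This is a hedged boto3 sketch, not the project's actual infrastructure code; the account ID, repository, role ARN, and log group are placeholders.

```python
import boto3

ecs = boto3.client("ecs", region_name="us-east-1")

# Placeholder account, repository, and role values.
IMAGE = "123456789012.dkr.ecr.us-east-1.amazonaws.com/meeting-bot:latest"

ecs.register_task_definition(
    family="meeting-bot",
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="1024",     # 1 vCPU
    memory="2048",  # 2 GB
    executionRoleArn="arn:aws:iam::123456789012:role/ecsTaskExecutionRole",
    containerDefinitions=[
        {
            "name": "meeting-bot",
            "image": IMAGE,
            "essential": True,
            "logConfiguration": {
                "logDriver": "awslogs",
                "options": {
                    "awslogs-group": "/ecs/meeting-bot",
                    "awslogs-region": "us-east-1",
                    "awslogs-stream-prefix": "bot",
                },
            },
        }
    ],
)
```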
3. MeetingBots (Python with Selenium and FFmpeg):
- Purpose: Automates meeting participation and data capture.
- Technology: Built using Python, Selenium (for browser automation), and FFmpeg (for audio recording).
- Functionality:
- Programmatically joins meetings on various platforms (Google Meet, Zoom, Microsoft Teams).
- Captures high-quality audio recordings.
- Extracts live captions and transcripts from the respective meeting platform.
- Uploads the audio and transcript data as a combined `.tar` file, along with standalone audio files (`.opus`), to Amazon S3 using presigned URLs.
- The output files are stored in the `/out` directory inside the container.
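A simplified sketch of the capture-and-upload step, assuming the container exposes a PulseAudio `default` audio source, writes artifacts to `/out`, and receives its presigned upload URL through a hypothetical `UPLOAD_URL` environment variable. The real bot's Selenium and FFmpeg pipeline is more involved, and the `captions.json` filename is illustrative only.

```python
import os
import subprocess
import tarfile

import requests

OUT_DIR = "/out"


def record_audio(duration_seconds: int) -> str:
    """Capture meeting audio from the default PulseAudio source into an .opus file."""
    audio_path = os.path.join(OUT_DIR, "audio.opus")
    subprocess.run(
        [
            "ffmpeg", "-y",
            "-f", "pulse", "-i", "default",   # system audio source inside the container
            "-t", str(duration_seconds),
            "-c:a", "libopus", "-b:a", "64k",
            audio_path,
        ],
        check=True,
    )
    return audio_path


def bundle_outputs(audio_path: str, transcript_path: str) -> str:
    """Combine the audio and extracted captions into a single .tar archive."""
    tar_path = os.path.join(OUT_DIR, "meeting.tar")
    with tarfile.open(tar_path, "w") as tar:
        tar.add(audio_path, arcname=os.path.basename(audio_path))
        tar.add(transcript_path, arcname=os.path.basename(transcript_path))
    return tar_path


def upload(path: str, presigned_url: str) -> None:
    """PUT the file to S3 via the presigned URL handed to the container."""
    with open(path, "rb") as fh:
        requests.put(presigned_url, data=fh, timeout=120).raise_for_status()


if __name__ == "__main__":
    audio = record_audio(duration_seconds=3600)
    tarball = bundle_outputs(audio, os.path.join(OUT_DIR, "captions.json"))
    upload(tarball, os.environ["UPLOAD_URL"])  # hypothetical env var name
```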
4. Amazon S3 (Object Storage):
- Purpose: Stores the meeting data (audio and transcript files).
- Functionality:
- Receives uploaded data from the MeetingBots.
- Provides durable and scalable storage.
- Triggers SNS notifications on new file uploads.
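The presigned upload URLs handed to the bots can be generated with boto3 roughly as follows; the bucket and key names are placeholders, and which backend service mints the URLs is not shown here.

```python
import boto3

s3 = boto3.client("s3", region_name="us-east-1")


def make_upload_url(bucket: str, key: str, expires_in: int = 3600) -> str:
    """Create a presigned PUT URL a bot container can upload to without AWS credentials."""
    return s3.generate_presigned_url(
        ClientMethod="put_object",
        Params={"Bucket": bucket, "Key": key},
        ExpiresIn=expires_in,
    )


# Example: one URL per expected artifact (placeholder bucket/key names).
tar_url = make_upload_url("cuemeet-meeting-data", "meetings/abc123/meeting.tar")
audio_url = make_upload_url("cuemeet-meeting-data", "meetings/abc123/audio.opus")
```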
5. Worker Backend (Django with Celery):
- Purpose: Processes the uploaded meeting data, generating enhanced transcripts and performing post-processing tasks.
- Technology: Built using Django and Celery, a distributed task queue.
- Functionality:
- Receives notifications from S3 via SNS.
- Downloads the `.tar` file from S3.
- Extracts audio and transcript files.
- Uses AssemblyAI for high-quality audio transcription.
- Uses NLTK and rapidfuzz for speaker reconciliation, combining AssemblyAI's audio transcription with the original meeting captions.
- Stores the processed data in a PostgreSQL database.
- Exposes a gRPC interface for communication with the Control Backend.
- Provides a Celery task to retry failed transcriptions.
- Post-Processing (Speaker Reconciliation):
- Uses AssemblyAI for accurate audio transcription.
- Aligns speaker names from the original meeting captions with the AssemblyAI transcript.
- Employs NLTK for sentence tokenization and rapidfuzz for fuzzy string matching to reconcile speakers.
- Handles timestamp alignment and text similarity comparisons.
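A condensed sketch of the reconciliation idea: tokenize the AssemblyAI transcript into sentences with NLTK, then fuzzy-match each sentence against the platform captions (which carry speaker names) with rapidfuzz. The data shapes, scorer, and threshold are assumptions rather than the Worker Backend's exact implementation.

```python
import nltk
from nltk.tokenize import sent_tokenize
from rapidfuzz import fuzz, process

nltk.download("punkt", quiet=True)  # sentence tokenizer model

# Captions extracted by the bot: platform-reported speaker + text (shape assumed).
captions = [
    {"speaker": "Alice", "text": "Let's review the quarterly numbers first."},
    {"speaker": "Bob", "text": "Sure, I can share my screen."},
]

# Raw AssemblyAI transcript text (speaker identities unknown at this point).
transcript_text = "Let's review the quarterly numbers first. Sure, I can share my screen."


def reconcile(transcript_text, captions, threshold=70):
    """Assign a caption speaker to each transcript sentence via fuzzy matching."""
    caption_texts = [c["text"] for c in captions]
    reconciled = []
    for sentence in sent_tokenize(transcript_text):
        match = process.extractOne(sentence, caption_texts, scorer=fuzz.token_set_ratio)
        if match and match[1] >= threshold:
            speaker = captions[match[2]]["speaker"]  # index of the best-matching caption
        else:
            speaker = "Unknown"
        reconciled.append({"speaker": speaker, "text": sentence})
    return reconciled


print(reconcile(transcript_text, captions))
```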
- Django Celery Tasks (a minimal sketch follows this list):
- `_transcript_generator_worker`:
- Downloads the `.tar` file from S3.
- Extracts audio and transcript files.
- Transcribes the audio using AssemblyAI.
- Reconciles speaker labels using the meeting metadata and NLTK/rapidfuzz.
- Stores the transcription data in the PostgreSQL database.
- Handles error logging and retries.
- `_transcript_retry_cronjob`:
- Periodically checks for failed transcription tasks.
- Retries failed tasks, up to a maximum number of retries.
- Sends failure notifications if retries are exhausted.
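A stripped-down sketch of the two task shapes, assuming Redis as both broker and result backend. `process_meeting_archive` and `find_failed_transcriptions` are hypothetical placeholders for the real download/extract/transcribe/reconcile/persist logic, and the broker URLs are illustrative.

```python
from celery import Celery

# Redis serves as both the Celery broker and result backend (URLs are placeholders).
app = Celery(
    "worker",
    broker="redis://localhost:6379/0",
    backend="redis://localhost:6379/1",
)


def process_meeting_archive(s3_key: str) -> None:
    """Placeholder for the real pipeline: download the .tar from S3, extract it,
    transcribe the audio with AssemblyAI, reconcile speakers, and save to PostgreSQL."""
    raise NotImplementedError


def find_failed_transcriptions() -> list[str]:
    """Placeholder: query PostgreSQL for transcriptions whose last attempt failed."""
    return []


@app.task(bind=True, max_retries=3, default_retry_delay=300)
def _transcript_generator_worker(self, s3_key: str):
    """Process one uploaded meeting archive, retrying on failure."""
    try:
        process_meeting_archive(s3_key)
    except Exception as exc:
        # Celery re-enqueues the task with a delay until max_retries is exhausted.
        raise self.retry(exc=exc)


@app.task
def _transcript_retry_cronjob():
    """Periodic task (e.g. scheduled with celery beat): requeue failed transcriptions."""
    for s3_key in find_failed_transcriptions():
        _transcript_generator_worker.delay(s3_key)
```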
- PostgreSQL Database: Stores processed meeting data, including transcripts and metadata.
- Redis Integration: Celery uses Redis as a message broker and result backend.
- gRPC Interface: Enables communication between the Worker Backend and the Control Backend.
- SNS/SQS Integration: S3 triggers SNS notifications upon file upload, which are then queued in SQS for processing by the Worker Backend. This system includes a Dead Letter Queue (DLQ) for handling failed messages.
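A rough sketch of draining the SQS queue and pulling the uploaded object's key out of the SNS-wrapped S3 event. The queue URL is a placeholder, and handing the key to the Celery task above is shown only as a comment.

```python
import json

import boto3

sqs = boto3.client("sqs", region_name="us-east-1")
QUEUE_URL = "https://sqs.us-east-1.amazonaws.com/123456789012/meeting-uploads"  # placeholder


def poll_uploads() -> None:
    """Long-poll SQS for S3 upload events relayed through SNS."""
    response = sqs.receive_message(
        QueueUrl=QUEUE_URL,
        MaxNumberOfMessages=10,
        WaitTimeSeconds=20,
    )
    for message in response.get("Messages", []):
        envelope = json.loads(message["Body"])      # SNS notification envelope
        s3_event = json.loads(envelope["Message"])  # original S3 event record
        for record in s3_event.get("Records", []):
            bucket = record["s3"]["bucket"]["name"]
            key = record["s3"]["object"]["key"]
            # Hand off to the Celery task from the previous sketch, e.g.
            # _transcript_generator_worker.delay(key)
            print(f"new upload: s3://{bucket}/{key}")
        # Delete only after the work has been enqueued; failures end up in the DLQ.
        sqs.delete_message(QueueUrl=QUEUE_URL, ReceiptHandle=message["ReceiptHandle"])
```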
6. Error Monitoring and Observability:
- highlight.io:
- Monitors both the meeting bots and each of the backend services for errors and exceptions.
- Provides real-time alerts via Slack.
- Captures detailed logs and error reports.
- CloudWatch:
- Monitors ECS logs and application-level metrics.
- Aids in debugging and performance optimization.
Workflow Summary:
- A meeting bot task is scheduled or triggered via the Control Backend API.
- The Control Backend uses the ECS Client Service to launch an ECS task in Fargate.
- The MeetingBot joins the meeting, records audio, and extracts captions.
- The MeetingBot uploads the data to S3.
- S3 triggers an SNS notification.
- The Worker Backend, via Celery, processes the uploaded data, transcribes it using AssemblyAI, and reconciles speakers using NLTK and rapidfuzz.
- The processed data is stored in the PostgreSQL database.
- highlight.io and CloudWatch monitor the system for errors and performance issues.
Key Technologies:
- Containerization: Docker, AWS Fargate, Amazon ECR
- Orchestration: NestJS, BullMQ, AWS SDK
- Automation: Python, Selenium, FFmpeg
- Data Processing: Django, Celery, AssemblyAI, NLTK
- Storage: Amazon S3, PostgreSQL, Redis
- Messaging: Amazon SNS, Amazon SQS
- Monitoring: highlight.io, AWS CloudWatch