lorenzomorelli-webdev/RecordingToTasks


Recording to Tasks

A Python CLI tool that transforms audio/video meeting recordings into structured transcriptions and to-do lists using the OpenAI APIs (gpt-4o-transcribe/Whisper for transcription, GPT for analysis).

🚀 Features

  • Automatic transcription of audio and video files using Whisper/GPT-4o
  • Intelligent extraction of tasks, decisions, and action items
  • Large file handling with automatic chunking
  • Multi-format support (MP4, MOV, WAV, MP3, M4A, etc.)
  • Parallel processing to speed up transcription
  • Automatic retry on API errors with exponential backoff
  • Precise timestamps for each section
  • Structured output in Markdown format
  • 💰 Cost optimization with transcription-only mode (-t flag)
  • Enhanced error handling with response validation and diagnostics

πŸ› οΈ Setup and Installation

Prerequisites

  1. Python 3.8+ installed on your system
  2. ffmpeg installed:
    # macOS
    brew install ffmpeg
    
    # Ubuntu/Debian
    sudo apt update && sudo apt install ffmpeg
    
    # Windows
    # Download from https://ffmpeg.org/download.html
  3. OpenAI account with API key

Installation

Option 1: Automatic Setup (Recommended)

  1. Clone the repository:

    git clone <repository-url>
    cd RecordingToTasks
  2. Run the setup script:

    ./setup.sh

    The script will automatically:

    • Install ffmpeg (if not present)
    • Create a Python virtual environment
    • Install all required dependencies
    • Create a .env file from the template
  3. Configure your API key:

    Edit the .env file and insert your OpenAI API key:

    OPENAI_API_KEY=sk-your-actual-api-key-here
    

Option 2: Manual Setup

  1. Clone the repository:

    git clone <repository-url>
    cd RecordingToTasks
  2. Create and activate virtual environment:

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
  3. Install dependencies:

    pip install -r requirements.txt
  4. Configure environment variables:

    cp env.example .env

    Edit the .env file and insert your OpenAI API key:

    OPENAI_API_KEY=sk-your-actual-api-key-here
    

Verify Installation

python main.py --help

Or run the setup test:

python test_setup.py

📖 Usage

Basic Command

# Process audio/video file (transcription + analysis)
python main.py /path/to/your/recording.mp4

💰 Cost-Saving Mode: Use Existing Transcription

If you already have a transcription file, you can skip the expensive transcription step and only run the task analysis:

# Use existing transcription (SAVES MONEY!)
python main.py -t output/recording_transcription.txt
python main.py --transcription transcription.txt

Cost Comparison:

  • Full processing (37 min video): ~$0.23 (transcription) + ~$0.01 (analysis) = $0.24
  • Transcription-only mode: ~$0.01 (analysis only) = 96% cheaper!

This is perfect for:

  • Testing prompt changes
  • Debugging analysis issues
  • Re-running analysis with different settings
  • Processing the same recording multiple times
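The savings figures above are simple arithmetic. A quick sanity check (a standalone sketch using the per-minute rate quoted later in this README, not code from main.py; the $0.01 analysis figure is the rough per-run estimate given above):

```python
# Back-of-envelope cost check (hypothetical helpers, not part of main.py).
TRANSCRIBE_RATE_PER_MIN = 0.006   # gpt-4o-transcribe, $/minute
ANALYSIS_COST = 0.01              # rough per-run analysis cost, $

def full_cost(minutes: float) -> float:
    """Transcription plus analysis cost for one recording, in dollars."""
    return minutes * TRANSCRIBE_RATE_PER_MIN + ANALYSIS_COST

def savings_ratio(minutes: float) -> float:
    """Fraction saved by -t mode (analysis only) vs full processing."""
    return 1 - ANALYSIS_COST / full_cost(minutes)

print(round(full_cost(37), 2))       # ~0.23 for a 37-minute video
print(round(savings_ratio(37), 2))   # ~0.96, i.e. roughly 96% cheaper
```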

Examples

# Transcribe a video file
python main.py meeting_2024_01_15.mp4

# Transcribe an audio file
python main.py call_with_client.wav

# Process multiple files in sequence
python main.py file1.mp4 file2.wav file3.m4a

# Use existing transcription (cost-saving mode)
python main.py -t output/meeting_transcription.txt

# Show help
python main.py --help

Supported Formats

Audio: .wav, .mp3, .m4a, .flac, .aac, .ogg, .wma
Video: .mp4, .mov, .avi, .mkv, .wmv, .flv, .webm, .m4v
Transcriptions: .txt (with -t/--transcription flag)
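Input type is determined by file extension. A hypothetical sketch of that dispatch (the actual logic in main.py may differ):

```python
# Extension-based format detection sketch; extension sets mirror the
# "Supported Formats" list in this README.
from pathlib import Path

AUDIO_EXTS = {".wav", ".mp3", ".m4a", ".flac", ".aac", ".ogg", ".wma"}
VIDEO_EXTS = {".mp4", ".mov", ".avi", ".mkv", ".wmv", ".flv", ".webm", ".m4v"}

def classify(path: str) -> str:
    """Return 'audio', 'video', or 'transcription'; raise on unknown types."""
    ext = Path(path).suffix.lower()
    if ext in AUDIO_EXTS:
        return "audio"
    if ext in VIDEO_EXTS:
        return "video"
    if ext == ".txt":
        return "transcription"
    raise ValueError(f"Unsupported format: {ext}")

print(classify("meeting_2024_01_15.mp4"))  # video
```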

Output

The tool generates files in the output/ folder:

  1. filename_transcription.txt - Complete transcription with timestamps (not generated in -t mode)

  2. filename_tasks.md - Structured AI-powered analysis:

    • Executive Summary - Concise 2-3 sentence summary of concrete decisions
    • Action Items / To-Do List - Tasks organized by:
      • Category (development, testing, documentation, infrastructure, meeting, other)
      • Priority (high 🔴, medium 🟡, low 🟢)
      • Detailed description, responsible party, deadline, and context
      • Markdown checkbox format for tracking: - [ ] Task
    • Decisions Made - List of concrete decisions
    • Next Steps - Identified future actions
    • Additional Notes - Technical references, links, relevant information

    Intelligent Filtering: The AI distinguishes between generic discussions and actionable tasks, automatically ignoring casual conversation and including only concrete commitments.

βš™οΈ Configuration

Environment Variables (.env file)

# API Configuration
OPENAI_API_KEY=your_api_key_here
OPENAI_ORG_ID=your_org_id_here  # Optional

# ============================================
# PRESET CONFIGURATIONS (2025 Optimized)
# ============================================

# PRESET 1: PREMIUM (Maximum Quality) ⭐ DEFAULT
# Cost: $0.367/hour | Quality: 10/10
TRANSCRIPTION_MODEL=gpt-4o-transcribe
ANALYSIS_MODEL=gpt-5-mini

# PRESET 2: RECOMMENDED (Best Value)
# Cost: $0.181/hour | Quality: 9/10 (51% cheaper)
# TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
# ANALYSIS_MODEL=gpt-5-nano

# PRESET 3: BALANCED
# Cost: $0.187/hour | Quality: 9.5/10
# TRANSCRIPTION_MODEL=gpt-4o-mini-transcribe
# ANALYSIS_MODEL=gpt-5-mini

# ============================================

# Language for transcription (ISO-639-1 format, improves accuracy)
# it=Italian, en=English, es=Spanish, fr=French, de=German
LANGUAGE=en

# Speaker Diarization (optional, requires pyannote-audio)
ENABLE_DIARIZATION=false                     # Identify speakers (true/false)
NUM_SPEAKERS=2                               # Number of speakers (if diarization=true)

# Processing Configuration
MAX_RETRIES=3                                # Retry attempts on errors
MAX_PARALLEL_TASKS=3                        # Parallel transcription workers
SIZE_LIMIT_MB=20                            # File size limit for chunking (MB)
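As a sketch of how the boolean and numeric settings above might be parsed once loaded into the environment (for example via python-dotenv's load_dotenv()); the helper names here are illustrative, not the actual code in main.py:

```python
# Parse typed settings out of environment variables (hypothetical helpers).
import os

def env_bool(name: str, default: bool = False) -> bool:
    """Parse true/false strings like ENABLE_DIARIZATION."""
    return os.getenv(name, str(default)).strip().lower() == "true"

def env_int(name: str, default: int) -> int:
    """Parse integer settings like MAX_RETRIES, falling back to a default."""
    return int(os.getenv(name, str(default)))

# Simulate values a .env file would provide:
os.environ["ENABLE_DIARIZATION"] = "false"
os.environ["MAX_RETRIES"] = "3"

print(env_bool("ENABLE_DIARIZATION"))  # False
print(env_int("MAX_RETRIES", 3))       # 3
print(env_int("SIZE_LIMIT_MB", 20))    # unset, so falls back to 20
```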

🎯 Recommended Presets (2025)

| Preset | Transcription | Analysis | Cost/hour | Quality | When to Use |
|---|---|---|---|---|---|
| Premium ⭐ | gpt-4o-transcribe | gpt-5-mini | $0.367 | 10/10 | Default - critical meetings, maximum accuracy |
| Recommended | gpt-4o-mini-transcribe | gpt-5-nano | $0.181 | 9/10 | Limited budget, frequent use (51% savings) |
| Balanced | gpt-4o-mini-transcribe | gpt-5-mini | $0.187 | 9.5/10 | Complex task analysis, budget-conscious |

Model Details and Quality

Transcription Models:

| Model | Price | WER | Quality | Notes |
|---|---|---|---|---|
| gpt-4o-transcribe ⭐ | $0.006/min | 2.46% | 10/10 | Near-human accuracy, excellent for difficult audio |
| gpt-4o-mini-transcribe | $0.003/min | 8.9% | 9/10 | Excellent quality/price ratio, 50% cheaper |
| whisper-1 (legacy) | $0.006/min | 7.88% | 7/10 | Deprecated, no advantage over the newer models |

WER = Word Error Rate (lower is better)

Analysis Models:

| Model | Input/Output (per 1M tokens) | MMLU | Quality | Notes |
|---|---|---|---|---|
| gpt-5-mini ⭐ | $0.25 / $2.00 | ~88% | 10/10 | Advanced reasoning, excellent for complex tasks |
| gpt-5-nano | $0.05 / $0.40 | ~82% | 9/10 | Perfect for structured task extraction |
| gpt-4o-mini | $0.15 / $0.60 | 82% | 8/10 | Economical alternative, proven and reliable |

MMLU = Massive Multitask Language Understanding (higher is better)

💰 Cost Estimates (2025)

Costs depend on recording length.

Default Configuration - PREMIUM (Maximum Quality)

  • Transcription: gpt-4o-transcribe ($0.006/min)
  • Analysis: gpt-5-mini ($0.25/$2.00 per 1M tokens)
  • Total 1 hour: $0.367 (maximum quality, WER 2.46%)

Detailed Breakdown (1-hour meeting)

| Component | Model | Calculation | Cost |
|---|---|---|---|
| Transcription | gpt-4o-transcribe | 60 min × $0.006/min | $0.360 |
| Analysis (input) | gpt-5-mini | 12,500 tokens × $0.25/1M | $0.003 |
| Analysis (output) | gpt-5-mini | 1,750 tokens × $2.00/1M | $0.004 |
| TOTAL | | | $0.367 |
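The breakdown above can be checked arithmetically (token counts are this README's 1-hour estimates):

```python
# Verify the 1-hour Premium-preset cost breakdown quoted above.
transcription = 60 * 0.006                 # 60 min at $0.006/min -> $0.360
analysis_in   = 12_500 * 0.25 / 1_000_000  # input tokens at $0.25/1M
analysis_out  = 1_750 * 2.00 / 1_000_000   # output tokens at $2.00/1M
total = transcription + analysis_in + analysis_out
print(round(total, 3))  # 0.367
```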

Preset Comparison (1-hour meeting)

| Preset | Transcription | Analysis | Total | Quality | Savings |
|---|---|---|---|---|---|
| Premium ⭐ (default) | gpt-4o-transcribe | gpt-5-mini | $0.367 | 10/10 | - |
| Recommended | gpt-4o-mini-transcribe | gpt-5-nano | $0.181 | 9/10 | 51% |
| Balanced | gpt-4o-mini-transcribe | gpt-5-mini | $0.187 | 9.5/10 | 49% |
| Legacy | whisper-1 | gpt-4o-mini | $0.363 | 7/10 | 1% |

Cost Notes:

  • Premium preset offers maximum quality (WER 2.46% vs 8.9%)
  • Recommended preset saves 51% while maintaining excellent quality (9/10)
  • Costs calculated for 1-hour meetings (~12,500 input tokens, 1,750 output)
  • For frequent use, consider Recommended preset to optimize costs
  • Use -t flag with existing transcriptions to save 96% on re-processing

🔧 Large File Handling

The tool automatically handles large files:

  • Automatic chunking: files larger than the SIZE_LIMIT_MB threshold (default 20 MB) are split into chunks
  • Parallel processing: Multiple chunks processed simultaneously
  • Timeline reconstruction: Timestamps preserved in final output
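The chunking decision itself is simple size math. A hypothetical sketch (the actual splitting is done with ffmpeg; this helper is an illustration, not code from main.py):

```python
# Decide how many chunks a file needs under the size limit.
import math

SIZE_LIMIT_MB = 20  # matches the SIZE_LIMIT_MB default in .env

def num_chunks(file_size_mb: float, limit_mb: float = SIZE_LIMIT_MB) -> int:
    """Files at or under the limit are sent whole; larger files are split."""
    if file_size_mb <= limit_mb:
        return 1
    return math.ceil(file_size_mb / limit_mb)

print(num_chunks(15))  # 1 - under the limit, no chunking
print(num_chunks(95))  # 5 - 95 MB split into ceil(95/20) chunks
```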

πŸ› οΈ Development

Project Structure

RecordingToTasks/
├── main.py              # Main script
├── requirements.txt     # Python dependencies
├── setup.sh            # Automatic installation script
├── test_setup.py       # Setup verification test
├── .env                # Configuration (not committed)
├── env.example         # Configuration template
├── README.md           # Documentation
├── CLAUDE.md           # Technical documentation
├── .gitignore          # Files to ignore
├── venv/               # Virtual environment
├── temp/               # Temporary files
├── output/             # Output files
└── tests/              # Test scripts and samples

Dependencies

  • openai: OpenAI API client
  • python-dotenv: Environment variable management
  • ffmpeg: Audio/video processing (external dependency)
  • pyannote.audio (optional): Speaker diarization to identify who speaks
    • Requires Hugging Face account and token
    • Enable with ENABLE_DIARIZATION=true in .env file

Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/feature-name
  3. Commit your changes: git commit -am 'Add new feature'
  4. Push the branch: git push origin feature/feature-name
  5. Open a Pull Request

πŸ› Troubleshooting

Common Errors

"ffmpeg not found"

# Verify installation
ffmpeg -version

# If not installed, follow prerequisites

"OpenAI API key not found"

# Verify .env file
cat .env

# Make sure the key is correct

"File too large"

  • The tool automatically handles large files
  • Increase SIZE_LIMIT_MB in .env if needed

Transcription errors

  • The tool automatically retries with exponential backoff
  • Check internet connection
  • Verify OpenAI API rate limits
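The retry behavior works like the following sketch (illustrative only; main.py's actual implementation may differ):

```python
# Retry a callable with exponential backoff: delays of 1s, 2s, 4s, ...
import time

def with_retries(fn, max_retries=3, base_delay=1.0):
    """Call fn(); on exception, wait base_delay * 2**attempt and retry."""
    for attempt in range(max_retries):
        try:
            return fn()
        except Exception:
            if attempt == max_retries - 1:
                raise  # out of attempts, surface the error
            time.sleep(base_delay * 2 ** attempt)

# Demo: a call that fails twice, then succeeds on the third attempt.
calls = {"n": 0}
def flaky():
    calls["n"] += 1
    if calls["n"] < 3:
        raise RuntimeError("transient API error")
    return "ok"

print(with_retries(flaky, max_retries=3, base_delay=0.01))  # ok
```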

JSON parsing errors

  • Now includes enhanced diagnostics with finish_reason validation
  • Increased token limit from 3000 to 8000 for long transcriptions
  • Check logs for detailed error information

Debug Mode

For more detailed debugging, temporarily modify main.py:

import logging
logging.basicConfig(level=logging.DEBUG)

📄 License

MIT License - see LICENSE file for details

🤝 Support

For bug reports or feature requests, open an issue on GitHub.


Note: This tool is optimized for Italian and English meetings. For other languages, you may need to modify the analysis prompts in main.py.
