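OpenAI Realtime API¶

Official documentation

📝 Overview¶

Introduction¶

The OpenAI Realtime API offers two connection methods:

WebRTC - for real-time audio/video interaction from browser and mobile clients
WebSocket - for server-to-server application integration

Use cases¶

Real-time voice conversation
Audio/video conferencing
Real-time translation
Speech transcription
Real-time code generation
Server-side real-time integration

Key features¶

Bidirectional audio streaming
Mixed text and audio conversations
Function calling support
Automatic voice activity detection (VAD)
Audio transcription
Server-side WebSocket integration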

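🔐 Authentication and security¶

Authentication methods¶

Standard API key (server-side only)
Ephemeral token (for clients)

Ephemeral tokens¶

Validity: 1 minute
Usage limit: a single connection
Issued by: a server-side API call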

POST https://ssanai-workspace.atto-lab.cc/v1/realtime/sessions
Content-Type: application/json
Authorization: Bearer $API_KEY

{
  "model": "gpt-4o-realtime-preview-2024-12-17",
  "voice": "verse"
}
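
The request above can be sketched in Python with only the standard library. This is a minimal sketch under stated assumptions, not an official client: `BASE_URL`, `build_session_request`, and `create_session` are names introduced here for illustration. The response is returned as decoded JSON without extracting the token, because this document does not show the exact response layout.

```python
import json
import urllib.request

# Assumption: the same host as the request shown above.
BASE_URL = "https://ssanai-workspace.atto-lab.cc"


def build_session_request(api_key, model="gpt-4o-realtime-preview-2024-12-17", voice="verse"):
    """Build (but do not send) the POST request that creates a Realtime session."""
    body = json.dumps({"model": model, "voice": voice}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/v1/realtime/sessions",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {api_key}",
        },
    )


def create_session(api_key):
    """Send the request from a trusted server and return the decoded JSON response."""
    with urllib.request.urlopen(build_session_request(api_key), timeout=10) as resp:
        return json.load(resp)
```

Call `create_session` only on the server side, so that the standard API key never reaches the client; the client should receive nothing but the short-lived token.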

Security recommendations¶

Never expose the standard API key to clients
Use HTTPS/WSS for all communication
Apply appropriate access controls
Monitor for abnormal activity

🔌 Connection setup¶

WebRTC connection¶

URL: https://ssanai-workspace.atto-lab.cc/v1/realtime
Query parameter: model
Headers:
Authorization: Bearer EPHEMERAL_KEY
Content-Type: application/sdp

WebSocket connection¶

URL: wss://ssanai-workspace.atto-lab.cc/v1/realtime
Query parameter: model
Headers:
Authorization: Bearer YOUR_API_KEY
OpenAI-Beta: realtime=v1

Connection flow¶

sequenceDiagram
    participant Client
    participant Server
    participant OpenAI

    alt WebRTC Connection
        Client->>Server: Request ephemeral token
        Server->>OpenAI: Create session
        OpenAI-->>Server: Return ephemeral token
        Server-->>Client: Return ephemeral token

        Client->>OpenAI: Create WebRTC offer
        OpenAI-->>Client: Return answer

        Note over Client,OpenAI: Establish WebRTC connection

        Client->>OpenAI: Create data channel
        OpenAI-->>Client: Confirm data channel
    else WebSocket Connection
        Server->>OpenAI: Establish WebSocket connection
        OpenAI-->>Server: Confirm connection

        Note over Server,OpenAI: Begin real-time conversation
    end

Data channel¶

Name: oai-events
Purpose: event transport
Format: JSON

Audio streams¶

Input: addTrack()
Output: ontrack event

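💬 Conversation interaction¶

Conversation modes¶

Text-only conversation
Voice conversation
Mixed conversation

Session management¶

Creating a session
Updating a session
Ending a session
Session configuration

Event types¶

Text events
Audio events
Function calls
Status updates
Error events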

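⚙️ Configuration options¶

Audio settings¶

Input formats: pcm16, g711_ulaw, g711_alaw
Output formats: pcm16, g711_ulaw, g711_alaw
Voices: alloy, echo, shimmer

Model settings¶

Temperature
Maximum output length
System prompt
Tool configuration

VAD settings¶

Threshold
Silence duration
Prefix padding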
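
The three VAD settings map onto the `turn_detection` object of a `session.update` event. The sketch below builds such an event and validates the values first. The helper name `build_vad_session_update` is introduced here for illustration, and the defaults (0.5, 300 ms, 500 ms) mirror the values used in this document's Python audio example rather than any documented server default.

```python
import json


def build_vad_session_update(threshold=0.5, prefix_padding_ms=300, silence_duration_ms=500):
    """Return a session.update event that enables server-side VAD."""
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    if prefix_padding_ms < 0 or silence_duration_ms < 0:
        raise ValueError("durations must be non-negative milliseconds")
    return {
        "type": "session.update",
        "session": {
            "turn_detection": {
                "type": "server_vad",
                "threshold": threshold,
                "prefix_padding_ms": prefix_padding_ms,
                "silence_duration_ms": silence_duration_ms,
            }
        },
    }


# Serialize before sending over the WebSocket or the data channel.
payload = json.dumps(build_vad_session_update(threshold=0.6))
```

A higher threshold makes detection less sensitive to background noise, and a longer silence duration waits longer before treating the speaker's turn as finished.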

💡 Request examples¶

WebSocket connection ✅¶

Node.js (ws module)¶

import WebSocket from "ws";

const url = "wss://ssanai-workspace.atto-lab.cc/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17";
const ws = new WebSocket(url, {
  headers: {
    "Authorization": "Bearer " + process.env.API_KEY,
    "OpenAI-Beta": "realtime=v1",
  },
});

ws.on("open", function open() {
  console.log("Connected to server.");
});

ws.on("message", function incoming(message) {
  console.log(JSON.parse(message.toString()));
});

Python (websocket-client)¶

# Requires websocket-client library:
# pip install websocket-client

import os
import json
import websocket

API_KEY = os.environ.get("API_KEY")

url = "wss://ssanai-workspace.atto-lab.cc/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
headers = [
    "Authorization: Bearer " + API_KEY,
    "OpenAI-Beta: realtime=v1"
]

def on_open(ws):
    print("Connected to server.")

def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

ws = websocket.WebSocketApp(
    url,
    header=headers,
    on_open=on_open,
    on_message=on_message,
)

ws.run_forever()

Browser (standard WebSocket)¶

/*
Note: WebRTC is recommended for browsers and other client environments.
The standard WebSocket interface can still be used in browser-like
environments such as Deno and Cloudflare Workers.
*/

const ws = new WebSocket(
  "wss://ssanai-workspace.atto-lab.cc/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17",
  [
    "realtime",
    // Authentication
    "openai-insecure-api-key." + API_KEY, 
    // Optional
    "openai-organization." + OPENAI_ORG_ID,
    "openai-project." + OPENAI_PROJECT_ID,
    // Beta protocol, required
    "openai-beta.realtime-v1"
  ]
);

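// The standard WebSocket interface has no .on() method; use addEventListener
ws.addEventListener("open", function open() {
  console.log("Connected to server.");
});

ws.addEventListener("message", function incoming(event) {
  console.log(event.data);
});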

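Sending and receiving messages¶

Node.js/Browser¶

// Receive server events (Node.js ws module)
ws.on("message", function incoming(message) {
  // The ws module delivers a Buffer, so convert it to a string before parsing.
  // In a browser, use ws.addEventListener("message", ...) and parse event.data instead.
  const serverEvent = JSON.parse(message.toString());
  console.log(serverEvent);
});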

// Send events, create JSON data structure conforming to client event format
const event = {
  type: "response.create",
  response: {
    modalities: ["audio", "text"],
    instructions: "Give me a haiku about code.",
  }
};
ws.send(JSON.stringify(event));

Python¶

# Send client events: serialize a dictionary to JSON
def on_open(ws):
    print("Connected to server.")

    event = {
        "type": "response.create",
        "response": {
            "modalities": ["text"],
            "instructions": "Please assist the user."
        }
    }
    ws.send(json.dumps(event))

# Receive messages: parse the message payload from JSON
def on_message(ws, message):
    data = json.loads(message)
    print("Received event:", json.dumps(data, indent=2))

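WebSocket Python audio example¶

Example documentation¶

This is a Python example of a voice conversation over the OpenAI Realtime WebSocket interface. It supports real-time voice input and output.

Features¶

Real-time voice recording: detects voice input automatically and sends it to the server
Real-time audio playback: plays the AI's spoken responses
Text display: shows the AI's text responses at the same time
Automatic speech detection: uses server-side VAD (Voice Activity Detection)
Bidirectional communication: supports continuous conversation

Requirements¶

Python 3.7 or later
A microphone and speakers
A stable network connection

Installing dependencies¶

pip install -r requirements.txt

System dependencies: on Linux you may need to install additional audio libraries.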

# Ubuntu/Debian
sudo apt-get install portaudio19-dev python3-pyaudio

# CentOS/RHEL
sudo yum install portaudio-devel

Configuration¶

Check that the following settings in openai_realtime_client.py are correct:

WEBSOCKET_URL = "wss://ssanai-workspace.atto-lab.cc/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
API_KEY = "your_api_key"
MODEL = "gpt-4o-realtime-preview-2024-12-17"

Usage¶

Run the program:
```
python openai_realtime_client.py
```
Start a conversation:
Recording starts automatically after launch
Speak into the microphone
The AI responds with speech in real time
Stop the program:
Press Ctrl+C to exit

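Technical details¶

Audio settings:
- Sample rate: 24 kHz (required by the OpenAI Realtime API)
- Format: PCM16
- Channels: mono
- Encoding: Base64

WebSocket message types:
- session.update: configures the session
- input_audio_buffer.append: sends audio data
- input_audio_buffer.commit: commits the audio buffer
- response.audio.delta: receives an audio response chunk
- response.text.delta: receives a text response chunk

Voice activity detection uses server-side VAD with these settings:
- Threshold: 0.5
- Prefix padding: 300 ms
- Silence duration: 500 ms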
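
To make the audio settings concrete, the sketch below encodes one chunk of 24 kHz mono PCM16 audio as an `input_audio_buffer.append` event and computes how much playback time the chunk covers. It is only an illustration: a generated sine wave stands in for microphone input, the helper names are introduced here, and little-endian sample order is an assumption.

```python
import base64
import json
import math
import struct

SAMPLE_RATE = 24000   # Hz, as stated above
BYTES_PER_SAMPLE = 2  # PCM16, mono


def sine_pcm16(frequency_hz, num_frames, amplitude=0.3):
    """Generate mono PCM16 little-endian test audio (a stand-in for microphone input)."""
    samples = (
        int(amplitude * 32767 * math.sin(2 * math.pi * frequency_hz * n / SAMPLE_RATE))
        for n in range(num_frames)
    )
    return struct.pack(f"<{num_frames}h", *samples)


def to_append_event(pcm_bytes):
    """Wrap raw PCM16 bytes in an input_audio_buffer.append event."""
    return {
        "type": "input_audio_buffer.append",
        "audio": base64.b64encode(pcm_bytes).decode("ascii"),
    }


def chunk_duration_ms(pcm_bytes):
    """Playback duration of a PCM16 mono chunk in milliseconds."""
    return len(pcm_bytes) / (SAMPLE_RATE * BYTES_PER_SAMPLE) * 1000


chunk = sine_pcm16(440, 1024)      # 1024 frames, the chunk size used by the example client
event = to_append_event(chunk)
print(len(chunk), round(chunk_duration_ms(chunk), 2), len(json.dumps(event)))
```

A 1024-frame chunk is 2048 bytes and covers roughly 43 ms of audio, so a client reading chunks of that size sends about 23 append events per second.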

Troubleshooting¶

Common problems:

Audio device problems:

# List the available audio devices
python -c "import pyaudio; p = pyaudio.PyAudio(); print([p.get_device_info_by_index(i) for i in range(p.get_device_count())])"

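Permission problems:
Check that the program has permission to access the microphone
Linux: check the ALSA/PulseAudio configuration
Network connection problems:
Check that the WebSocket URL is correct
Check that the API key is valid
Check the firewall settings

Debug mode:

Enable verbose logging: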

logging.basicConfig(level=logging.DEBUG)

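Code structure¶

├── openai_realtime_client.py  # Main program file
├── requirements.txt           # Python dependencies
└── README.md                  # Documentation

Main class and methods:

OpenAIRealtimeClient: the main client class
connect(): opens the WebSocket connection
start_audio_streams(): starts the audio streams
start_recording(): starts recording
handle_response(): handles responses
start_conversation(): starts the conversation

Notes¶

Audio quality: use a quiet environment for better results.
Network latency: real-time conversation is sensitive to network latency.
Resource usage: long sessions can increase CPU and memory usage.
API limits: check the OpenAI API usage limits and costs.

License¶

This project is provided for learning and testing purposes only. Comply with the OpenAI terms of use.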

Example code¶

#!/usr/bin/env python3
"""
OpenAI Realtime WebSocket Audio Example
Supports real-time voice conversation, including audio recording, sending, and playback
"""

import asyncio
import json
import base64
import websockets
import pyaudio
import wave
import threading
import time
from typing import Optional
import logging

# Configure logging
logging.basicConfig(
    level=logging.DEBUG,
    format='%(asctime)s - %(name)s - %(levelname)s - %(message)s',
    handlers=[
        logging.StreamHandler(),
        logging.FileHandler('websocket_debug.log', encoding='utf-8')
    ]
)
logger = logging.getLogger(__name__)

class OpenAIRealtimeClient:
    def __init__(self, 
                 websocket_url: str,
                 api_key: str,
                 model: str = "gpt-4o-realtime-preview-2024-12-17"):
        self.websocket_url = websocket_url
        self.api_key = api_key
        self.model = model
        self.websocket = None
        self.is_recording = False
        self.is_connected = False

        # Audio configuration
        self.audio_format = pyaudio.paInt16
        self.channels = 1
        self.rate = 24000  # Sample rate required by the OpenAI Realtime API
        self.chunk = 1024
        self.audio = pyaudio.PyAudio()

        # Audio streams
        self.input_stream = None
        self.output_stream = None

    async def connect(self):
        """Connect to WebSocket server"""
        headers = {
            "Authorization": f"Bearer {self.api_key}",
            "OpenAI-Beta": "realtime=v1"
        }

        logger.info("=" * 80)
        logger.info("🚀 Starting WebSocket connection")
        logger.info("=" * 80)
        logger.info(f"Connection URL: {self.websocket_url}")
        # Never log credentials: the Authorization header carries the API key
        logger.info(f"Headers: Authorization=<redacted>, OpenAI-Beta={headers['OpenAI-Beta']}")

        try:
            self.websocket = await websockets.connect(
                self.websocket_url,
                additional_headers=headers
            )
            self.is_connected = True
            logger.info("✅ WebSocket connection successful")

            # Send session configuration
            await self.send_session_config()

        except Exception as e:
            logger.error(f"❌ WebSocket connection failed: {e}")
            logger.error(f"Error type: {type(e).__name__}")
            raise

    async def send_session_config(self):
        """Send session configuration"""
        config = {
            "type": "session.update",
            "session": {
                "modalities": ["text", "audio"],
                "instructions": "You are a helpful AI assistant that can engage in real-time voice conversations.",
                "voice": "alloy",
                "input_audio_format": "pcm16",
                "output_audio_format": "pcm16",
                "input_audio_transcription": {
                    "model": "whisper-1"
                },
                "turn_detection": {
                    "type": "server_vad",
                    "threshold": 0.5,
                    "prefix_padding_ms": 300,
                    "silence_duration_ms": 500
                },
                "tools": [],
                "tool_choice": "auto",
                "temperature": 0.8,
                "max_response_output_tokens": 4096
            }
        }

        config_json = json.dumps(config, ensure_ascii=False, indent=2)
        logger.info("=" * 60)
        logger.info("📤 Sending session configuration:")
        logger.info(f"Message type: {config['type']}")
        logger.info(f"Configuration content:\n{config_json}")
        logger.info("=" * 60)

        await self.websocket.send(json.dumps(config))
        logger.info("✅ Session configuration sent")

    def start_audio_streams(self):
        """Start audio input and output streams"""
        try:
            # Input stream (microphone)
            self.input_stream = self.audio.open(
                format=self.audio_format,
                channels=self.channels,
                rate=self.rate,
                input=True,
                frames_per_buffer=self.chunk
            )

            # Output stream (speakers)
            self.output_stream = self.audio.open(
                format=self.audio_format,
                channels=self.channels,
                rate=self.rate,
                output=True,
                frames_per_buffer=self.chunk
            )

            logger.info("Audio streams started")

        except Exception as e:
            logger.error(f"Failed to start audio streams: {e}")
            raise

    def stop_audio_streams(self):
        """Stop audio streams"""
        if self.input_stream:
            self.input_stream.stop_stream()
            self.input_stream.close()
            self.input_stream = None

        if self.output_stream:
            self.output_stream.stop_stream()
            self.output_stream.close()
            self.output_stream = None

        logger.info("Audio streams stopped")

    async def start_recording(self):
        """Start recording and send audio data"""
        self.is_recording = True
        logger.info("Starting recording...")

        try:
            while self.is_recording and self.is_connected:
                # Read audio data
                audio_data = self.input_stream.read(self.chunk, exception_on_overflow=False)

                # Encode audio data as base64
                audio_base64 = base64.b64encode(audio_data).decode('utf-8')

                # Send audio data
                message = {
                    "type": "input_audio_buffer.append",
                    "audio": audio_base64
                }

                # Log audio data sending (every 10 times to avoid excessive logging)
                if hasattr(self, '_audio_count'):
                    self._audio_count += 1
                else:
                    self._audio_count = 1

                if self._audio_count % 10 == 0:  # Log every 10 times
                    logger.debug(f"🎤 Sending audio data #{self._audio_count}: length={len(audio_base64)} characters")

                await self.websocket.send(json.dumps(message))

                # Brief delay to avoid excessive sending
                await asyncio.sleep(0.01)

        except Exception as e:
            logger.error(f"Error during recording: {e}")
        finally:
            logger.info("Recording stopped")

    async def stop_recording(self):
        """Stop recording"""
        self.is_recording = False

        # Send recording end signal
        if self.websocket and self.is_connected:
            message = {
                "type": "input_audio_buffer.commit"
            }

            logger.info("=" * 60)
            logger.info("📤 Sending recording end signal:")
            logger.info(f"Message type: {message['type']}")
            logger.info("=" * 60)

            await self.websocket.send(json.dumps(message))
            logger.info("✅ Recording end signal sent")

    async def handle_response(self):
        """Handle WebSocket responses"""
        try:
            async for message in self.websocket:
                data = json.loads(message)
                message_type = data.get("type", "unknown")

                # Log all received messages in detail
                logger.info("=" * 60)
                logger.info("📥 Received WebSocket message:")
                logger.info(f"Message type: {message_type}")

                # Handle different message types
                if message_type == "response.audio.delta":
                    # Handle audio response
                    audio_data = base64.b64decode(data.get("delta", ""))
                    logger.info(f"🎵 Audio data: length={len(audio_data)} bytes")
                    if audio_data and self.output_stream:
                        self.output_stream.write(audio_data)
                        logger.info("✅ Audio data played")

                elif message_type == "response.text.delta":
                    # Handle text response
                    text = data.get("delta", "")
                    logger.info(f"💬 Text delta: '{text}'")
                    if text:
                        print(f"AI: {text}", end="", flush=True)

                elif message_type == "response.text.done":
                    # Text response complete
                    logger.info("✅ Text response complete")
                    print("\n")

                elif message_type == "response.audio.done":
                    # Audio response complete
                    logger.info("✅ Audio response complete")

                elif message_type == "error":
                    # Handle errors
                    error_info = data.get('error', {})
                    logger.error("❌ Server error:")
                    logger.error(f"Error details: {json.dumps(error_info, ensure_ascii=False, indent=2)}")

                elif message_type == "session.created":
                    # Session created successfully
                    logger.info("✅ Session created")

                elif message_type == "session.updated":
                    # Session updated successfully
                    logger.info("✅ Session updated")

                elif message_type == "conversation.item.created":
                    # Conversation item created
                    logger.info("📝 Conversation item created")

                elif message_type == "input_audio_buffer.speech_started":
                    # Speech started
                    logger.info("🎤 Speech start detected")

                elif message_type == "input_audio_buffer.speech_stopped":
                    # Speech stopped
                    logger.info("🔇 Speech stop detected")

                elif message_type == "input_audio_buffer.committed":
                    # Audio buffer committed
                    logger.info("📤 Audio buffer committed")

                else:
                    # Other unknown message types
                    logger.info(f"❓ Unknown message type: {message_type}")

                # Log complete message content (except audio data, as it's too long)
                if message_type != "response.audio.delta":
                    logger.info(f"Complete message content:\n{json.dumps(data, ensure_ascii=False, indent=2)}")

                logger.info("=" * 60)

        except websockets.exceptions.ConnectionClosed:
            logger.info("WebSocket connection closed")
            self.is_connected = False
        except Exception as e:
            logger.error(f"Error handling response: {e}")
            self.is_connected = False

    async def start_conversation(self):
        """Start conversation"""
        try:
            # Start audio streams
            self.start_audio_streams()

            # Create tasks
            response_task = asyncio.create_task(self.handle_response())
            recording_task = asyncio.create_task(self.start_recording())

            logger.info("Conversation started, press Ctrl+C to stop")

            # Wait for tasks to complete
            await asyncio.gather(response_task, recording_task)

        except KeyboardInterrupt:
            logger.info("Stop signal received")
        except Exception as e:
            logger.error(f"Error during conversation: {e}")
        finally:
            await self.cleanup()

    async def cleanup(self):
        """Clean up resources (safe to call more than once)"""
        self.is_recording = False
        self.is_connected = False

        # Stop audio streams
        self.stop_audio_streams()

        # Close WebSocket connection
        if self.websocket:
            await self.websocket.close()
            self.websocket = None
            logger.info("WebSocket connection closed")

        # Terminate PyAudio only once: both start_conversation() and run() call cleanup()
        if self.audio:
            self.audio.terminate()
            self.audio = None
        logger.info("Resource cleanup complete")

    async def run(self):
        """Run client"""
        try:
            await self.connect()
            await self.start_conversation()
        except Exception as e:
            logger.error(f"Error running client: {e}")
        finally:
            await self.cleanup()


async def main():
    """Main function"""
    # Configuration parameters
    # Replace the placeholder with your own key, loaded from a secure source;
    # never commit a real API key to source control.
    WEBSOCKET_URL = "wss://ssanai-workspace.atto-lab.cc/v1/realtime?model=gpt-4o-realtime-preview-2024-12-17"
    API_KEY = "your_api_key"
    MODEL = "gpt-4o-realtime-preview-2024-12-17"

    # Create client
    client = OpenAIRealtimeClient(
        websocket_url=WEBSOCKET_URL,
        api_key=API_KEY,
        model=MODEL
    )

    # Run client
    await client.run()


if __name__ == "__main__":
    print("OpenAI Realtime WebSocket Audio Example")
    print("=" * 50)
    print("Features:")
    print("- Real-time voice conversation")
    print("- Automatic speech recognition")
    print("- Text and audio responses")
    print("- Press Ctrl+C to stop")
    print("=" * 50)

    try:
        asyncio.run(main())
    except KeyboardInterrupt:
        print("\nProgram stopped")
    except Exception as e:
        print(f"Program error: {e}")

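⚠️ Error handling¶

Common errors¶

Connection errors: network problems, authentication failures, configuration errors
Audio errors: device permissions, unsupported formats, codec problems
Session errors: token expiry, session timeouts, concurrency limits

Error recovery¶

Automatic reconnection
Session recovery
Retrying failed operations
Graceful degradation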
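
Automatic reconnection and retries are commonly implemented with exponential backoff. The sketch below is one possible shape, not something this document prescribes: `backoff_delays` and `connect_with_retry` are hypothetical helpers, `connect` stands for any callable that opens the connection, and the delay values are arbitrary defaults.

```python
import random
import time


def backoff_delays(max_attempts=5, base_seconds=0.5, cap_seconds=8.0):
    """Yield exponentially growing delays: base, 2*base, 4*base, ... capped at cap_seconds."""
    for attempt in range(max_attempts):
        yield min(cap_seconds, base_seconds * (2 ** attempt))


def connect_with_retry(connect, max_attempts=5, sleep=time.sleep, jitter=random.random):
    """Call connect() until it succeeds; re-raise the last error once attempts run out."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt, delay in enumerate(backoff_delays(max_attempts), start=1):
        try:
            return connect()
        except (ConnectionError, TimeoutError):
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error to the caller
            # Add up to 10% random jitter so many clients do not retry in lockstep.
            sleep(delay * (1 + 0.1 * jitter()))
```

Authentication failures and configuration errors are deliberately not retried here, since repeating the same request would fail the same way. An expired ephemeral token calls for a new token rather than a retry.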

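📝 Event reference¶

Common request headers¶

Send the following request headers when opening the connection; they apply to every event exchanged on it:

Header	Type	Description	Example
Authorization	string	Authentication token	Bearer $API_KEY
OpenAI-Beta	string	API version	realtime=v1

Client events¶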

session.update¶

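Updates the session's default configuration.

Parameter	Type	Required	Description	Example / allowed values
event_id	string	No	Client-generated event identifier	event_123
type	string	Yes	Event type	session.update
session.modalities	array of strings	No	Modalities the model can respond with	["text", "audio"]
session.instructions	string	No	System instructions prepended to model calls	"Your knowledge cutoff is 2023-10..."
session.voice	string	No	Voice the model uses	alloy, echo, shimmer
session.input_audio_format	string	No	Input audio format	pcm16, g711_ulaw, g711_alaw
session.output_audio_format	string	No	Output audio format	pcm16, g711_ulaw, g711_alaw
session.input_audio_transcription.model	string	No	Model used for transcription	whisper-1
session.turn_detection.type	string	No	Voice detection type	server_vad
session.turn_detection.threshold	number	No	VAD activation threshold (0.0-1.0)	0.8
session.turn_detection.prefix_padding_ms	integer	No	Amount of audio to include before the start of speech, in milliseconds	500
session.turn_detection.silence_duration_ms	integer	No	Duration of silence used to detect the end of speech, in milliseconds	1000
session.tools	array	No	Tools available to the model	[]
session.tool_choice	string	No	How the model selects tools	auto/none/required
session.temperature	number	No	Model sampling temperature	0.8
session.max_response_output_tokens	integer or string	No	Maximum number of tokens per response	"inf"/4096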

input_audio_buffer.append¶

Appends audio data to the input audio buffer.

Parameter	Type	Required	Description	Example
event_id	string	No	Client-generated event identifier	event_456
type	string	Yes	Event type	input_audio_buffer.append
audio	string	Yes	Base64-encoded audio data	Base64EncodedAudioData

input_audio_buffer.commit¶

Commits the audio data in the buffer as a user message.

Parameter	Type	Required	Description	Example
event_id	string	No	Client-generated event identifier	event_789
type	string	Yes	Event type	input_audio_buffer.commit

input_audio_buffer.clear¶

Clears all data from the input audio buffer.

Parameter	Type	Required	Description	Example
event_id	string	No	Client-generated event identifier	event_012
type	string	Yes	Event type	input_audio_buffer.clear

conversation.item.create¶

Adds a new item to the conversation.

Parameter	Type	Required	Description	Example
event_id	string	No	Client-generated event identifier	event_345
type	string	Yes	Event type	conversation.item.create
previous_item_id	string	No	The new item is inserted after the item with this ID	null
item.id	string	No	Unique identifier of the conversation item	msg_001
item.type	string	No	Conversation item type	message/function_call/function_call_output
item.status	string	No	Conversation item status	completed/in_progress/incomplete
item.role	string	No	Role of the message sender	user/assistant/system
item.content	array	No	Message content	[text/audio/transcript]
item.call_id	string	No	Function call ID	call_001
item.name	string	No	Name of the called function	function_name
item.arguments	string	No	Function call arguments	{"param": "value"}
item.output	string	No	Function call result	{"result": "value"}
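
As an illustration of the table above, the sketch below builds two `conversation.item.create` events: a user text message and a function call result. The helper names are introduced here, and the `input_text` content type for user text is an assumption, since this document does not list the content part types.

```python
import json
import uuid


def user_text_item(text, previous_item_id=None):
    """Build a conversation.item.create event carrying a user text message."""
    event = {
        "event_id": f"event_{uuid.uuid4().hex[:8]}",
        "type": "conversation.item.create",
        "item": {
            "type": "message",
            "role": "user",
            # Assumption: user text parts use the "input_text" content type.
            "content": [{"type": "input_text", "text": text}],
        },
    }
    if previous_item_id is not None:
        event["previous_item_id"] = previous_item_id
    return event


def function_output_item(call_id, result):
    """Build a conversation.item.create event that returns a function call result."""
    return {
        "type": "conversation.item.create",
        "item": {
            "type": "function_call_output",
            "call_id": call_id,
            # item.output is a string, so structured results are JSON-encoded.
            "output": json.dumps(result),
        },
    }
```

Adding an item does not by itself produce a model reply; send a `response.create` event afterwards when a response is wanted.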

conversation.item.truncate¶

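Truncates the audio content of an assistant message.

Parameter	Type	Required	Description	Example
event_id	string	No	Client-generated event identifier	event_678
type	string	Yes	Event type	conversation.item.truncate
item_id	string	Yes	ID of the assistant message item to truncate	msg_002
content_index	integer	Yes	Index of the content part to truncate	0
audio_end_ms	integer	Yes	Point up to which audio is kept, in milliseconds	1500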
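
A typical use of truncation is interruption handling: when the user starts speaking over the assistant, the client tells the server how much audio was actually played. The sketch below derives `audio_end_ms` from the number of bytes written to the speaker. It assumes 24 kHz mono PCM16 output, and the helper names are hypothetical.

```python
SAMPLE_RATE = 24000   # Hz; assumes pcm16 output at the API's 24 kHz rate
BYTES_PER_SAMPLE = 2  # PCM16, mono


def played_ms(bytes_played):
    """Convert a count of played PCM16 bytes into whole milliseconds of audio."""
    return bytes_played * 1000 // (SAMPLE_RATE * BYTES_PER_SAMPLE)


def truncate_event(item_id, bytes_played, content_index=0):
    """Build a conversation.item.truncate event for the audio heard so far."""
    return {
        "type": "conversation.item.truncate",
        "item_id": item_id,
        "content_index": content_index,
        "audio_end_ms": played_ms(bytes_played),
    }
```

For example, 72,000 played bytes correspond to 1500 ms, which matches the sample value in the table.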

conversation.item.delete¶

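Deletes the specified item from the conversation history.

Parameter	Type	Required	Description	Example
event_id	string	No	Client-generated event identifier	event_901
type	string	Yes	Event type	conversation.item.delete
item_id	string	Yes	ID of the conversation item to delete	msg_003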

response.create¶

Triggers response generation.

Parameter	Type	Required	Description	Example
event_id	string	No	Client-generated event identifier	event_234
type	string	Yes	Event type	response.create
response.modalities	array of strings	No	Response modalities	["text", "audio"]
response.instructions	string	No	Instructions for the model	"Please assist the user."
response.voice	string	No	Voice the model uses	alloy/echo/shimmer
response.output_audio_format	string	No	Output audio format	pcm16
response.tools	array	No	Tools available to the model	["type", "name", "description"]
response.tool_choice	string	No	How the model selects tools	auto
response.temperature	number	No	Sampling temperature	0.7
response.max_output_tokens	integer or string	No	Maximum number of output tokens	150/"inf"

response.cancel¶

Cancels a response that is being generated.

Parameter	Type	Required	Description	Example
event_id	string	No	Client-generated event identifier	event_567
type	string	Yes	Event type	response.cancel

Server events¶

error¶

Returned when an error occurs.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_890
type	string	No	Event type	error
error.type	string	No	Error type	invalid_request_error/server_error
error.code	string	No	Error code	invalid_event
error.message	string	No	Human-readable error message	"The 'type' field is missing."
error.param	string	No	Parameter related to the error	null
error.event_id	string	No	ID of the client event that caused the error	event_567
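
Because `error.event_id` refers back to a client event, it helps to stamp every outgoing event with an `event_id` and remember what was sent. The sketch below shows one way to do that. `SentEventLog` is a hypothetical helper, and the sequential `event_N` scheme is just one possible ID format.

```python
import itertools


class SentEventLog:
    """Remember recently sent client events so server errors can be traced to them."""

    def __init__(self, max_entries=100):
        self._counter = itertools.count(1)
        self._sent = {}  # event_id -> event; dicts keep insertion order
        self._max_entries = max_entries

    def stamp(self, event):
        """Return a copy of event with a fresh event_id, and record it."""
        event_id = f"event_{next(self._counter)}"
        stamped = {**event, "event_id": event_id}
        self._sent[event_id] = stamped
        while len(self._sent) > self._max_entries:
            self._sent.pop(next(iter(self._sent)))  # evict the oldest entry
        return stamped

    def explain(self, error_event):
        """Describe a server error event together with the client event that caused it."""
        err = error_event.get("error", {})
        cause = self._sent.get(err.get("event_id"))
        cause_type = cause.get("type", "untyped event") if cause else "unknown event"
        return f"{err.get('type')}/{err.get('code')}: {err.get('message')} (caused by {cause_type})"
```

Pass every outgoing event through `stamp()` before serializing it, and call `explain()` from the branch that handles `error` events.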

conversation.item.input_audio_transcription.completed¶

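Returned when input audio transcription is enabled and the transcription succeeds.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_2122
type	string	No	Event type	conversation.item.input_audio_transcription.completed
item_id	string	No	ID of the user message item	msg_003
content_index	integer	No	Index of the content part that contains the audio	0
transcript	string	No	Transcribed text	"Hello, how are you?"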

conversation.item.input_audio_transcription.failed¶

Returned when input audio transcription is configured but the transcription request for a user message fails.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_2324
type	string	No	Event type	conversation.item.input_audio_transcription.failed
item_id	string	No	ID of the user message item	msg_003
content_index	integer	No	Index of the content part that contains the audio	0
error.type	string	No	Error type	transcription_error
error.code	string	No	Error code	audio_unintelligible
error.message	string	No	Human-readable error message	"The audio could not be transcribed."
error.param	string	No	Parameter related to the error	null

conversation.item.truncated¶

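Returned when the client truncates an earlier assistant audio message item.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_2526
type	string	No	Event type	conversation.item.truncated
item_id	string	No	ID of the truncated assistant message item	msg_004
content_index	integer	No	Index of the truncated content part	0
audio_end_ms	integer	No	Point at which the audio was truncated, in milliseconds	1500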

conversation.item.deleted¶

Returned when an item in the conversation is deleted.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_2728
type	string	No	Event type	conversation.item.deleted
item_id	string	No	ID of the deleted conversation item	msg_005

input_audio_buffer.committed¶

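Returned when the audio buffer data is committed.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_1121
type	string	No	Event type	input_audio_buffer.committed
previous_item_id	string	No	The new conversation item is inserted after the item with this ID	msg_001
item_id	string	No	ID of the user message item that will be created	msg_002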

input_audio_buffer.cleared¶

Returned when the client clears the input audio buffer.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_1314
type	string	No	Event type	input_audio_buffer.cleared

input_audio_buffer.speech_started¶

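Returned in server voice detection mode when speech input is detected.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_1516
type	string	No	Event type	input_audio_buffer.speech_started
audio_start_ms	integer	No	Milliseconds from session start until speech was detected	1000
item_id	string	No	ID of the user message item that will be created when speech stops	msg_003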

input_audio_buffer.speech_stopped¶

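Returned in server voice detection mode when speech input stops.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_1718
type	string	No	Event type	input_audio_buffer.speech_stopped
audio_end_ms	integer	No	Milliseconds from session start until the end of speech was detected	2000
item_id	string	No	ID of the user message item that will be created	msg_003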

response.created¶

Returned when a new response is created.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_2930
type	string	No	Event type	response.created
response.id	string	No	Unique identifier of the response	resp_001
response.object	string	No	Object type	realtime.response
response.status	string	No	Status of the response	in_progress
response.status_details	object	No	Additional details about the status	null
response.output	array	No	Output items generated by the response	[]
response.usage	object	No	Usage statistics for the response	null

response.done¶

Returned when response streaming is complete.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_3132
type	string	No	Event type	response.done
response.id	string	No	Unique identifier of the response	resp_001
response.object	string	No	Object type	realtime.response
response.status	string	No	Final status of the response	completed/cancelled/failed/incomplete
response.status_details	object	No	Additional details about the status	null
response.output	array	No	Output items generated by the response	[...]
response.usage.total_tokens	integer	No	Total number of tokens	50
response.usage.input_tokens	integer	No	Number of input tokens	20
response.usage.output_tokens	integer	No	Number of output tokens	30

response.output_item.added¶

Returned when a new output item is created during response generation.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_3334
type	string	No	Event type	response.output_item.added
response_id	string	No	ID of the response the output item belongs to	resp_001
output_index	integer	No	Index of the output item in the response	0
item.id	string	No	Unique identifier of the output item	msg_007
item.object	string	No	Object type	realtime.item
item.type	string	No	Output item type	message/function_call/function_call_output
item.status	string	No	Output item status	in_progress/completed
item.role	string	No	Role associated with the output item	assistant
item.content	array	No	Output item content	["type", "text", "audio", "transcript"]

response.output_item.done¶

Returned when streaming of an output item is complete.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_3536
type	string	No	Event type	response.output_item.done
response_id	string	No	ID of the response the output item belongs to	resp_001
output_index	integer	No	Index of the output item in the response	0
item.id	string	No	Unique identifier of the output item	msg_007
item.object	string	No	Object type	realtime.item
item.type	string	No	Output item type	message/function_call/function_call_output
item.status	string	No	Final status of the output item	completed/incomplete
item.role	string	No	Role associated with the output item	assistant
item.content	array	No	Output item content	["type", "text", "audio", "transcript"]

response.content_part.added¶

Returned when a new content part is added to an assistant message item during response generation.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_3738
type	string	No	Event type	response.content_part.added
response_id	string	No	Response ID	resp_001
item_id	string	No	ID of the message item the content part belongs to	msg_007
output_index	integer	No	Index of the output item in the response	0
content_index	integer	No	Index of the content part in the message item's content array	0
part.type	string	No	Content type	text/audio
part.text	string	No	Text content	"Hello"
part.audio	string	No	Base64-encoded audio data	"base64_encoded_audio_data"
part.transcript	string	No	Transcript of the audio	"Hello"

response.content_part.done¶

Returned when streaming of a content part in an assistant message item is complete.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_3940
type	string	No	Event type	response.content_part.done
response_id	string	No	Response ID	resp_001
item_id	string	No	ID of the message item the content part belongs to	msg_007
output_index	integer	No	Index of the output item in the response	0
content_index	integer	No	Index of the content part in the message item's content array	0
part.type	string	No	Content type	text/audio
part.text	string	No	Text content	"Hello"
part.audio	string	No	Base64-encoded audio data	"base64_encoded_audio_data"
part.transcript	string	No	Transcript of the audio	"Hello"

response.text.delta¶

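Returned when the text value of a "text" content part is updated.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_4142
type	string	No	Event type	response.text.delta
response_id	string	No	Response ID	resp_001
item_id	string	No	Message item ID	msg_007
output_index	integer	No	Index of the output item in the response	0
content_index	integer	No	Index of the content part in the message item's content array	0
delta	string	No	Incremental text update	"Sure, I can h"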

response.text.done¶

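Returned when text streaming for a "text" content part is complete.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_4344
type	string	No	Event type	response.text.done
response_id	string	No	Response ID	resp_001
item_id	string	No	Message item ID	msg_007
output_index	integer	No	Index of the output item in the response	0
content_index	integer	No	Index of the content part in the message item's content array	0
text	string	No	Final, complete text content	"Sure, I can help with that."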
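
The two text events are usually consumed together: the deltas are appended as they arrive, and the done event marks the end of the part. The sketch below shows one way to reassemble the text per content part. `TextAssembler` is a hypothetical helper; it reads a final `text` field if the done event carries one (an assumption about the payload) and otherwise falls back to the joined deltas.

```python
from collections import defaultdict


class TextAssembler:
    """Reassemble streamed text per (item_id, content_index) content part."""

    def __init__(self):
        self._parts = defaultdict(list)

    def feed(self, event):
        """Consume one server event; return the finished text on response.text.done, else None."""
        key = (event.get("item_id"), event.get("content_index"))
        event_type = event.get("type")
        if event_type == "response.text.delta":
            self._parts[key].append(event.get("delta", ""))
            return None
        if event_type == "response.text.done":
            streamed = "".join(self._parts.pop(key, []))
            # Prefer the server's final text when present; otherwise use the joined deltas.
            return event.get("text", streamed)
        return None
```

Keying on both `item_id` and `content_index` keeps parts separate when several are streamed in the same response.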

response.audio_transcript.delta¶

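Returned when the transcript of model-generated audio output is updated.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_4546
type	string	No	Event type	response.audio_transcript.delta
response_id	string	No	Response ID	resp_001
item_id	string	No	Message item ID	msg_008
output_index	integer	No	Index of the output item in the response	0
content_index	integer	No	Index of the content part in the message item's content array	0
delta	string	No	Incremental transcript update	"Hello, how can I a"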

response.audio_transcript.done¶

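Returned when transcript streaming for model-generated audio output is complete.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_4748
type	string	No	Event type	response.audio_transcript.done
response_id	string	No	Response ID	resp_001
item_id	string	No	Message item ID	msg_008
output_index	integer	No	Index of the output item in the response	0
content_index	integer	No	Index of the content part in the message item's content array	0
transcript	string	No	Final, complete audio transcript	"Hello, how can I assist you today?"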

response.audio.delta¶

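Returned when model-generated audio content is updated.

Parameter	Type	Required	Description	Example
event_id	string	No	Unique identifier of the server event	event_4950
type	string	No	Event type	response.audio.delta
response_id	string	No	Response ID	resp_001
item_id	string	No	Message item ID	msg_008
output_index	integer	No	Index of the output item in the response	0
content_index	integer	No	Index of the content part in the message item's content array	0
delta	string	No	Incremental Base64-encoded audio data	"Base64EncodedAudioDelta"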

response.audio.done¶

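Returned when the model-generated audio is complete.

Parameter	Type	Required	Description	Example value
event_id	string	No	Unique ID of the server event	event_5152
type	string	No	Event type	response.audio.done
response_id	string	No	Response ID	resp_001
item_id	string	No	Message item ID	msg_008
output_index	integer	No	Index of the output item in the response	0
content_index	integer	No	Index of the content part in the message item's content array	0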
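Because `response.audio.done` carries no audio payload, the client has to assemble the audio itself from the Base64 deltas. The sketch below shows one possible approach, assuming events are already parsed into dictionaries and that grouping by `item_id` is sufficient. The `AudioCollector` class is a hypothetical helper, not part of the API.

```python
import base64


class AudioCollector:
    """Assembles raw audio bytes from response.audio.delta events (hypothetical helper)."""

    def __init__(self):
        self._chunks = {}  # item_id -> bytearray of decoded audio

    def handle(self, event):
        """Return the complete audio bytes on response.audio.done, otherwise None."""
        kind = event.get("type")
        if kind == "response.audio.delta":
            buffer = self._chunks.setdefault(event["item_id"], bytearray())
            # Each delta is an independently Base64-encoded chunk.
            buffer.extend(base64.b64decode(event["delta"]))
            return None
        if kind == "response.audio.done":
            # The done event only signals completion; the data comes from the deltas.
            return bytes(self._chunks.pop(event["item_id"], b""))
        return None
```

The returned bytes are in whatever output format the session was configured with, so they still need a matching decoder or player.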

Function calls¶

response.function_call_arguments.delta¶

Returned when the model-generated function call arguments are updated.

Parameter	Type	Required	Description	Example value
event_id	string	No	Unique ID of the server event	event_5354
type	string	No	Event type	response.function_call_arguments.delta
response_id	string	No	Response ID	resp_002
item_id	string	No	Function call item ID	fc_001
output_index	integer	No	Index of the output item in the response	0
call_id	string	No	Function call ID	call_001
delta	string	No	Function call arguments delta (JSON fragment)	"{\"location\": \"San\""

response.function_call_arguments.done¶

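Returned when the model-generated function call arguments have finished streaming.

Parameter	Type	Required	Description	Example value
event_id	string	No	Unique ID of the server event	event_5556
type	string	No	Event type	response.function_call_arguments.done
response_id	string	No	Response ID	resp_002
item_id	string	No	Function call item ID	fc_001
output_index	integer	No	Index of the output item in the response	0
call_id	string	No	Function call ID	call_001
arguments	string	No	Final, complete function call arguments (JSON string)	"{\"location\": \"San Francisco\"}"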
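A single argument delta is usually not valid JSON on its own, so parsing should wait for the done event. Here is a minimal sketch of that pattern, assuming parsed event dictionaries. `FunctionCallCollector` is a hypothetical name used only for illustration.

```python
import json


class FunctionCallCollector:
    """Buffers argument fragments per call_id and parses them once complete."""

    def __init__(self):
        self._buffers = {}  # call_id -> partial JSON string

    def handle(self, event):
        """Return (call_id, parsed_arguments) on the done event, otherwise None."""
        kind = event.get("type")
        if kind == "response.function_call_arguments.delta":
            call_id = event["call_id"]
            self._buffers[call_id] = self._buffers.get(call_id, "") + event["delta"]
            return None
        if kind == "response.function_call_arguments.done":
            call_id = event["call_id"]
            streamed = self._buffers.pop(call_id, "")
            # Prefer the final "arguments" string; fall back to the streamed text.
            raw = event.get("arguments", streamed)
            return call_id, json.loads(raw)
        return None
```

`json.loads` raises `json.JSONDecodeError` on malformed input, so a production handler would likely catch that and report the failure back to the model rather than crash.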

Other status updates¶

rate_limits.updated¶

Emitted after each response.done event to report the updated rate limits.

Parameter	Type	Required	Description	Example value
event_id	string	No	Unique ID of the server event	event_5758
type	string	No	Event type	rate_limits.updated
rate_limits	array of objects	No	List of rate limit entries	[{"name": "requests_per_min", "limit": 60, "remaining": 45, "reset_seconds": 35}]

conversation.created¶

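Returned when a conversation is created.

Parameter	Type	Required	Description	Example value
event_id	string	No	Unique ID of the server event	event_9101
type	string	No	Event type	conversation.created
conversation	object	No	Conversation resource object	{"id": "conv_001", "object": "realtime.conversation"}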

conversation.item.created¶

Returned when a conversation item is created.

Parameter	Type	Required	Description	Example value
event_id	string	No	Unique ID of the server event	event_1920
type	string	No	Event type	conversation.item.created
previous_item_id	string	No	ID of the preceding conversation item	msg_002
item	object	No	Conversation item object	{"id": "msg_003", "object": "realtime.item", "type": "message", "status": "completed", "role": "user", "content": [{"type": "text", "text": "Hello"}]}

session.created¶

Returned when a session is created.

Parameter	Type	Required	Description	Example value
event_id	string	No	Unique ID of the server event	event_1234
type	string	No	Event type	session.created
session	object	No	Session object	{"id": "sess_001", "object": "realtime.session", "model": "gpt-4", "modalities": ["text", "audio"]}

session.updated¶

Returned when a session is updated.

Parameter	Type	Required	Description	Example value
event_id	string	No	Unique ID of the server event	event_5678
type	string	No	Event type	session.updated
session	object	No	Updated session object	{"id": "sess_001", "object": "realtime.session", "model": "gpt-4", "modalities": ["text", "audio"]}

Rate limit event parameters¶

Parameter	Type	Required	Description	Example value
name	string	Yes	Name of the limit	requests_per_min
limit	integer	Yes	Limit value	60
remaining	integer	Yes	Amount remaining	45
reset_seconds	integer	Yes	Seconds until the limit resets	35
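The `remaining` and `reset_seconds` fields give a client enough information to throttle itself. The sketch below shows one simple policy: pause until the reset whenever any limit is exhausted. The function name and the `min_remaining` threshold are assumptions made for illustration.

```python
def seconds_to_wait(rate_limits, min_remaining=1):
    """Return how long to pause before sending the next request.

    rate_limits is the list from a rate_limits.updated event. A limit counts
    as exhausted when its remaining amount is below min_remaining; the wait
    is the longest reset time among exhausted limits, or 0 if none are.
    """
    waits = [
        entry["reset_seconds"]
        for entry in rate_limits
        if entry["remaining"] < min_remaining
    ]
    return max(waits, default=0)
```

With the example entry from the table (45 of 60 requests remaining) this returns 0, so no pause is needed.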

Function call parameters¶

Parameter	Type	Required	Description	Example value
type	string	Yes	Tool type	function
name	string	Yes	Function name	get_weather
description	string	No	Function description	Get the current weather
parameters	object	Yes	Function parameter definition (JSON Schema)	{"type": "object", "properties": {...}}
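A tool definition built from the table above can be sketched as follows. The `make_function_tool` helper is hypothetical, and the `location` property in the usage example is an illustrative assumption (the table elides the actual properties); only the four top-level field names come from this document.

```python
def make_function_tool(name, parameters, description=None):
    """Build a function tool definition with the required fields checked."""
    if not name:
        raise ValueError("name is required")
    if not isinstance(parameters, dict) or parameters.get("type") != "object":
        raise ValueError('parameters must be an object schema with "type": "object"')
    tool = {"type": "function", "name": name, "parameters": parameters}
    if description is not None:
        # description is optional, so it is only included when provided
        tool["description"] = description
    return tool


# Hypothetical schema: the "location" property is an assumption for this example.
weather_tool = make_function_tool(
    "get_weather",
    {"type": "object", "properties": {"location": {"type": "string"}}},
    description="Get the current weather",
)
```

Validating the definition on the client side catches a missing name or malformed schema before the session configuration is sent.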

Audio format parameters¶

Parameter	Type	Description	Allowed values
sample_rate	integer	Sample rate (Hz)	8000, 16000, 24000, 44100, 48000
channels	integer	Number of channels	1 (mono), 2 (stereo)
bits_per_sample	integer	Bits per sample	16 (pcm16), 8 (g711)
encoding	string	Encoding	pcm16, g711_ulaw, g711_alaw
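The three numeric parameters determine the raw data rate of uncompressed audio, which is useful when sizing buffers. The arithmetic is sketched below; the function names are illustrative only.

```python
def bytes_per_second(sample_rate, channels, bits_per_sample):
    """Raw data rate: samples per second x channels x bytes per sample."""
    return sample_rate * channels * bits_per_sample // 8


def chunk_duration_ms(chunk_bytes, sample_rate, channels, bits_per_sample):
    """Playback time in milliseconds covered by a chunk of the given size."""
    return 1000 * chunk_bytes / bytes_per_second(sample_rate, channels, bits_per_sample)
```

For example, mono pcm16 at 24000 Hz works out to 24000 x 1 x 2 = 48000 bytes per second, and a 1024-byte chunk of 8-bit mono audio at 8000 Hz covers 128 ms.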

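Voice activity detection parameters¶

Parameter	Type	Description	Default	Range
threshold	float	VAD activation threshold	0.5	0.0-1.0
prefix_padding_ms	integer	Audio to include before detected speech (ms)	500	0-5000
silence_duration_ms	integer	Duration of silence used to detect the end of speech (ms)	1000	100-10000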
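The defaults and ranges in this table can be enforced on the client before a configuration is sent. The following is a sketch of such a check; `build_vad_settings` and the two constant names are hypothetical, while the numeric values are copied from the table.

```python
VAD_DEFAULTS = {"threshold": 0.5, "prefix_padding_ms": 500, "silence_duration_ms": 1000}
VAD_RANGES = {
    "threshold": (0.0, 1.0),
    "prefix_padding_ms": (0, 5000),
    "silence_duration_ms": (100, 10000),
}


def build_vad_settings(**overrides):
    """Start from the documented defaults and apply range-checked overrides."""
    settings = dict(VAD_DEFAULTS)
    for name, value in overrides.items():
        if name not in VAD_RANGES:
            raise KeyError(f"unknown VAD parameter: {name}")
        low, high = VAD_RANGES[name]
        if not low <= value <= high:
            raise ValueError(f"{name}={value} is outside {low}-{high}")
        settings[name] = value
    return settings
```

Rejecting out-of-range values, rather than silently clamping them, makes a misconfigured threshold visible immediately.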

Tool selection parameters¶

Parameter	Type	Description	Allowed values
tool_choice	string	How the model selects tools	auto, none, required
tools	array	List of available tools	[{type, name, description, parameters}]

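Model configuration parameters¶

Parameter	Type	Description	Range / allowed values	Default
temperature	float	Sampling temperature	0.0-2.0	1.0
max_output_tokens	integer/string	Maximum output length	1-4096/"inf"	"inf"
modalities	array of strings	Response modalities	["text", "audio"]	["text"]
voice	string	Voice	alloy, echo, shimmer	alloy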
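`max_output_tokens` is the one parameter here with a mixed type: either an integer in range or the literal string "inf". The sketch below validates all four settings against the table. The function and constant names are assumptions, and the allowed values are limited to those listed in this document.

```python
ALLOWED_VOICES = {"alloy", "echo", "shimmer"}
ALLOWED_MODALITIES = {"text", "audio"}


def build_model_settings(temperature=1.0, max_output_tokens="inf",
                         modalities=("text",), voice="alloy"):
    """Return a settings dict after checking each value against the table."""
    if not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be within 0.0-2.0")
    if max_output_tokens != "inf":
        # bool is a subclass of int in Python, so exclude it explicitly
        is_int = isinstance(max_output_tokens, int) and not isinstance(max_output_tokens, bool)
        if not is_int or not 1 <= max_output_tokens <= 4096:
            raise ValueError('max_output_tokens must be 1-4096 or "inf"')
    if not modalities or set(modalities) - ALLOWED_MODALITIES:
        raise ValueError("modalities must be a non-empty subset of text/audio")
    if voice not in ALLOWED_VOICES:
        raise ValueError(f"unknown voice: {voice}")
    return {
        "temperature": temperature,
        "max_output_tokens": max_output_tokens,
        "modalities": list(modalities),
        "voice": voice,
    }
```

Calling `build_model_settings()` with no arguments reproduces the defaults from the table.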

Common event parameters¶

Parameter	Type	Required	Description	Example value
event_id	string	Yes	Unique ID of the event	event_123
type	string	Yes	Event type	session.update
timestamp	integer	No	Event timestamp (ms)	1677649363000
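Every event shares the same envelope, so a small factory can keep client events consistent. This is a sketch only: `make_event` is a hypothetical helper, and the `event_` prefix with a random hex suffix is an assumed ID scheme modeled on the example values in this document.

```python
import time
import uuid


def make_event(event_type, include_timestamp=False, **fields):
    """Wrap event-specific fields in the common envelope (event_id, type, timestamp)."""
    event = dict(fields)
    # Envelope keys are set last so they cannot be overridden by **fields.
    event["event_id"] = f"event_{uuid.uuid4().hex[:8]}"  # assumed ID format
    event["type"] = event_type
    if include_timestamp:
        event["timestamp"] = int(time.time() * 1000)  # milliseconds since the epoch
    return event


update = make_event("session.update", session={"voice": "alloy"})
```

The timestamp is opt-in because the table marks it as not required.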

Session status parameters¶

Parameter	Type	Description	Allowed values
status	string	Session status	active, ended, error
error	object	Error information	{"type": "error_type", "message": "error message"}
metadata	object	Session metadata	{"client_id": "web", "session_type": "chat"}

Conversation item status parameters¶

Parameter	Type	Description	Allowed values
status	string	Conversation item status	completed, in_progress, incomplete
role	string	Sender role	user, assistant, system
type	string	Conversation item type	message, function_call, function_call_output

Content type parameters¶

Parameter	Type	Description	Allowed values
type	string	Content type	text, audio, transcript
format	string	Content format	plain, markdown, html
encoding	string	Encoding	utf-8, base64

Response status parameters¶

Parameter	Type	Description	Allowed values
status	string	Response status	completed, cancelled, failed, incomplete
status_details	object	Status details	{"reason": "user_cancelled"}
usage	object	Usage statistics	{"total_tokens": 50, "input_tokens": 20, "output_tokens": 30}

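Audio transcription parameters¶

Parameter	Type	Description	Example value
enabled	boolean	Whether transcription is enabled	true
model	string	Transcription model	whisper-1
language	string	Transcription language	en, zh, auto
prompt	string	Transcription prompt	"Transcript of a conversation"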

Audio stream parameters¶

Parameter	Type	Description	Allowed values
chunk_size	integer	Audio chunk size (bytes)	1024, 2048, 4096
latency	string	Latency mode	low, balanced, high
compression	string	Compression	none, opus, mp3

WebRTC configuration parameters¶

Parameter	Type	Description	Default
ice_servers	array	List of ICE servers	[{"urls": "stun:stun.l.google.com:19302"}]
audio_constraints	object	Audio constraints	{"echoCancellation": true}
connection_timeout	integer	Connection timeout (ms)	30000