
Reference

Agent V1 Settings Think Models

client.agent.v1.settings.think.models.list() -> AsyncHttpResponse[AgentThinkModelsV1Response]

📝 Description

Retrieves the available think models that can be used for AI agent processing

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.agent.v1.settings.think.models.list()

⚙️ Parameters

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Auth V1 Tokens

client.auth.v1.tokens.grant(...) -> AsyncHttpResponse[GrantV1Response]

📝 Description

Generates a temporary JSON Web Token (JWT) with a 30-second TTL (by default) and the usage::write permission for core voice APIs. Requires an API key with Member or higher authorization. Tokens created with this endpoint will not work with the Manage APIs.

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.auth.v1.tokens.grant()

⚙️ Parameters

ttl_seconds: typing.Optional[float] — Time to live in seconds for the token. Defaults to 30 seconds.

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
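Because these tokens are short-lived, clients typically track expiry and call grant() again before the TTL lapses. A minimal sketch of that bookkeeping (the helper names and the refresh margin are illustrative, not part of the SDK):

```python
DEFAULT_TTL_SECONDS = 30.0    # matches the endpoint's default TTL
REFRESH_MARGIN_SECONDS = 5.0  # re-grant a little before actual expiry

def token_expires_at(issued_at: float, ttl_seconds: float = DEFAULT_TTL_SECONDS) -> float:
    """Absolute expiry time (epoch seconds) for a granted token."""
    return issued_at + ttl_seconds

def needs_refresh(
    now: float,
    issued_at: float,
    ttl_seconds: float = DEFAULT_TTL_SECONDS,
    margin: float = REFRESH_MARGIN_SECONDS,
) -> bool:
    """True once the token is within `margin` seconds of expiring."""
    return now >= token_expires_at(issued_at, ttl_seconds) - margin
```

In practice you would call client.auth.v1.tokens.grant() again whenever needs_refresh returns True.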

Listen V1 Media

client.listen.v1.media.transcribe_url(...) -> AsyncHttpResponse[MediaTranscribeResponse]

📝 Description

Transcribe audio and video using Deepgram's speech-to-text REST API

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.listen.v1.media.transcribe_url(
    callback="callback",
    callback_method="POST",
    extra="extra",
    sentiment=True,
    summarize="v2",
    tag="tag",
    topics=True,
    custom_topic="custom_topic",
    custom_topic_mode="extended",
    intents=True,
    custom_intent="custom_intent",
    custom_intent_mode="extended",
    detect_entities=True,
    detect_language=True,
    diarize=True,
    dictation=True,
    encoding="linear16",
    filler_words=True,
    keywords="keywords",
    language="language",
    measurements=True,
    model="nova-3",
    multichannel=True,
    numerals=True,
    paragraphs=True,
    profanity_filter=True,
    punctuate=True,
    redact="redact",
    replace="replace",
    search="search",
    smart_format=True,
    utterances=True,
    utt_split=1.1,
    version="latest",
    mip_opt_out=True,
    url="https://dpgr.am/spacewalk.wav",
)

⚙️ Parameters

url: str

callback: typing.Optional[str] — URL to which we'll make the callback request

callback_method: typing.Optional[MediaTranscribeRequestCallbackMethod] — HTTP method by which the callback request will be made

extra: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Arbitrary key-value pairs that are attached to the API response for usage in downstream processing

sentiment: typing.Optional[bool] — Recognizes the sentiment throughout a transcript or text

summarize: typing.Optional[MediaTranscribeRequestSummarize] — Summarize content. For Listen API, supports string version option. For Read API, accepts boolean only.

tag: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Label your requests for the purpose of identification during usage reporting

topics: typing.Optional[bool] — Detect topics throughout a transcript or text

custom_topic: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Custom topics you want the model to detect within your input audio or text, if present. Submit up to 100.

custom_topic_mode: typing.Optional[MediaTranscribeRequestCustomTopicMode] — Sets how the model will interpret strings submitted to the custom_topic param. When strict, the model will only return topics submitted using the custom_topic param. When extended, the model will return its own detected topics in addition to those submitted using the custom_topic param

intents: typing.Optional[bool] — Recognizes speaker intent throughout a transcript or text

custom_intent: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Custom intents you want the model to detect within your input audio if present

custom_intent_mode: typing.Optional[MediaTranscribeRequestCustomIntentMode] — Sets how the model will interpret intents submitted to the custom_intent param. When strict, the model will only return intents submitted using the custom_intent param. When extended, the model will return its own detected intents in addition to those submitted using the custom_intent param.

detect_entities: typing.Optional[bool] — Identifies and extracts key entities from content in submitted audio

detect_language: typing.Optional[bool] — Identifies the dominant language spoken in submitted audio

diarize: typing.Optional[bool] — Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0

dictation: typing.Optional[bool] — Dictation mode for controlling formatting with dictated speech

encoding: typing.Optional[MediaTranscribeRequestEncoding] — Specify the expected encoding of your submitted audio

filler_words: typing.Optional[bool] — Filler Words can help transcribe interruptions in your audio, like "uh" and "um"

keyterm: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Key term prompting can boost or suppress specialized terminology and brands. Only compatible with Nova-3

keywords: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Keywords can boost or suppress specialized terminology and brands

language: typing.Optional[str] — The BCP-47 language tag that hints at the primary spoken language. Depending on the model and API endpoint you choose, only certain languages are available.

measurements: typing.Optional[bool] — Spoken measurements will be converted to their corresponding abbreviations

model: typing.Optional[MediaTranscribeRequestModel] — AI model used to process submitted audio

multichannel: typing.Optional[bool] — Transcribe each audio channel independently

numerals: typing.Optional[bool] — Numerals converts numbers from written format to numerical format

paragraphs: typing.Optional[bool] — Splits audio into paragraphs to improve transcript readability

profanity_filter: typing.Optional[bool] — Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely

punctuate: typing.Optional[bool] — Add punctuation and capitalization to the transcript

redact: typing.Optional[str] — Redaction removes sensitive information from your transcripts

replace: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Search for terms or phrases in submitted audio and replace them

search: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Search for terms or phrases in submitted audio

smart_format: typing.Optional[bool] — Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability

utterances: typing.Optional[bool] — Segments speech into meaningful semantic units

utt_split: typing.Optional[float] — Seconds to wait before detecting a pause between words in submitted audio

version: typing.Optional[MediaTranscribeRequestVersion] — Version of an AI model to use

mip_opt_out: typing.Optional[bool] — Opts requests out of the Deepgram Model Improvement Program. Refer to our Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
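Several options above accept either a single string or a sequence (extra, tag, keywords, search, replace, and others). On the wire these are conventionally sent as repeated query parameters, with booleans rendered as lowercase true/false. A rough sketch of that mapping (illustrative only, not the SDK's actual serializer):

```python
from typing import Any

def to_query_pairs(params: dict[str, Any]) -> list[tuple[str, str]]:
    """Flatten request options into (key, value) query pairs.

    Sequence values repeat the key; booleans become lowercase strings;
    unset (None) options are omitted entirely.
    """
    pairs: list[tuple[str, str]] = []
    for key, value in params.items():
        if value is None:
            continue
        if isinstance(value, bool):
            pairs.append((key, "true" if value else "false"))
        elif isinstance(value, (list, tuple)):
            pairs.extend((key, str(item)) for item in value)
        else:
            pairs.append((key, str(value)))
    return pairs
```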

client.listen.v1.media.transcribe_file(...) -> AsyncHttpResponse[MediaTranscribeResponse]

📝 Description

Transcribe audio and video using Deepgram's speech-to-text REST API

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)

with open("audio.wav", "rb") as f:
    response = client.listen.v1.media.transcribe_file(request=f)

⚙️ Parameters

request: typing.Union[bytes, typing.Iterator[bytes], typing.AsyncIterator[bytes]]

callback: typing.Optional[str] — URL to which we'll make the callback request

callback_method: typing.Optional[MediaTranscribeRequestCallbackMethod] — HTTP method by which the callback request will be made

extra: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Arbitrary key-value pairs that are attached to the API response for usage in downstream processing

sentiment: typing.Optional[bool] — Recognizes the sentiment throughout a transcript or text

summarize: typing.Optional[MediaTranscribeRequestSummarize] — Summarize content. For Listen API, supports string version option. For Read API, accepts boolean only.

tag: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Label your requests for the purpose of identification during usage reporting

topics: typing.Optional[bool] — Detect topics throughout a transcript or text

custom_topic: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Custom topics you want the model to detect within your input audio or text, if present. Submit up to 100.

custom_topic_mode: typing.Optional[MediaTranscribeRequestCustomTopicMode] — Sets how the model will interpret strings submitted to the custom_topic param. When strict, the model will only return topics submitted using the custom_topic param. When extended, the model will return its own detected topics in addition to those submitted using the custom_topic param

intents: typing.Optional[bool] — Recognizes speaker intent throughout a transcript or text

custom_intent: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Custom intents you want the model to detect within your input audio if present

custom_intent_mode: typing.Optional[MediaTranscribeRequestCustomIntentMode] — Sets how the model will interpret intents submitted to the custom_intent param. When strict, the model will only return intents submitted using the custom_intent param. When extended, the model will return its own detected intents in addition to those submitted using the custom_intent param.

detect_entities: typing.Optional[bool] — Identifies and extracts key entities from content in submitted audio

detect_language: typing.Optional[bool] — Identifies the dominant language spoken in submitted audio

diarize: typing.Optional[bool] — Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0

dictation: typing.Optional[bool] — Dictation mode for controlling formatting with dictated speech

encoding: typing.Optional[MediaTranscribeRequestEncoding] — Specify the expected encoding of your submitted audio

filler_words: typing.Optional[bool] — Filler Words can help transcribe interruptions in your audio, like "uh" and "um"

keyterm: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Key term prompting can boost or suppress specialized terminology and brands. Only compatible with Nova-3

keywords: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Keywords can boost or suppress specialized terminology and brands

language: typing.Optional[str] — The BCP-47 language tag that hints at the primary spoken language. Depending on the model and API endpoint you choose, only certain languages are available.

measurements: typing.Optional[bool] — Spoken measurements will be converted to their corresponding abbreviations

model: typing.Optional[MediaTranscribeRequestModel] — AI model used to process submitted audio

multichannel: typing.Optional[bool] — Transcribe each audio channel independently

numerals: typing.Optional[bool] — Numerals converts numbers from written format to numerical format

paragraphs: typing.Optional[bool] — Splits audio into paragraphs to improve transcript readability

profanity_filter: typing.Optional[bool] — Profanity Filter looks for recognized profanity and converts it to the nearest recognized non-profane word or removes it from the transcript completely

punctuate: typing.Optional[bool] — Add punctuation and capitalization to the transcript

redact: typing.Optional[str] — Redaction removes sensitive information from your transcripts

replace: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Search for terms or phrases in submitted audio and replace them

search: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Search for terms or phrases in submitted audio

smart_format: typing.Optional[bool] — Apply formatting to transcript output. When set to true, additional formatting will be applied to transcripts to improve readability

utterances: typing.Optional[bool] — Segments speech into meaningful semantic units

utt_split: typing.Optional[float] — Seconds to wait before detecting a pause between words in submitted audio

version: typing.Optional[MediaTranscribeRequestVersion] — Version of an AI model to use

mip_opt_out: typing.Optional[bool] — Opts requests out of the Deepgram Model Improvement Program. Refer to our Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
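Since request also accepts typing.Iterator[bytes], large files can be streamed in fixed-size chunks rather than loaded into memory whole. A minimal chunked reader (the helper itself is illustrative, not part of the SDK):

```python
from typing import Iterator

def read_chunks(path: str, chunk_size: int = 64 * 1024) -> Iterator[bytes]:
    """Yield a file's contents in fixed-size chunks (last chunk may be shorter)."""
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            yield chunk
```

The iterator can then be passed directly: client.listen.v1.media.transcribe_file(request=read_chunks("audio.wav")).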

Manage V1 Models

client.manage.v1.models.list(...) -> AsyncHttpResponse[ListModelsV1Response]

📝 Description

Returns metadata on all the latest public models. To retrieve custom models, use Get Project Models.

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.models.list(
    include_outdated=True,
)

⚙️ Parameters

include_outdated: typing.Optional[bool] — Returns non-latest versions of models

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.models.get(...) -> AsyncHttpResponse[GetModelV1Response]

📝 Description

Returns metadata for a specific public model

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.models.get(
    model_id="af6e9977-99f6-4d8f-b6f5-dfdf6fb6e291",
)

⚙️ Parameters

model_id: str — The specific UUID of the model

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects

client.manage.v1.projects.list() -> AsyncHttpResponse[ListProjectsV1Response]

📝 Description

Retrieves basic information about the projects associated with the API key

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.list()

⚙️ Parameters

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.get(...) -> AsyncHttpResponse[GetProjectV1Response]

📝 Description

Retrieves information about the specified project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.get(
    project_id="123456-7890-1234-5678-901234",
    limit=1.1,
    page=1.1,
)

⚙️ Parameters

project_id: str — The unique identifier of the project

limit: typing.Optional[float] — Number of results to return per page. Default 10. Range [1,1000]

page: typing.Optional[float] — Page of results to return, used to retrieve specific portions of the response

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
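limit and page together window the results: at limit results per page, a collection of total items spans ceil(total / limit) pages. A quick sanity-check helper (illustrative only; how the API indexes pages is not shown here):

```python
import math

def page_count(total_items: int, limit: int = 10) -> int:
    """Number of pages needed at `limit` results per page (default 10)."""
    if total_items <= 0:
        return 0
    return math.ceil(total_items / limit)
```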

client.manage.v1.projects.delete(...) -> AsyncHttpResponse[DeleteProjectV1Response]

📝 Description

Deletes the specified project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.delete(
    project_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.update(...) -> AsyncHttpResponse[UpdateProjectV1Response]

📝 Description

Updates the name or other properties of an existing project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.update(
    project_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

name: typing.Optional[str] — The name of the project

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.leave(...) -> AsyncHttpResponse[LeaveProjectV1Response]

📝 Description

Removes the authenticated account from the specific project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.leave(
    project_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects Keys

client.manage.v1.projects.keys.list(...) -> AsyncHttpResponse[ListProjectKeysV1Response]

📝 Description

Retrieves all API keys associated with the specified project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.keys.list(
    project_id="123456-7890-1234-5678-901234",
    status="active",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

status: typing.Optional[KeysListRequestStatus] — Only return keys with a specific status

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.keys.create(...) -> AsyncHttpResponse[CreateKeyV1Response]

📝 Description

Creates a new API key with specified settings for the project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.keys.create(
    project_id="project_id",
    request={"key": "value"},
)

⚙️ Parameters

project_id: str — The unique identifier of the project

request: CreateKeyV1RequestOne

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.keys.get(...) -> AsyncHttpResponse[GetProjectKeyV1Response]

📝 Description

Retrieves information about a specified API key

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.keys.get(
    project_id="123456-7890-1234-5678-901234",
    key_id="123456789012345678901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

key_id: str — The unique identifier of the API key

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.keys.delete(...) -> AsyncHttpResponse[DeleteProjectKeyV1Response]

📝 Description

Deletes an API key for a specific project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.keys.delete(
    project_id="123456-7890-1234-5678-901234",
    key_id="123456789012345678901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

key_id: str — The unique identifier of the API key

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects Members

client.manage.v1.projects.members.list(...) -> AsyncHttpResponse[ListProjectMembersV1Response]

📝 Description

Retrieves a list of members for a given project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.members.list(
    project_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.members.delete(...) -> AsyncHttpResponse[DeleteProjectMemberV1Response]

📝 Description

Removes a member from the project using their unique member ID

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.members.delete(
    project_id="123456-7890-1234-5678-901234",
    member_id="123456789012345678901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

member_id: str — The unique identifier of the Member

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects Models

client.manage.v1.projects.models.list(...) -> AsyncHttpResponse[ListModelsV1Response]

📝 Description

Returns metadata on all the latest models that a specific project has access to, including non-public models

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.models.list(
    project_id="123456-7890-1234-5678-901234",
    include_outdated=True,
)

⚙️ Parameters

project_id: str — The unique identifier of the project

include_outdated: typing.Optional[bool] — Returns non-latest versions of models

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.models.get(...) -> AsyncHttpResponse[GetModelV1Response]

📝 Description

Returns metadata for a specific model

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.models.get(
    project_id="123456-7890-1234-5678-901234",
    model_id="af6e9977-99f6-4d8f-b6f5-dfdf6fb6e291",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

model_id: str — The specific UUID of the model

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects Requests

client.manage.v1.projects.requests.list(...) -> AsyncHttpResponse[ListProjectRequestsV1Response]

📝 Description

Generates a list of requests for a specific project

🔌 Usage

import datetime

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.requests.list(
    project_id="123456-7890-1234-5678-901234",
    start=datetime.datetime.fromisoformat(
        "2024-01-15 09:30:00+00:00",
    ),
    end=datetime.datetime.fromisoformat(
        "2024-01-15 09:30:00+00:00",
    ),
    limit=1.1,
    page=1.1,
    accessor="12345678-1234-1234-1234-123456789012",
    request_id="12345678-1234-1234-1234-123456789012",
    deployment="hosted",
    endpoint="listen",
    method="sync",
    status="succeeded",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

start: typing.Optional[dt.datetime] — Start date of the requested date range. Formats accepted are YYYY-MM-DD, YYYY-MM-DDTHH:MM:SS, or YYYY-MM-DDTHH:MM:SS+HH:MM

end: typing.Optional[dt.datetime] — End date of the requested date range. Formats accepted are YYYY-MM-DD, YYYY-MM-DDTHH:MM:SS, or YYYY-MM-DDTHH:MM:SS+HH:MM

limit: typing.Optional[float] — Number of results to return per page. Default 10. Range [1,1000]

page: typing.Optional[float] — Page of results to return, used to retrieve specific portions of the response

accessor: typing.Optional[str] — Filter for requests where a specific accessor was used

request_id: typing.Optional[str] — Filter for a specific request id

deployment: typing.Optional[RequestsListRequestDeployment] — Filter for requests where a specific deployment was used

endpoint: typing.Optional[RequestsListRequestEndpoint] — Filter for requests where a specific endpoint was used

method: typing.Optional[RequestsListRequestMethod] — Filter for requests where a specific method was used

status: typing.Optional[RequestsListRequestStatus] — Filter for requests that succeeded (status code < 300) or failed (status code >=400)

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
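The start/end filters accept YYYY-MM-DD, YYYY-MM-DDTHH:MM:SS, or YYYY-MM-DDTHH:MM:SS+HH:MM. Python's standard datetime module produces the offset-qualified form directly for timezone-aware values:

```python
import datetime as dt

# A timezone-aware datetime renders the offset-qualified form the API accepts.
start = dt.datetime(2024, 1, 15, 9, 30, 0, tzinfo=dt.timezone.utc)
assert start.isoformat() == "2024-01-15T09:30:00+00:00"

# Date-only filtering can use the plain YYYY-MM-DD form.
day = dt.date(2024, 1, 15)
assert day.isoformat() == "2024-01-15"
```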

client.manage.v1.projects.requests.get(...) -> AsyncHttpResponse[GetProjectRequestV1Response]

📝 Description

Retrieves a specific request for a specific project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.requests.get(
    project_id="123456-7890-1234-5678-901234",
    request_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

request_id: str — The unique identifier of the request

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects Usage

client.manage.v1.projects.usage.get(...) -> AsyncHttpResponse[UsageV1Response]

📝 Description

Retrieves the usage for a specific project. Use Get Project Usage Breakdown for a more comprehensive usage summary.

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.usage.get(
    project_id="123456-7890-1234-5678-901234",
    start="start",
    end="end",
    accessor="12345678-1234-1234-1234-123456789012",
    alternatives=True,
    callback_method=True,
    callback=True,
    channels=True,
    custom_intent_mode=True,
    custom_intent=True,
    custom_topic_mode=True,
    custom_topic=True,
    deployment="hosted",
    detect_entities=True,
    detect_language=True,
    diarize=True,
    dictation=True,
    encoding=True,
    endpoint="listen",
    extra=True,
    filler_words=True,
    intents=True,
    keyterm=True,
    keywords=True,
    language=True,
    measurements=True,
    method="sync",
    model="6f548761-c9c0-429a-9315-11a1d28499c8",
    multichannel=True,
    numerals=True,
    paragraphs=True,
    profanity_filter=True,
    punctuate=True,
    redact=True,
    replace=True,
    sample_rate=True,
    search=True,
    sentiment=True,
    smart_format=True,
    summarize=True,
    tag="tag1",
    topics=True,
    utt_split=True,
    utterances=True,
    version=True,
)

⚙️ Parameters

project_id: str — The unique identifier of the project

start: typing.Optional[str] — Start date of the requested date range. Format accepted is YYYY-MM-DD

end: typing.Optional[str] — End date of the requested date range. Format accepted is YYYY-MM-DD

accessor: typing.Optional[str] — Filter for requests where a specific accessor was used

alternatives: typing.Optional[bool] — Filter for requests where alternatives were used

callback_method: typing.Optional[bool] — Filter for requests where callback method was used

callback: typing.Optional[bool] — Filter for requests where callback was used

channels: typing.Optional[bool] — Filter for requests where channels were used

custom_intent_mode: typing.Optional[bool] — Filter for requests where custom intent mode was used

custom_intent: typing.Optional[bool] — Filter for requests where custom intent was used

custom_topic_mode: typing.Optional[bool] — Filter for requests where custom topic mode was used

custom_topic: typing.Optional[bool] — Filter for requests where custom topic was used

deployment: typing.Optional[UsageGetRequestDeployment] — Filter for requests where a specific deployment was used

detect_entities: typing.Optional[bool] — Filter for requests where detect entities was used

detect_language: typing.Optional[bool] — Filter for requests where detect language was used

diarize: typing.Optional[bool] — Filter for requests where diarize was used

dictation: typing.Optional[bool] — Filter for requests where dictation was used

encoding: typing.Optional[bool] — Filter for requests where encoding was used

endpoint: typing.Optional[UsageGetRequestEndpoint] — Filter for requests where a specific endpoint was used

extra: typing.Optional[bool] — Filter for requests where extra was used

filler_words: typing.Optional[bool] — Filter for requests where filler words was used

intents: typing.Optional[bool] — Filter for requests where intents was used

keyterm: typing.Optional[bool] — Filter for requests where keyterm was used

keywords: typing.Optional[bool] — Filter for requests where keywords was used

language: typing.Optional[bool] — Filter for requests where language was used

measurements: typing.Optional[bool] — Filter for requests where measurements were used

method: typing.Optional[UsageGetRequestMethod] — Filter for requests where a specific method was used

model: typing.Optional[str] — Filter for requests where a specific model uuid was used

multichannel: typing.Optional[bool] — Filter for requests where multichannel was used

numerals: typing.Optional[bool] — Filter for requests where numerals were used

paragraphs: typing.Optional[bool] — Filter for requests where paragraphs were used

profanity_filter: typing.Optional[bool] — Filter for requests where profanity filter was used

punctuate: typing.Optional[bool] — Filter for requests where punctuate was used

redact: typing.Optional[bool] — Filter for requests where redact was used

replace: typing.Optional[bool] — Filter for requests where replace was used

sample_rate: typing.Optional[bool] — Filter for requests where sample rate was used

search: typing.Optional[bool] — Filter for requests where search was used

sentiment: typing.Optional[bool] — Filter for requests where sentiment was used

smart_format: typing.Optional[bool] — Filter for requests where smart format was used

summarize: typing.Optional[bool] — Filter for requests where summarize was used

tag: typing.Optional[str] — Filter for requests where a specific tag was used

topics: typing.Optional[bool] — Filter for requests where topics was used

utt_split: typing.Optional[bool] — Filter for requests where utt split was used

utterances: typing.Optional[bool] — Filter for requests where utterances was used

version: typing.Optional[bool] — Filter for requests where version was used

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects Billing Balances

client.manage.v1.projects.billing.balances.list(...) -> AsyncHttpResponse[ListProjectBalancesV1Response]

📝 Description

Generates a list of outstanding balances for the specified project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.billing.balances.list(
    project_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.billing.balances.get(...) -> AsyncHttpResponse[GetProjectBalanceV1Response]

📝 Description

Retrieves details about the specified balance

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.billing.balances.get(
    project_id="123456-7890-1234-5678-901234",
    balance_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

balance_id: str — The unique identifier of the balance

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects Billing Breakdown

client.manage.v1.projects.billing.breakdown.list(...) -> AsyncHttpResponse[BillingBreakdownV1Response]

📝 Description

Retrieves the billing summary for a specific project, with various filter options or by grouping options.

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.billing.breakdown.list(
    project_id="123456-7890-1234-5678-901234",
    start="start",
    end="end",
    accessor="12345678-1234-1234-1234-123456789012",
    deployment="hosted",
    tag="tag1",
    line_item="streaming::nova-3",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

start: typing.Optional[str] — Start date of the requested date range. Format accepted is YYYY-MM-DD

end: typing.Optional[str] — End date of the requested date range. Format accepted is YYYY-MM-DD

accessor: typing.Optional[str] — Filter for requests where a specific accessor was used

deployment: typing.Optional[BreakdownListRequestDeployment] — Filter for requests where a specific deployment was used

tag: typing.Optional[str] — Filter for requests where a specific tag was used

line_item: typing.Optional[str] — Filter requests by line item (e.g. streaming::nova-3)

grouping: typing.Optional[typing.Union[BreakdownListRequestGroupingItem, typing.Sequence[BreakdownListRequestGroupingItem]]] — Group billing breakdown by one or more dimensions (accessor, deployment, line_item, tags)

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
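The grouping parameter accepts either a single dimension or a sequence of dimensions. As a local sketch, a small helper can assemble and sanity-check the argument before the call; the build_grouping helper is hypothetical (not part of the SDK), and the valid dimension names are taken from the parameter description above:

```python
# Hypothetical helper for assembling the grouping argument; not part of the SDK.
# Valid dimension names come from the grouping parameter description above.
VALID_DIMENSIONS = {"accessor", "deployment", "line_item", "tags"}

def build_grouping(*dims: str) -> list:
    unknown = set(dims) - VALID_DIMENSIONS
    if unknown:
        raise ValueError(f"unknown grouping dimensions: {sorted(unknown)}")
    return list(dims)

grouping = build_grouping("accessor", "line_item")
# client.manage.v1.projects.billing.breakdown.list(
#     project_id="123456-7890-1234-5678-901234",
#     grouping=grouping,
# )
```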

Manage V1 Projects Billing Fields

client.manage.v1.projects.billing.fields.list(...) -> AsyncHttpResponse[ListBillingFieldsV1Response]

📝 Description

Lists the accessors, deployment types, tags, and line items used for billing data in the specified time period. Use this endpoint if you want to filter your results from the Billing Breakdown endpoint and want to know what filters are available.

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.billing.fields.list(
    project_id="123456-7890-1234-5678-901234",
    start="start",
    end="end",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

start: typing.Optional[str] — Start date of the requested date range. Format accepted is YYYY-MM-DD

end: typing.Optional[str] — End date of the requested date range. Format accepted is YYYY-MM-DD

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
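Since start and end must be YYYY-MM-DD strings, a small date helper can build the range. This is an illustrative sketch, not SDK functionality:

```python
from datetime import date, timedelta

def last_n_days(n: int) -> "tuple[str, str]":
    """Build (start, end) strings in the YYYY-MM-DD format the endpoint expects."""
    end = date.today()
    start = end - timedelta(days=n)
    return start.isoformat(), end.isoformat()

start, end = last_n_days(30)
# client.manage.v1.projects.billing.fields.list(
#     project_id="123456-7890-1234-5678-901234", start=start, end=end,
# )
```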

Manage V1 Projects Billing Purchases

client.manage.v1.projects.billing.purchases.list(...) -> AsyncHttpResponse[ListProjectPurchasesV1Response]

📝 Description

Returns the original purchased amount on an order transaction

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.billing.purchases.list(
    project_id="123456-7890-1234-5678-901234",
    limit=10,
)

⚙️ Parameters

project_id: str — The unique identifier of the project

limit: typing.Optional[float] — Number of results to return per page. Default 10. Range [1,1000]

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
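Because limit is documented with a default of 10 and a range of [1, 1000], a caller-supplied value can be clamped before the request. The clamp helper below is hypothetical, not part of the SDK:

```python
def clamp_limit(limit: int) -> int:
    """Keep the page size inside the documented [1, 1000] range (default 10)."""
    return max(1, min(1000, limit))

print(clamp_limit(5000))  # 1000
# client.manage.v1.projects.billing.purchases.list(
#     project_id="123456-7890-1234-5678-901234",
#     limit=clamp_limit(5000),
# )
```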

Manage V1 Projects Members Invites

client.manage.v1.projects.members.invites.list(...) -> AsyncHttpResponse[ListProjectInvitesV1Response]

📝 Description

Generates a list of invites for a specific project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.members.invites.list(
    project_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.members.invites.create(...) -> AsyncHttpResponse[CreateProjectInviteV1Response]

📝 Description

Generates an invite for a specific project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.members.invites.create(
    project_id="123456-7890-1234-5678-901234",
    email="email",
    scope="scope",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

email: str — The email address of the invitee

scope: str — The scope of the invitee

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.members.invites.delete(...) -> AsyncHttpResponse[DeleteProjectInviteV1Response]

📝 Description

Deletes an invite for a specific project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.members.invites.delete(
    project_id="123456-7890-1234-5678-901234",
    email="john.doe@example.com",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

email: str — The email address of the member

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects Members Scopes

client.manage.v1.projects.members.scopes.list(...) -> AsyncHttpResponse[ListProjectMemberScopesV1Response]

📝 Description

Retrieves a list of scopes for a specific member

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.members.scopes.list(
    project_id="123456-7890-1234-5678-901234",
    member_id="123456789012345678901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

member_id: str — The unique identifier of the Member

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.manage.v1.projects.members.scopes.update(...) -> AsyncHttpResponse[UpdateProjectMemberScopesV1Response]

📝 Description

Updates the scopes for a specific member

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.members.scopes.update(
    project_id="123456-7890-1234-5678-901234",
    member_id="123456789012345678901234",
    scope="admin",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

member_id: str — The unique identifier of the Member

scope: str — A scope to update

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Manage V1 Projects Usage Breakdown

client.manage.v1.projects.usage.breakdown.get(...) -> AsyncHttpResponse[UsageBreakdownV1Response]

📝 Description

Retrieves the usage breakdown for a specific project, with various filter options by API feature or by groupings. Setting a feature (e.g. diarize) to true includes requests that used that feature, while false excludes requests that used it. Multiple true filters are combined with OR logic, while false filters are combined with AND logic.

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.usage.breakdown.get(
    project_id="123456-7890-1234-5678-901234",
    start="start",
    end="end",
    grouping="accessor",
    accessor="12345678-1234-1234-1234-123456789012",
    alternatives=True,
    callback_method=True,
    callback=True,
    channels=True,
    custom_intent_mode=True,
    custom_intent=True,
    custom_topic_mode=True,
    custom_topic=True,
    deployment="hosted",
    detect_entities=True,
    detect_language=True,
    diarize=True,
    dictation=True,
    encoding=True,
    endpoint="listen",
    extra=True,
    filler_words=True,
    intents=True,
    keyterm=True,
    keywords=True,
    language=True,
    measurements=True,
    method="sync",
    model="6f548761-c9c0-429a-9315-11a1d28499c8",
    multichannel=True,
    numerals=True,
    paragraphs=True,
    profanity_filter=True,
    punctuate=True,
    redact=True,
    replace=True,
    sample_rate=True,
    search=True,
    sentiment=True,
    smart_format=True,
    summarize=True,
    tag="tag1",
    topics=True,
    utt_split=True,
    utterances=True,
    version=True,
)

⚙️ Parameters

project_id: str — The unique identifier of the project

start: typing.Optional[str] — Start date of the requested date range. Format accepted is YYYY-MM-DD

end: typing.Optional[str] — End date of the requested date range. Format accepted is YYYY-MM-DD

grouping: typing.Optional[BreakdownGetRequestGrouping] — Common usage grouping parameters

accessor: typing.Optional[str] — Filter for requests where a specific accessor was used

alternatives: typing.Optional[bool] — Filter for requests where alternatives were used

callback_method: typing.Optional[bool] — Filter for requests where callback method was used

callback: typing.Optional[bool] — Filter for requests where callback was used

channels: typing.Optional[bool] — Filter for requests where channels were used

custom_intent_mode: typing.Optional[bool] — Filter for requests where custom intent mode was used

custom_intent: typing.Optional[bool] — Filter for requests where custom intent was used

custom_topic_mode: typing.Optional[bool] — Filter for requests where custom topic mode was used

custom_topic: typing.Optional[bool] — Filter for requests where custom topic was used

deployment: typing.Optional[BreakdownGetRequestDeployment] — Filter for requests where a specific deployment was used

detect_entities: typing.Optional[bool] — Filter for requests where detect entities was used

detect_language: typing.Optional[bool] — Filter for requests where detect language was used

diarize: typing.Optional[bool] — Filter for requests where diarize was used

dictation: typing.Optional[bool] — Filter for requests where dictation was used

encoding: typing.Optional[bool] — Filter for requests where encoding was used

endpoint: typing.Optional[BreakdownGetRequestEndpoint] — Filter for requests where a specific endpoint was used

extra: typing.Optional[bool] — Filter for requests where extra was used

filler_words: typing.Optional[bool] — Filter for requests where filler words was used

intents: typing.Optional[bool] — Filter for requests where intents was used

keyterm: typing.Optional[bool] — Filter for requests where keyterm was used

keywords: typing.Optional[bool] — Filter for requests where keywords was used

language: typing.Optional[bool] — Filter for requests where language was used

measurements: typing.Optional[bool] — Filter for requests where measurements were used

method: typing.Optional[BreakdownGetRequestMethod] — Filter for requests where a specific method was used

model: typing.Optional[str] — Filter for requests where a specific model uuid was used

multichannel: typing.Optional[bool] — Filter for requests where multichannel was used

numerals: typing.Optional[bool] — Filter for requests where numerals were used

paragraphs: typing.Optional[bool] — Filter for requests where paragraphs were used

profanity_filter: typing.Optional[bool] — Filter for requests where profanity filter was used

punctuate: typing.Optional[bool] — Filter for requests where punctuate was used

redact: typing.Optional[bool] — Filter for requests where redact was used

replace: typing.Optional[bool] — Filter for requests where replace was used

sample_rate: typing.Optional[bool] — Filter for requests where sample rate was used

search: typing.Optional[bool] — Filter for requests where search was used

sentiment: typing.Optional[bool] — Filter for requests where sentiment was used

smart_format: typing.Optional[bool] — Filter for requests where smart format was used

summarize: typing.Optional[bool] — Filter for requests where summarize was used

tag: typing.Optional[str] — Filter for requests where a specific tag was used

topics: typing.Optional[bool] — Filter for requests where topics was used

utt_split: typing.Optional[bool] — Filter for requests where utt split was used

utterances: typing.Optional[bool] — Filter for requests where utterances was used

version: typing.Optional[bool] — Filter for requests where version was used

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
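The OR/AND filter semantics described above can be modeled locally. The matches function below is an illustrative sketch of the documented behavior, not SDK code; request_features stands in for the set of features a past request actually used:

```python
def matches(request_features: set, filters: dict) -> bool:
    """Illustrative model of the documented filter semantics:
    true filters are combined with OR, false filters with AND."""
    true_filters = [f for f, v in filters.items() if v]
    false_filters = [f for f, v in filters.items() if not v]
    # A request matches if it used ANY of the true-filtered features...
    ok_true = not true_filters or any(f in request_features for f in true_filters)
    # ...and used NONE of the false-filtered features.
    ok_false = all(f not in request_features for f in false_filters)
    return ok_true and ok_false

# A request that used diarize and punctuate:
features = {"diarize", "punctuate"}
print(matches(features, {"diarize": True, "smart_format": True}))  # True (OR)
print(matches(features, {"punctuate": False}))                     # False (AND-exclude)
```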

Manage V1 Projects Usage Fields

client.manage.v1.projects.usage.fields.list(...) -> AsyncHttpResponse[UsageFieldsV1Response]

📝 Description

Lists the features, models, tags, languages, and processing methods used for requests in the specified project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.manage.v1.projects.usage.fields.list(
    project_id="123456-7890-1234-5678-901234",
    start="start",
    end="end",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

start: typing.Optional[str] — Start date of the requested date range. Format accepted is YYYY-MM-DD

end: typing.Optional[str] — End date of the requested date range. Format accepted is YYYY-MM-DD

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Read V1 Text

client.read.v1.text.analyze(...) -> AsyncHttpResponse[ReadV1Response]

📝 Description

Analyze text content using Deepgram's text analysis API

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.read.v1.text.analyze(
    callback="callback",
    callback_method="POST",
    sentiment=True,
    summarize="v2",
    tag="tag",
    topics=True,
    custom_topic="custom_topic",
    custom_topic_mode="extended",
    intents=True,
    custom_intent="custom_intent",
    custom_intent_mode="extended",
    language="language",
    request={"url": "url"},
)

⚙️ Parameters

request: ReadV1RequestParams

callback: typing.Optional[str] — URL to which we'll make the callback request

callback_method: typing.Optional[TextAnalyzeRequestCallbackMethod] — HTTP method by which the callback request will be made

sentiment: typing.Optional[bool] — Recognizes the sentiment throughout a transcript or text

summarize: typing.Optional[TextAnalyzeRequestSummarize] — Summarize content. For Listen API, supports string version option. For Read API, accepts boolean only.

tag: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Label your requests for the purpose of identification during usage reporting

topics: typing.Optional[bool] — Detect topics throughout a transcript or text

custom_topic: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Custom topics you want the model to detect within your input audio or text if present. Submit up to 100.

custom_topic_mode: typing.Optional[TextAnalyzeRequestCustomTopicMode] — Sets how the model will interpret strings submitted to the custom_topic param. When strict, the model will only return topics submitted using the custom_topic param. When extended, the model will return its own detected topics in addition to those submitted using the custom_topic param

intents: typing.Optional[bool] — Recognizes speaker intent throughout a transcript or text

custom_intent: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Custom intents you want the model to detect within your input audio if present

custom_intent_mode: typing.Optional[TextAnalyzeRequestCustomIntentMode] — Sets how the model will interpret intents submitted to the custom_intent param. When strict, the model will only return intents submitted using the custom_intent param. When extended, the model will return its own detected intents in addition to those submitted using the custom_intent param.

language: typing.Optional[str] — The BCP-47 language tag that hints at the primary spoken language. Depending on the Model and API endpoint you choose only certain languages are available

request_options: typing.Optional[RequestOptions] — Request-specific configuration.
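Since custom_topic accepts up to 100 entries, a list can be validated before the call. The helper below is hypothetical, not part of the SDK:

```python
def validate_custom_topics(topics: list) -> list:
    """The custom_topic parameter accepts at most 100 entries per request."""
    if len(topics) > 100:
        raise ValueError("custom_topic accepts at most 100 entries")
    return topics

topics = validate_custom_topics(["pricing", "support", "cancellation"])
# client.read.v1.text.analyze(
#     request={"url": "url"},
#     topics=True,
#     custom_topic=topics,
#     custom_topic_mode="strict",
# )
```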

SelfHosted V1 DistributionCredentials

client.self_hosted.v1.distribution_credentials.list(...) -> AsyncHttpResponse[ListProjectDistributionCredentialsV1Response]

📝 Description

Lists sets of distribution credentials for the specified project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.self_hosted.v1.distribution_credentials.list(
    project_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.self_hosted.v1.distribution_credentials.create(...) -> AsyncHttpResponse[CreateProjectDistributionCredentialsV1Response]

📝 Description

Creates a set of distribution credentials for the specified project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.self_hosted.v1.distribution_credentials.create(
    project_id="123456-7890-1234-5678-901234",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

scopes: typing.Optional[typing.Union[DistributionCredentialsCreateRequestScopesItem, typing.Sequence[DistributionCredentialsCreateRequestScopesItem]]] — List of permission scopes for the credentials

provider: typing.Optional[typing.Literal["quay"]] — The provider of the distribution service

comment: typing.Optional[str] — Optional comment about the credentials

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.self_hosted.v1.distribution_credentials.get(...) -> AsyncHttpResponse[GetProjectDistributionCredentialsV1Response]

📝 Description

Returns a set of distribution credentials for the specified project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.self_hosted.v1.distribution_credentials.get(
    project_id="123456-7890-1234-5678-901234",
    distribution_credentials_id="8b36cfd0-472f-4a21-833f-2d6343c3a2f3",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

distribution_credentials_id: str — The UUID of the distribution credentials

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

client.self_hosted.v1.distribution_credentials.delete(...) -> AsyncHttpResponse[GetProjectDistributionCredentialsV1Response]

📝 Description

Deletes a set of distribution credentials for the specified project

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.self_hosted.v1.distribution_credentials.delete(
    project_id="123456-7890-1234-5678-901234",
    distribution_credentials_id="8b36cfd0-472f-4a21-833f-2d6343c3a2f3",
)

⚙️ Parameters

project_id: str — The unique identifier of the project

distribution_credentials_id: str — The UUID of the distribution credentials

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Speak V1 Audio

client.speak.v1.audio.generate(...) -> typing.AsyncIterator[AsyncHttpResponse[typing.AsyncIterator[bytes]]]

📝 Description

Convert text into natural-sounding speech using Deepgram's TTS REST API

🔌 Usage

from deepgram import DeepgramClient

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)
client.speak.v1.audio.generate(
    text="text",
)

⚙️ Parameters

text: str — The text content to be converted to speech

callback: typing.Optional[str] — URL to which we'll make the callback request

callback_method: typing.Optional[AudioGenerateRequestCallbackMethod] — HTTP method by which the callback request will be made

mip_opt_out: typing.Optional[bool] — Opts out requests from the Deepgram Model Improvement Program. Refer to our Docs for pricing impacts before setting this to true. https://dpgr.am/deepgram-mip

tag: typing.Optional[typing.Union[str, typing.Sequence[str]]] — Label your requests for the purpose of identification during usage reporting

bit_rate: typing.Optional[float] — The bitrate of the audio in bits per second. Choose from predefined ranges or specific values based on the encoding type.

container: typing.Optional[AudioGenerateRequestContainer] — Container specifies the file format wrapper for the output audio. The available options depend on the encoding type.

encoding: typing.Optional[AudioGenerateRequestEncoding] — Encoding allows you to specify the expected encoding of your audio output

model: typing.Optional[AudioGenerateRequestModel] — AI model used to process submitted text

sample_rate: typing.Optional[float] — Sample Rate specifies the sample rate for the output audio. Based on the encoding, different sample rates are supported. For some encodings, the sample rate is not configurable

request_options: typing.Optional[RequestOptions] — Request-specific configuration. You can pass in configuration such as chunk_size, and more to customize the request and response.
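A common pattern is to write the generated audio to a file. The sketch below assumes iterating the sync result yields byte chunks; the helper and the output filename are illustrative, not SDK features:

```python
def write_chunks(chunks, path: str) -> int:
    """Write an iterable of audio byte chunks to a file; returns bytes written."""
    total = 0
    with open(path, "wb") as out:
        for chunk in chunks:
            out.write(chunk)
            total += len(chunk)
    return total

# audio = client.speak.v1.audio.generate(text="Hello, world!")
# write_chunks(audio, "output.mp3")
```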

Listen V1 Connect

client.listen.v1.connect(...)

📝 Description

Transcribe audio and video using Deepgram's speech-to-text WebSocket

🔌 Usage

from typing import Union

from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.listen.v1.types import (
    ListenV1Results,
    ListenV1Metadata,
    ListenV1UtteranceEnd,
    ListenV1SpeechStarted,
)

ListenV1Response = Union[ListenV1Results, ListenV1Metadata, ListenV1UtteranceEnd, ListenV1SpeechStarted]

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)

with client.listen.v1.connect(model="nova-3") as connection:
    def on_message(message: ListenV1Response) -> None:
        msg_type = getattr(message, "type", "Unknown")
        print(f"Received {msg_type} event")

    connection.on(EventType.OPEN, lambda _: print("Connection opened"))
    connection.on(EventType.MESSAGE, on_message)
    connection.on(EventType.CLOSE, lambda _: print("Connection closed"))
    connection.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

    # Start listening
    connection.start_listening()

    # Load and send raw audio bytes (file path is illustrative)
    with open("audio.raw", "rb") as audio_file:
        audio_bytes = audio_file.read()
    connection.send_media(audio_bytes)

    # Send control messages
    connection.send_keep_alive()
    connection.send_finalize()
    connection.send_close_stream()

🔌 Async Usage

import asyncio
from typing import Union

from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType
from deepgram.listen.v1.types import (
    ListenV1Results,
    ListenV1Metadata,
    ListenV1UtteranceEnd,
    ListenV1SpeechStarted,
)

ListenV1Response = Union[ListenV1Results, ListenV1Metadata, ListenV1UtteranceEnd, ListenV1SpeechStarted]

client = AsyncDeepgramClient(
    api_key="YOUR_API_KEY",
)

async def main():
    async with client.listen.v1.connect(model="nova-3") as connection:
        def on_message(message: ListenV1Response) -> None:
            msg_type = getattr(message, "type", "Unknown")
            print(f"Received {msg_type} event")

        connection.on(EventType.OPEN, lambda _: print("Connection opened"))
        connection.on(EventType.MESSAGE, on_message)
        connection.on(EventType.CLOSE, lambda _: print("Connection closed"))
        connection.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

        # Start listening
        await connection.start_listening()

        # Load and send raw audio bytes (file path is illustrative)
        with open("audio.raw", "rb") as audio_file:
            audio_bytes = audio_file.read()
        await connection.send_media(audio_bytes)

        # Send control messages
        await connection.send_keep_alive()
        await connection.send_finalize()
        await connection.send_close_stream()

asyncio.run(main())

📤 Send Methods

send_media(message: bytes) — Send binary audio data for transcription

  • connection.send_media(audio_bytes) — Send raw audio bytes directly

send_keep_alive() — Keep the connection alive

  • connection.send_keep_alive()

send_finalize() — Finalize the transcription

  • connection.send_finalize()

send_close_stream() — Close the audio stream

  • connection.send_close_stream()
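To pace a prerecorded file through the WebSocket as if it were live audio, the send methods above can be combined with a simple chunker. The chunk size and sleep interval below are illustrative choices, not SDK requirements:

```python
import time

def iter_chunks(data: bytes, chunk_size: int = 8192):
    """Yield fixed-size slices of raw audio for paced streaming."""
    for i in range(0, len(data), chunk_size):
        yield data[i : i + chunk_size]

# with client.listen.v1.connect(model="nova-3") as connection:
#     connection.start_listening()
#     for chunk in iter_chunks(audio_bytes):
#         connection.send_media(chunk)
#         time.sleep(0.05)  # approximate real-time pacing
#     connection.send_close_stream()
```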

⚙️ Parameters

model: str — AI model to use for the transcription

callback: typing.Optional[str] — URL to which we'll make the callback request

callback_method: typing.Optional[str] — HTTP method by which the callback request will be made

channels: typing.Optional[str] — Number of independent audio channels contained in submitted audio

diarize: typing.Optional[str] — Recognize speaker changes. Each word in the transcript will be assigned a speaker number starting at 0

dictation: typing.Optional[str] — Dictation mode for controlling formatting with dictated speech

encoding: typing.Optional[str] — Specify the expected encoding of your submitted audio

endpointing: typing.Optional[str] — Control when speech recognition ends

extra: typing.Optional[str] — Arbitrary key-value pairs that are attached to the API response

filler_words: typing.Optional[str] — Include filler words like "uh" and "um" in transcripts

interim_results: typing.Optional[str] — Return partial transcripts as audio is being processed

keyterm: typing.Optional[str] — Key term prompting can boost or suppress specialized terminology and brands

keywords: typing.Optional[str] — Keywords can boost or suppress specialized terminology and brands

language: typing.Optional[str] — BCP-47 language tag that hints at the primary spoken language

mip_opt_out: typing.Optional[str] — Opts out requests from the Deepgram Model Improvement Program

multichannel: typing.Optional[str] — Transcribe each audio channel independently

numerals: typing.Optional[str] — Convert numbers from written format to numerical format

profanity_filter: typing.Optional[str] — Remove profanity from transcripts

punctuate: typing.Optional[str] — Add punctuation and capitalization to the transcript

redact: typing.Optional[str] — Redaction removes sensitive information from your transcripts

replace: typing.Optional[str] — Search for terms or phrases in submitted audio and replaces them

sample_rate: typing.Optional[str] — Sample rate of the submitted audio

search: typing.Optional[str] — Search for terms or phrases in submitted audio

smart_format: typing.Optional[str] — Apply formatting to transcript output for improved readability

tag: typing.Optional[str] — Label your requests for the purpose of identification during usage reporting

utterance_end_ms: typing.Optional[str] — Length of time in milliseconds of silence to wait for before finalizing speech

vad_events: typing.Optional[str] — Return Voice Activity Detection events via the websocket

version: typing.Optional[str] — Version of the model to use

authorization: typing.Optional[str] — Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter.

Example: token %DEEPGRAM_API_KEY% or bearer %DEEPGRAM_TOKEN%

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Listen V2 Connect

client.listen.v2.connect(...)

📝 Description

Real-time conversational speech recognition with contextual turn detection for natural voice conversations

🔌 Usage

from typing import Union

from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.listen.v2.types import (
    ListenV2Connected,
    ListenV2TurnInfo,
    ListenV2FatalError,
)

ListenV2Response = Union[ListenV2Connected, ListenV2TurnInfo, ListenV2FatalError]

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)

with client.listen.v2.connect(
    model="flux-general-en",
    encoding="linear16",
    sample_rate=16000
) as connection:
    def on_message(message: ListenV2Response) -> None:
        msg_type = getattr(message, "type", "Unknown")
        print(f"Received {msg_type} event")

    connection.on(EventType.OPEN, lambda _: print("Connection opened"))
    connection.on(EventType.MESSAGE, on_message)
    connection.on(EventType.CLOSE, lambda _: print("Connection closed"))
    connection.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

    # Start listening
    connection.start_listening()

    # Load and send raw audio bytes (file path is illustrative)
    with open("audio.raw", "rb") as audio_file:
        audio_bytes = audio_file.read()
    connection.send_media(audio_bytes)

    # Close the audio stream
    connection.send_close_stream()

🔌 Async Usage

import asyncio
from typing import Union

from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType
from deepgram.listen.v2.types import (
    ListenV2Connected,
    ListenV2TurnInfo,
    ListenV2FatalError,
)

ListenV2Response = Union[ListenV2Connected, ListenV2TurnInfo, ListenV2FatalError]

client = AsyncDeepgramClient(
    api_key="YOUR_API_KEY",
)

async def main():
    async with client.listen.v2.connect(
        model="flux-general-en",
        encoding="linear16",
        sample_rate=16000
    ) as connection:
        def on_message(message: ListenV2Response) -> None:
            msg_type = getattr(message, "type", "Unknown")
            print(f"Received {msg_type} event")

        connection.on(EventType.OPEN, lambda _: print("Connection opened"))
        connection.on(EventType.MESSAGE, on_message)
        connection.on(EventType.CLOSE, lambda _: print("Connection closed"))
        connection.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

        # Start listening
        await connection.start_listening()

        # Load and send raw audio bytes (file path is illustrative)
        with open("audio.raw", "rb") as audio_file:
            audio_bytes = audio_file.read()
        await connection.send_media(audio_bytes)

        # Close the audio stream
        await connection.send_close_stream()

asyncio.run(main())

📤 Send Methods

send_media(message: bytes) — Send binary audio data for transcription

  • connection.send_media(audio_bytes) — Send raw audio bytes directly

send_close_stream() — Close the audio stream

  • connection.send_close_stream()
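Handlers often branch on the message variant. The wire-level type strings below ("Connected", "TurnInfo", "FatalError") are assumptions inferred from the type names in the usage example, so verify them against actual payloads before relying on them:

```python
def classify(message) -> str:
    """Map a Flux message to a coarse category by its (assumed) type field."""
    msg_type = getattr(message, "type", None)
    if msg_type == "Connected":
        return "ready"
    if msg_type == "TurnInfo":
        return "turn"
    if msg_type == "FatalError":
        return "error"
    return "unknown"

# connection.on(EventType.MESSAGE, lambda msg: print(classify(msg)))
```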

⚙️ Parameters

model: str — AI model used to process submitted audio

encoding: str — Specify the expected encoding of your submitted audio

sample_rate: int — Sample rate of the submitted audio

eager_eot_threshold: typing.Optional[str] — Threshold for eager end-of-turn detection

eot_threshold: typing.Optional[str] — Threshold for end-of-turn detection

eot_timeout_ms: typing.Optional[str] — Timeout in milliseconds for end-of-turn detection

keyterm: typing.Optional[str] — Key term prompting can boost or suppress specialized terminology and brands

mip_opt_out: typing.Optional[str] — Opts out requests from the Deepgram Model Improvement Program

tag: typing.Optional[str] — Label your requests for the purpose of identification during usage reporting

authorization: typing.Optional[str] — Use your API key for authentication, or alternatively generate a temporary token and pass it via the token query parameter.

Example: token %DEEPGRAM_API_KEY% or bearer %DEEPGRAM_TOKEN%

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Speak V1 Connect

client.speak.v1.connect(...)

📝 Description

Convert text into natural-sounding speech using Deepgram's TTS WebSocket

🔌 Usage

from typing import Union

from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.speak.v1.types import (
    SpeakV1Text,
    SpeakV1Metadata,
    SpeakV1Flushed,
    SpeakV1Cleared,
    SpeakV1Warning,
)

SpeakV1Response = Union[bytes, SpeakV1Metadata, SpeakV1Flushed, SpeakV1Cleared, SpeakV1Warning]

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)

with client.speak.v1.connect(
    model="aura-2-asteria-en",
    encoding="linear16",
    sample_rate=24000
) as connection:
    def on_message(message: SpeakV1Response) -> None:
        if isinstance(message, bytes):
            print("Received audio event")
        else:
            msg_type = getattr(message, "type", "Unknown")
            print(f"Received {msg_type} event")

    connection.on(EventType.OPEN, lambda _: print("Connection opened"))
    connection.on(EventType.MESSAGE, on_message)
    connection.on(EventType.CLOSE, lambda _: print("Connection closed"))
    connection.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

    # Start listening
    connection.start_listening()

    # Send text to be converted to speech
    connection.send_text(SpeakV1Text(text="Hello, world!"))

    # Send control messages
    connection.send_flush()
    connection.send_close()

🔌 Async Usage

import asyncio
from typing import Union

from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType
from deepgram.speak.v1.types import (
    SpeakV1Text,
    SpeakV1Metadata,
    SpeakV1Flushed,
    SpeakV1Cleared,
    SpeakV1Warning,
)

SpeakV1Response = Union[bytes, SpeakV1Metadata, SpeakV1Flushed, SpeakV1Cleared, SpeakV1Warning]

client = AsyncDeepgramClient(
    api_key="YOUR_API_KEY",
)

async def main():
    async with client.speak.v1.connect(
        model="aura-2-asteria-en",
        encoding="linear16",
        sample_rate=24000
    ) as connection:
        def on_message(message: SpeakV1Response) -> None:
            if isinstance(message, bytes):
                print("Received audio event")
            else:
                msg_type = getattr(message, "type", "Unknown")
                print(f"Received {msg_type} event")

        connection.on(EventType.OPEN, lambda _: print("Connection opened"))
        connection.on(EventType.MESSAGE, on_message)
        connection.on(EventType.CLOSE, lambda _: print("Connection closed"))
        connection.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

        # Start listening
        await connection.start_listening()

        # Send text to be converted to speech
        await connection.send_text(SpeakV1Text(text="Hello, world!"))

        # Send control messages
        await connection.send_flush()
        await connection.send_close()

asyncio.run(main())

📤 Send Methods

send_text(message: SpeakV1Text) — Send text to be converted to speech

  • connection.send_text(SpeakV1Text(text="Hello, world!"))

send_flush() — Process all queued text immediately

  • connection.send_flush()

send_clear() — Clear the text queue

  • connection.send_clear()

send_close() — Close the connection

  • connection.send_close()
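The difference between send_flush and send_clear is easiest to see as a queue: sent text accumulates until it is either flushed (synthesized) or cleared (dropped). The sketch below is a local model of those documented semantics, not SDK code; the `SpeakQueueModel` class is purely illustrative.

```python
# Local model of the Speak control-message semantics: send_text queues text,
# send_flush synthesizes everything queued, send_clear drops the queue.
class SpeakQueueModel:
    def __init__(self) -> None:
        self.queue: list[str] = []
        self.synthesized: list[str] = []

    def send_text(self, text: str) -> None:
        self.queue.append(text)

    def send_flush(self) -> None:
        # Everything queued so far is processed into speech
        self.synthesized.extend(self.queue)
        self.queue.clear()

    def send_clear(self) -> None:
        # Queued text is discarded without being synthesized
        self.queue.clear()

q = SpeakQueueModel()
q.send_text("Hello,")
q.send_text("world!")
q.send_clear()          # both queued chunks are dropped
q.send_text("Goodbye!")
q.send_flush()          # only "Goodbye!" is synthesized
print(q.synthesized)    # → ['Goodbye!']
```

This is why send_clear is typically used for barge-in handling: it abandons pending speech so the next send_text and send_flush start from a clean slate.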

⚙️ Parameters

encoding: typing.Optional[str] — Specify the expected encoding of your output audio

mip_opt_out: typing.Optional[str] — Opts out requests from the Deepgram Model Improvement Program

model: typing.Optional[str] — AI model used to process submitted text

sample_rate: typing.Optional[str] — Sample rate for the output audio

authorization: typing.Optional[str] — Use your API key for authentication, or generate a temporary token and pass it via the token query parameter.

Example: token %DEEPGRAM_API_KEY% or bearer %DEEPGRAM_TOKEN%

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

Agent V1 Connect

client.agent.v1.connect(...)

📝 Description

Build a conversational voice agent using Deepgram's Voice Agent WebSocket

🔌 Usage

from deepgram import DeepgramClient
from deepgram.core.events import EventType
from deepgram.agent.v1.types import (
    AgentV1Settings,
    AgentV1SettingsAgent,
    AgentV1SettingsAudio,
    AgentV1SettingsAudioInput,
    AgentV1SettingsAgentListen,
    AgentV1SettingsAgentListenProvider_V1,
    AgentV1Welcome,
    AgentV1SettingsApplied,
    AgentV1ConversationText,
    AgentV1UserStartedSpeaking,
    AgentV1AgentThinking,
    AgentV1FunctionCallRequest,
    AgentV1AgentStartedSpeaking,
    AgentV1AgentAudioDone,
    AgentV1Error,
    AgentV1Warning,
)
from deepgram.types import (
    ThinkSettingsV1,
    ThinkSettingsV1Provider_OpenAi,
    SpeakSettingsV1,
    SpeakSettingsV1Provider_Deepgram,
)

client = DeepgramClient(
    api_key="YOUR_API_KEY",
)

with client.agent.v1.connect() as agent:
    # Configure the agent
    settings = AgentV1Settings(
        audio=AgentV1SettingsAudio(
            input=AgentV1SettingsAudioInput(
                encoding="linear16",
                sample_rate=44100,
            )
        ),
        agent=AgentV1SettingsAgent(
            listen=AgentV1SettingsAgentListen(
                provider=AgentV1SettingsAgentListenProvider_V1(
                    type="deepgram",
                    model="nova-3",
                    smart_format=True,
                )
            ),
            think=ThinkSettingsV1(
                provider=ThinkSettingsV1Provider_OpenAi(
                    type="open_ai",
                    model="gpt-4o-mini",
                    temperature=0.7,
                ),
                prompt='Reply only and explicitly with "OK".',
            ),
            speak=[SpeakSettingsV1(
                provider=SpeakSettingsV1Provider_Deepgram(
                    type="deepgram",
                    model="aura-2-asteria-en",
                )
            )],
        ),
    )

    agent.send_settings(settings)

    def on_message(message) -> None:
        if isinstance(message, bytes):
            print("Received audio event")
        else:
            msg_type = getattr(message, "type", "Unknown")
            print(f"Received {msg_type} event")

    agent.on(EventType.OPEN, lambda _: print("Connection opened"))
    agent.on(EventType.MESSAGE, on_message)
    agent.on(EventType.CLOSE, lambda _: print("Connection closed"))
    agent.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

    # Start listening
    agent.start_listening()

    # Send audio data (raw bytes from your audio source)
    audio_bytes = b"\x00" * 3200  # placeholder: raw linear16 audio
    agent.send_media(audio_bytes)

    # Send keep-alive
    agent.send_keep_alive()

🔌 Async Usage

import asyncio

from deepgram import AsyncDeepgramClient
from deepgram.core.events import EventType
from deepgram.agent.v1.types import (
    AgentV1Settings,
    AgentV1SettingsAgent,
    AgentV1SettingsAudio,
    AgentV1SettingsAudioInput,
    AgentV1SettingsAgentListen,
    AgentV1SettingsAgentListenProvider_V1,
    AgentV1Welcome,
    AgentV1SettingsApplied,
    AgentV1ConversationText,
    AgentV1UserStartedSpeaking,
    AgentV1AgentThinking,
    AgentV1FunctionCallRequest,
    AgentV1AgentStartedSpeaking,
    AgentV1AgentAudioDone,
    AgentV1Error,
    AgentV1Warning,
)
from deepgram.types import (
    ThinkSettingsV1,
    ThinkSettingsV1Provider_OpenAi,
    SpeakSettingsV1,
    SpeakSettingsV1Provider_Deepgram,
)

client = AsyncDeepgramClient(
    api_key="YOUR_API_KEY",
)

async def main():
    async with client.agent.v1.connect() as agent:
        # Configure the agent
        settings = AgentV1Settings(
            audio=AgentV1SettingsAudio(
                input=AgentV1SettingsAudioInput(
                    encoding="linear16",
                    sample_rate=16000,
                )
            ),
            agent=AgentV1SettingsAgent(
                listen=AgentV1SettingsAgentListen(
                    provider=AgentV1SettingsAgentListenProvider_V1(
                        type="deepgram",
                        model="nova-3",
                        smart_format=True,
                    )
                ),
                think=ThinkSettingsV1(
                    provider=ThinkSettingsV1Provider_OpenAi(
                        type="open_ai",
                        model="gpt-4o-mini",
                        temperature=0.7,
                    ),
                ),
                speak=[SpeakSettingsV1(
                    provider=SpeakSettingsV1Provider_Deepgram(
                        type="deepgram",
                        model="aura-2-asteria-en",
                    )
                )],
            ),
        )

        await agent.send_settings(settings)

        def on_message(message) -> None:
            if isinstance(message, bytes):
                print("Received audio event")
            else:
                msg_type = getattr(message, "type", "Unknown")
                print(f"Received {msg_type} event")

        agent.on(EventType.OPEN, lambda _: print("Connection opened"))
        agent.on(EventType.MESSAGE, on_message)
        agent.on(EventType.CLOSE, lambda _: print("Connection closed"))
        agent.on(EventType.ERROR, lambda error: print(f"Caught: {error}"))

        # Start listening
        await agent.start_listening()

        # Send audio data (raw bytes from your audio source)
        audio_bytes = b"\x00" * 3200  # placeholder: raw linear16 audio
        await agent.send_media(audio_bytes)

        # Send keep-alive
        await agent.send_keep_alive()

asyncio.run(main())

⚙️ Parameters

authorization: typing.Optional[str] — Use your API key for authentication, or generate a temporary token and pass it via the token query parameter.

Example: token %DEEPGRAM_API_KEY% or bearer %DEEPGRAM_TOKEN%

request_options: typing.Optional[RequestOptions] — Request-specific configuration.

📤 Send Methods

send_settings(message: AgentV1Settings) — Send initial agent configuration settings

  • AgentV1Settings(...) — Configure audio, listen, think, and speak providers

send_media(message: bytes) — Send binary audio data to the agent

  • agent.send_media(audio_bytes) — Send raw audio bytes directly

send_keep_alive() — Keep the connection alive

  • agent.send_keep_alive()

send_update_speak(message: AgentV1UpdateSpeak) — Update the agent's speech synthesis settings

  • AgentV1UpdateSpeak(speak=SpeakSettingsV1(...)) — Modify TTS configuration during conversation

send_update_prompt(message: AgentV1UpdatePrompt) — Update the agent's system prompt

  • AgentV1UpdatePrompt(prompt="...") — Change the agent's behavior instructions

send_inject_user_message(message: AgentV1InjectUserMessage) — Inject a user message into the conversation

  • AgentV1InjectUserMessage(content="...") — Add a simulated user input

send_inject_agent_message(message: AgentV1InjectAgentMessage) — Inject an agent message into the conversation

  • AgentV1InjectAgentMessage(message="...") — Add a simulated agent response

send_function_call_response(message: AgentV1SendFunctionCallResponse) — Send the result of a function call back to the agent

  • AgentV1SendFunctionCallResponse(name="...", content="...") — Provide function execution results
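A common pattern is handling AgentV1FunctionCallRequest events by dispatching to locally registered functions and returning the result as a string for the response's content field. The dispatch itself is ordinary Python; below is a stand-alone sketch in which `handle_function_call`, the registry, and `get_weather` are hypothetical examples, not SDK types.

```python
def handle_function_call(name: str, arguments: dict, registry: dict) -> str:
    """Execute a locally registered function and return its result as a
    string, suitable for the content field of a function call response."""
    func = registry.get(name)
    if func is None:
        return f"Unknown function: {name}"
    try:
        return str(func(**arguments))
    except Exception as exc:  # surface failures to the agent as text
        return f"Error in {name}: {exc}"

# Hypothetical local function the agent is allowed to call
registry = {"get_weather": lambda city: f"Sunny in {city}"}

result = handle_function_call("get_weather", {"city": "Berlin"}, registry)
print(result)  # → Sunny in Berlin
```

In a real handler you would then send the result back with `agent.send_function_call_response(AgentV1SendFunctionCallResponse(name="get_weather", content=result))`, mirroring the signature shown above.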