carbon-typescript-sdk
v0.2.53
Client for Carbon
Carbon
Connect external data to LLMs, no matter the source.
Table of Contents
- Installation
- Getting Started
- Reference
carbon.auth.getAccessToken
carbon.auth.getWhiteLabeling
carbon.cRM.getAccount
carbon.cRM.getAccounts
carbon.cRM.getContact
carbon.cRM.getContacts
carbon.cRM.getLead
carbon.cRM.getLeads
carbon.cRM.getOpportunities
carbon.cRM.getOpportunity
carbon.dataSources.addTags
carbon.dataSources.query
carbon.dataSources.queryUserDataSources
carbon.dataSources.removeTags
carbon.dataSources.revokeAccessToken
carbon.embeddings.getDocuments
carbon.embeddings.getEmbeddingsAndChunks
carbon.embeddings.list
carbon.embeddings.uploadChunksAndEmbeddings
carbon.files.createUserFileTags
carbon.files.deleteFileTags
carbon.files.deleteMany
carbon.files.deleteV2
carbon.files.getParsedFile
carbon.files.getRawFile
carbon.files.modifyColdStorageParameters
carbon.files.moveToHotStorage
carbon.files.queryUserFiles
carbon.files.queryUserFilesDeprecated
carbon.files.resync
carbon.files.upload
carbon.files.uploadFromUrl
carbon.files.uploadText
carbon.github.getIssue
carbon.github.getIssues
carbon.github.getPr
carbon.github.getPrComments
carbon.github.getPrCommits
carbon.github.getPrFiles
carbon.github.getPullRequests
carbon.integrations.cancel
carbon.integrations.connectDataSource
carbon.integrations.connectDocument360
carbon.integrations.connectFreshdesk
carbon.integrations.connectGitbook
carbon.integrations.connectGuru
carbon.integrations.createAwsIamUser
carbon.integrations.getOauthUrl
carbon.integrations.listConfluencePages
carbon.integrations.listConversations
carbon.integrations.listDataSourceItems
carbon.integrations.listFolders
carbon.integrations.listGitbookSpaces
carbon.integrations.listLabels
carbon.integrations.listOutlookCategories
carbon.integrations.listRepos
carbon.integrations.listSharepointSites
carbon.integrations.syncAzureBlobFiles
carbon.integrations.syncAzureBlobStorage
carbon.integrations.syncConfluence
carbon.integrations.syncDataSourceItems
carbon.integrations.syncFiles
carbon.integrations.syncGitHub
carbon.integrations.syncGitbook
carbon.integrations.syncGmail
carbon.integrations.syncOutlook
carbon.integrations.syncRepos
carbon.integrations.syncRssFeed
carbon.integrations.syncS3Files
carbon.integrations.syncSlack
carbon.organizations.get
carbon.organizations.update
carbon.organizations.updateStats
carbon.users.delete
carbon.users.get
carbon.users.list
carbon.users.toggleUserFeatures
carbon.users.updateUsers
carbon.users.whoAmI
carbon.utilities.fetchUrls
carbon.utilities.fetchWebpage
carbon.utilities.fetchYoutubeTranscripts
carbon.utilities.processSitemap
carbon.utilities.scrapeSitemap
carbon.utilities.scrapeWeb
carbon.utilities.searchUrls
carbon.utilities.userWebpages
carbon.webhooks.addUrl
carbon.webhooks.deleteUrl
carbon.webhooks.urls
carbon.whiteLabel.create
carbon.whiteLabel.delete
carbon.whiteLabel.list
carbon.whiteLabel.update
Installation
npm i carbon-typescript-sdk
pnpm i carbon-typescript-sdk
yarn add carbon-typescript-sdk
Getting Started
import { Carbon } from "carbon-typescript-sdk";
// Generally done on the backend to avoid exposing the API key to the client
const carbonWithApiKey = new Carbon({
apiKey: "API_KEY",
customerId: "CUSTOMER_ID",
});
const accessToken = await carbonWithApiKey.auth.getAccessToken();
// Once an access token is obtained, it can be passed to the frontend
// and used to instantiate the SDK client without an API key
const carbon = new Carbon({
accessToken: accessToken.data.access_token,
});
// use SDK as usual
const whiteLabeling = await carbon.auth.getWhiteLabeling();
// etc.
Reference
carbon.auth.getAccessToken
Get Access Token
🛠️ Usage
const getAccessTokenResponse = await carbon.auth.getAccessToken();
🔄 Return
🌐 Endpoint
/auth/v1/access_token
GET
carbon.auth.getWhiteLabeling
Returns whether the organization is white-labeled and, if so, which integrations are white-labeled. The response is a WhiteLabelingResponse.
🛠️ Usage
const getWhiteLabelingResponse = await carbon.auth.getWhiteLabeling();
🔄 Return
🌐 Endpoint
/auth/v1/white_labeling
GET
carbon.cRM.getAccount
Get Account
🛠️ Usage
const getAccountResponse = await carbon.cRM.getAccount({
id: "id_example",
dataSourceId: 1,
includeRemoteData: false,
});
⚙️ Parameters
id: string
dataSourceId: number
includeRemoteData: boolean
includes: BaseIncludes[]
🔄 Return
🌐 Endpoint
/integrations/data/crm/accounts/{id}
GET
carbon.cRM.getAccounts
Get Accounts
🛠️ Usage
const getAccountsResponse = await carbon.cRM.getAccounts({
data_source_id: 1,
include_remote_data: false,
order_dir: "asc",
includes: [],
order_by: "created_at",
});
⚙️ Parameters
data_source_id: number
include_remote_data: boolean
next_cursor: string
page_size: number
order_dir: OrderDirV2Nullable
includes: BaseIncludes[]
filters: AccountFilters
order_by: AccountsOrderByNullable
🔄 Return
🌐 Endpoint
/integrations/data/crm/accounts
POST
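The list endpoints in this group (getAccounts, getContacts, getLeads, getOpportunities) accept next_cursor and page_size for pagination. Below is a minimal sketch of draining all pages, assuming each response exposes its records plus a cursor for the following page. The helper's field names (items, next_cursor) and the accounts field in the commented usage are hypothetical, so check the actual return types before relying on them.

```typescript
// Hypothetical page shape: `items` and `next_cursor` are assumed names.
interface CursorPage<T> {
  items: T[];
  next_cursor?: string | null;
}

// Requests page after page, feeding each response's cursor into the next
// request, and stops when no cursor comes back.
async function collectAll<T>(
  fetchPage: (cursor: string | undefined) => Promise<CursorPage<T>>,
  maxPages = 1000, // safety cap in case a cursor never terminates
): Promise<T[]> {
  const all: T[] = [];
  let cursor: string | undefined = undefined;
  for (let i = 0; i < maxPages; i++) {
    const page = await fetchPage(cursor);
    all.push(...page.items);
    if (!page.next_cursor) return all;
    cursor = page.next_cursor;
  }
  throw new Error(`Stopped after ${maxPages} pages; cursor did not terminate`);
}

// Usage with the SDK (sketch; `res.data.accounts` and `res.data.next_cursor`
// are assumed response fields):
// const accounts = await collectAll(async (cursor) => {
//   const res = await carbon.cRM.getAccounts({
//     data_source_id: 1,
//     page_size: 50,
//     next_cursor: cursor,
//   });
//   return { items: res.data.accounts, next_cursor: res.data.next_cursor };
// });
```

The cap on pages is a defensive choice: a buggy or repeating cursor would otherwise loop forever.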
carbon.cRM.getContact
Get Contact
🛠️ Usage
const getContactResponse = await carbon.cRM.getContact({
id: "id_example",
dataSourceId: 1,
includeRemoteData: false,
});
⚙️ Parameters
id: string
dataSourceId: number
includeRemoteData: boolean
includes: BaseIncludes[]
🔄 Return
🌐 Endpoint
/integrations/data/crm/contacts/{id}
GET
carbon.cRM.getContacts
Get Contacts
🛠️ Usage
const getContactsResponse = await carbon.cRM.getContacts({
data_source_id: 1,
include_remote_data: false,
order_dir: "asc",
includes: [],
order_by: "created_at",
});
⚙️ Parameters
data_source_id: number
include_remote_data: boolean
next_cursor: string
page_size: number
order_dir: OrderDirV2Nullable
includes: BaseIncludes[]
filters: ContactFilters
order_by: ContactsOrderByNullable
🔄 Return
🌐 Endpoint
/integrations/data/crm/contacts
POST
carbon.cRM.getLead
Get Lead
🛠️ Usage
const getLeadResponse = await carbon.cRM.getLead({
id: "id_example",
dataSourceId: 1,
includeRemoteData: false,
});
⚙️ Parameters
id: string
dataSourceId: number
includeRemoteData: boolean
includes: BaseIncludes[]
🔄 Return
🌐 Endpoint
/integrations/data/crm/leads/{id}
GET
carbon.cRM.getLeads
Get Leads
🛠️ Usage
const getLeadsResponse = await carbon.cRM.getLeads({
data_source_id: 1,
include_remote_data: false,
order_dir: "asc",
includes: [],
order_by: "created_at",
});
⚙️ Parameters
data_source_id: number
include_remote_data: boolean
next_cursor: string
page_size: number
order_dir: OrderDirV2Nullable
includes: BaseIncludes[]
filters: LeadFilters
order_by: LeadsOrderByNullable
🔄 Return
🌐 Endpoint
/integrations/data/crm/leads
POST
carbon.cRM.getOpportunities
Get Opportunities
🛠️ Usage
const getOpportunitiesResponse = await carbon.cRM.getOpportunities({
data_source_id: 1,
include_remote_data: false,
order_dir: "asc",
includes: [],
order_by: "created_at",
});
⚙️ Parameters
data_source_id: number
include_remote_data: boolean
next_cursor: string
page_size: number
order_dir: OrderDirV2Nullable
includes: BaseIncludes[]
filters: OpportunityFilters
order_by: OpportunitiesOrderByNullable
🔄 Return
🌐 Endpoint
/integrations/data/crm/opportunities
POST
carbon.cRM.getOpportunity
Get Opportunity
🛠️ Usage
const getOpportunityResponse = await carbon.cRM.getOpportunity({
id: "id_example",
dataSourceId: 1,
includeRemoteData: false,
});
⚙️ Parameters
id: string
dataSourceId: number
includeRemoteData: boolean
includes: BaseIncludes[]
🔄 Return
🌐 Endpoint
/integrations/data/crm/opportunities/{id}
GET
carbon.dataSources.addTags
Add Data Source Tags
🛠️ Usage
const addTagsResponse = await carbon.dataSources.addTags({
tags: {},
data_source_id: 1,
});
⚙️ Parameters
tags: object
data_source_id: number
🔄 Return
🌐 Endpoint
/data_sources/tags/add
POST
carbon.dataSources.query
Data Sources
🛠️ Usage
const queryResponse = await carbon.dataSources.query({
order_by: "created_at",
order_dir: "desc",
});
⚙️ Parameters
pagination: Pagination
order_by: OrganizationUserDataSourceOrderByColumns
order_dir: OrderDir
filters: OrganizationUserDataSourceFilters
🔄 Return
OrganizationUserDataSourceResponse
🌐 Endpoint
/data_sources
POST
carbon.dataSources.queryUserDataSources
User Data Sources
🛠️ Usage
const queryUserDataSourcesResponse =
await carbon.dataSources.queryUserDataSources({
order_by: "created_at",
order_dir: "desc",
});
⚙️ Parameters
pagination: Pagination
order_by: OrganizationUserDataSourceOrderByColumns
order_dir: OrderDir
filters: OrganizationUserDataSourceFilters
🔄 Return
OrganizationUserDataSourceResponse
🌐 Endpoint
/user_data_sources
POST
carbon.dataSources.removeTags
Remove Data Source Tags
🛠️ Usage
const removeTagsResponse = await carbon.dataSources.removeTags({
data_source_id: 1,
tags_to_remove: [],
remove_all_tags: false,
});
⚙️ Parameters
data_source_id: number
tags_to_remove: string[]
remove_all_tags: boolean
🔄 Return
🌐 Endpoint
/data_sources/tags/remove
POST
carbon.dataSources.revokeAccessToken
Revoke Access Token
🛠️ Usage
const revokeAccessTokenResponse = await carbon.dataSources.revokeAccessToken({
data_source_id: 1,
});
⚙️ Parameters
data_source_id: number
🔄 Return
🌐 Endpoint
/revoke_access_token
POST
carbon.embeddings.getDocuments
For pre-filtering documents, using tags_v2 is preferred to using tags (which is now deprecated). If both tags_v2 and tags are specified, tags is ignored. tags_v2 enables building complex filters through the use of "AND", "OR", and negation logic. Take the input below as an example:
{
"OR": [
{
"key": "subject",
"value": "holy-bible",
"negate": false
},
{
"key": "person-of-interest",
"value": "jesus christ",
"negate": false
},
{
"key": "genre",
"value": "religion",
"negate": true
},
{
"AND": [
{
"key": "subject",
"value": "tao-te-ching",
"negate": false
},
{
"key": "author",
"value": "lao-tzu",
"negate": false
}
]
}
]
}
In this case, files will be filtered such that:
- "subject" = "holy-bible" OR
- "person-of-interest" = "jesus christ" OR
- "genre" != "religion" OR
- "subject" = "tao-te-ching" AND "author" = "lao-tzu"
Note that the top level of the query must be either an "OR" or an "AND" array. Currently, nesting is limited to 3 levels. For tag blocks (those with "key", "value", and "negate" keys), the following typing rules apply:
- "key" is required and must be a string.
- "value" is required and can be any or list[any].
- "negate" is optional and must be true or false. If present and true, the filter block is negated in the resulting query. It is false by default.
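To make the AND / OR / negate semantics concrete, here is a small local model of the example filter above. It is a sketch only: the type names are illustrative (not exports of carbon-typescript-sdk), the evaluator uses plain equality and does not model list-valued comparisons, and counting the top-level group as nesting level 1 is an assumption. Carbon applies the real filter server-side.

```typescript
// Illustrative tags_v2 shapes derived from the typing rules above.
interface TagBlock {
  key: string;
  value: unknown; // a single value or a list, per the typing rules
  negate?: boolean; // defaults to false
}
type FilterGroup = { AND: FilterNode[] } | { OR: FilterNode[] };
type FilterNode = TagBlock | FilterGroup;

// Counts nested AND/OR groups; the top-level group counts as 1 (assumption).
function groupDepth(node: FilterNode): number {
  if ("AND" in node) return 1 + Math.max(0, ...node.AND.map(groupDepth));
  if ("OR" in node) return 1 + Math.max(0, ...node.OR.map(groupDepth));
  return 0;
}

// Evaluates a filter against one file's tags using strict equality.
function matches(node: FilterNode, fileTags: Record<string, unknown>): boolean {
  if ("AND" in node) return node.AND.every((child) => matches(child, fileTags));
  if ("OR" in node) return node.OR.some((child) => matches(child, fileTags));
  const hit = fileTags[node.key] === node.value;
  return node.negate ? !hit : hit;
}

// The example filter from this section.
const exampleFilter: FilterGroup = {
  OR: [
    { key: "subject", value: "holy-bible", negate: false },
    { key: "person-of-interest", value: "jesus christ", negate: false },
    { key: "genre", value: "religion", negate: true },
    {
      AND: [
        { key: "subject", value: "tao-te-ching", negate: false },
        { key: "author", value: "lao-tzu", negate: false },
      ],
    },
  ],
};

// Passing it to the SDK (tags_v2 is typed as `object`):
// await carbon.embeddings.getDocuments({ query: "...", k: 5, tags_v2: exampleFilter });
```

A file tagged subject "tao-te-ching" and genre "religion" but without an author tag does not match: the negated genre block fails and the inner AND block is incomplete.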
When querying embeddings, you can optionally specify the media_type parameter in your request. By default (if not set), it is equal to "TEXT", which means the query is performed over files that have been parsed as text (for now, this covers all files except image files). If it is equal to "IMAGE", the query is performed over image files (for now, .jpg and .png files). You can think of this field as an additional filter on top of any filters set in file_ids and
When hybrid_search is set to true, a combination of keyword search and semantic search is used to rank and select candidate embeddings during information retrieval. By default, the two search methods are weighted equally during ranking. To adjust the weight (or "importance") of each search method, use the hybrid_search_tuning_parameters property, which accepts the following tuning parameters:
- weight_a: weight to assign to semantic search
- weight_b: weight to assign to keyword search
The weights must sum to 1, i.e. sum(weight_a, weight_b, ..., weight_n) = 1 for all n weights. The equality has an error tolerance of 0.001 to account for floating-point issues.
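The weight constraint can be checked on the client before a query is sent. A minimal sketch, assuming hybrid_search_tuning_parameters takes exactly weight_a and weight_b (any other fields of HybridSearchTuningParamsNullable are not covered). The helper names are hypothetical, and rejecting weights outside [0, 1] is an extra assumption not stated above.

```typescript
// weight_a: semantic search, weight_b: keyword search.
interface HybridSearchTuningParams {
  weight_a: number;
  weight_b: number;
}

// Documented tolerance for the "weights sum to 1" check.
const WEIGHT_TOLERANCE = 0.001;

function assertWeightsSumToOne(params: HybridSearchTuningParams): void {
  const sum = params.weight_a + params.weight_b;
  if (Math.abs(sum - 1) > WEIGHT_TOLERANCE) {
    throw new Error(`weights must sum to 1 (tolerance ${WEIGHT_TOLERANCE}), got ${sum}`);
  }
}

// Derives keyword weight from a semantic weight so the pair always sums to 1.
function hybridWeights(semanticWeight: number): HybridSearchTuningParams {
  if (semanticWeight < 0 || semanticWeight > 1) {
    throw new Error("semanticWeight must be between 0 and 1"); // assumption
  }
  const params = { weight_a: semanticWeight, weight_b: 1 - semanticWeight };
  assertWeightsSumToOne(params);
  return params;
}

// Usage (sketch):
// const res = await carbon.embeddings.getDocuments({
//   query: "quarterly revenue",
//   k: 5,
//   hybrid_search: true,
//   hybrid_search_tuning_parameters: hybridWeights(0.7),
// });
```

Deriving weight_b from weight_a avoids hand-typed pairs that drift outside the tolerance.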
To use hybrid search for a customer across a set of documents, two flags need to be enabled:
- Use the /modify_user_configuration endpoint to enable sparse_vectors for the customer. The payload body for this request is:
{
"configuration_key_name": "sparse_vectors",
"value": {
"enabled": true
}
}
- Make sure hybrid search is enabled for the documents across which you want to perform the search. For the /uploadfile endpoint, set the query parameter generate_sparse_vectors=true (generateSparseVectors: true in carbon.files.upload).
Carbon supports multiple models for generating file embeddings. For images, we support Vertex AI's multimodal model; for text, we support OpenAI's text-embedding-ada-002 and Cohere's embed-multilingual-v3.0. The model can be specified via the embedding_model parameter (in the POST body for /embeddings, and as a query parameter for /uploadfile). If no model is supplied, text-embedding-ada-002 is used by default. When performing embedding queries, only embeddings from files that used the specified model are considered.
For example, if files A and B have embeddings generated with OPENAI, and files C and D have embeddings generated with COHERE_MULTILINGUAL_V3, then by default queries will only consider files A and B. If COHERE_MULTILINGUAL_V3 is specified as the embedding_model in /embeddings, only files C and D will be considered. Make sure that all files you want considered for a query have embeddings generated with the same model. For now, do not set VERTEX_MULTIMODAL as an embedding_model; Carbon uses this model automatically when it detects an image file.
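The model-matching rule can be illustrated with a local simulation of the A/B/C/D example. This only models which files a query would consider; the identifiers come from this README, and the real EmbeddingGenerators type may contain other values.

```typescript
// Model identifiers mentioned in this README (not necessarily exhaustive).
type EmbeddingModelId = "OPENAI" | "COHERE_MULTILINGUAL_V3" | "VERTEX_MULTIMODAL";

interface IndexedFile {
  name: string;
  embeddingModel: EmbeddingModelId;
}

// A query only considers files embedded with the query's model,
// which defaults to OPENAI when embedding_model is not set.
function filesConsidered(
  files: IndexedFile[],
  queryModel: EmbeddingModelId = "OPENAI",
): string[] {
  return files.filter((f) => f.embeddingModel === queryModel).map((f) => f.name);
}

const files: IndexedFile[] = [
  { name: "A", embeddingModel: "OPENAI" },
  { name: "B", embeddingModel: "OPENAI" },
  { name: "C", embeddingModel: "COHERE_MULTILINGUAL_V3" },
  { name: "D", embeddingModel: "COHERE_MULTILINGUAL_V3" },
];

const defaultScope = filesConsidered(files); // ["A", "B"]
const cohereScope = filesConsidered(files, "COHERE_MULTILINGUAL_V3"); // ["C", "D"]
```

In practice this means a mixed corpus needs one query per model, or re-embedding everything with a single model.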
🛠️ Usage
const getDocumentsResponse = await carbon.embeddings.getDocuments({
query: "query_example",
k: 1,
include_all_children: false,
media_type: "TEXT",
embedding_model: "OPENAI",
include_file_level_metadata: false,
high_accuracy: false,
exclude_cold_storage_files: false,
});
⚙️ Parameters
query: string
Query for which to get related chunks and embeddings.
k: number
Number of related chunks to return.
tags: Record<string, Tags1>
A set of tags to limit the search to. Deprecated and may be removed in the future.
query_vector: number[]
Optional query vector for which to get related chunks and embeddings. It must have been generated by the same model used to generate the embeddings across which the search is being conducted. Cannot provide both query and query_vector.
file_ids: number[]
Optional list of file IDs to limit the search to.
parent_file_ids: number[]
Optional list of parent file IDs to limit the search to. A parent file describes a file to which another file belongs (e.g. a folder).
include_all_children: boolean
Flag to control whether or not to include all children of filtered files in the embedding search.
tags_v2: object
A set of tags to limit the search to. Use this instead of tags, which is deprecated.
include_tags: boolean
Flag to control whether or not to include tags for each chunk in the response.
include_vectors: boolean
Flag to control whether or not to include embedding vectors in the response.
include_raw_file: boolean
Flag to control whether or not to include a signed URL to the raw file containing each chunk in the response.
hybrid_search: boolean
Flag to control whether or not to perform hybrid search.
hybrid_search_tuning_parameters: HybridSearchTuningParamsNullable
media_type: FileContentTypesNullable
embedding_model: EmbeddingGeneratorsNullable
include_file_level_metadata: boolean
Flag to control whether or not to include file-level metadata in the response. This metadata will be included in the content_metadata field of each document along with chunk/embedding-level metadata.
high_accuracy: boolean
Flag to control whether or not to perform a high accuracy embedding search. By default, this is set to false. If true, the search may return more accurate results, but may take longer to complete.
rerank: RerankParamsNullable
file_types_at_source: AutoSyncedSourceTypesPropertyInner[]
Filter files based on their type at the source (for example, help center tickets and articles).
exclude_cold_storage_files: boolean
Flag to control whether or not to exclude files that are not in hot storage. If set to false, an error is returned if any filtered files are in cold storage.
🔄 Return
🌐 Endpoint
/embeddings
POST
carbon.embeddings.getEmbeddingsAndChunks
Retrieve Embeddings And Content
🛠️ Usage
const getEmbeddingsAndChunksResponse =
await carbon.embeddings.getEmbeddingsAndChunks({
order_by: "created_at",
order_dir: "desc",
filters: {
user_file_id: 1,
embedding_model: "OPENAI",
},
include_vectors: false,
});
⚙️ Parameters
filters: EmbeddingsAndChunksFilters
pagination: Pagination
order_by: EmbeddingsAndChunksOrderByColumns
order_dir: OrderDir
include_vectors: boolean
🔄 Return
🌐 Endpoint
/text_chunks
POST
carbon.embeddings.list
Retrieve Embeddings And Content V2
🛠️ Usage
const listResponse = await carbon.embeddings.list({
order_by: "created_at",
order_dir: "desc",
filters: {
include_all_children: false,
non_synced_only: false,
},
include_vectors: false,
});
⚙️ Parameters
filters: OrganizationUserFilesToSyncFilters
pagination: Pagination
order_by: OrganizationUserFilesToSyncOrderByTypes
order_dir: OrderDir
include_vectors: boolean
🔄 Return
🌐 Endpoint
/list_chunks_and_embeddings
POST
carbon.embeddings.uploadChunksAndEmbeddings
Upload Chunks And Embeddings
🛠️ Usage
const uploadChunksAndEmbeddingsResponse =
await carbon.embeddings.uploadChunksAndEmbeddings({
embedding_model: "OPENAI",
chunks_and_embeddings: [
{
file_id: 1,
chunks_and_embeddings: [
{
chunk_number: 1,
chunk: "chunk_example",
},
],
},
],
overwrite_existing: false,
chunks_only: false,
});
⚙️ Parameters
embedding_model: EmbeddingGenerators
chunks_and_embeddings: SingleChunksAndEmbeddingsUploadInput[]
overwrite_existing: boolean
chunks_only: boolean
custom_credentials: { [key: string]: object; }
🔄 Return
🌐 Endpoint
/upload_chunks_and_embeddings
POST
carbon.files.createUserFileTags
A tag is a key-value pair that can be added to a file. This pair can then be used for searches (e.g. embedding searches) in order to narrow down the scope of the search. A file can have any number of tags. The following are reserved keys that cannot be used:
- db_embedding_id
- organization_id
- user_id
- organization_user_file_id
Carbon currently supports two data types for tag values: string and list<string>. Keys can only be strings. If values other than string and list<string> are used, they're automatically converted to strings (e.g. 4 will become "4").
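The tag rules above can be mirrored on the client so that surprises (a reserved key, an unexpected number-to-string conversion) surface before the request. A minimal sketch with hypothetical helper names. How Carbon converts a list that contains non-strings is not documented here; stringifying each element is an assumption of this sketch.

```typescript
// Reserved keys listed above; Carbon rejects these as tag keys.
const RESERVED_TAG_KEYS = new Set([
  "db_embedding_id",
  "organization_id",
  "user_id",
  "organization_user_file_id",
]);

type TagValue = string | string[];

// Rejects reserved keys and converts values to string or string[],
// e.g. 4 becomes "4". Element-wise conversion of lists is an assumption.
function normalizeTags(input: Record<string, unknown>): Record<string, TagValue> {
  const out: Record<string, TagValue> = {};
  for (const [key, value] of Object.entries(input)) {
    if (RESERVED_TAG_KEYS.has(key)) {
      throw new Error(`"${key}" is a reserved tag key`);
    }
    out[key] = Array.isArray(value) ? value.map((v) => String(v)) : String(value);
  }
  return out;
}

// Usage (sketch):
// await carbon.files.createUserFileTags({
//   tags: normalizeTags({ department: "finance", year: 2024 }),
//   organization_user_file_id: 1,
// });
```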
🛠️ Usage
const createUserFileTagsResponse = await carbon.files.createUserFileTags({
tags: {
key: "string_example",
},
organization_user_file_id: 1,
});
⚙️ Parameters
tags: Record<string, Tags1>
organization_user_file_id: number
🔄 Return
🌐 Endpoint
/create_user_file_tags
POST
carbon.files.deleteFileTags
Delete File Tags
🛠️ Usage
const deleteFileTagsResponse = await carbon.files.deleteFileTags({
tags: ["tags_example"],
organization_user_file_id: 1,
});
⚙️ Parameters
tags: string[]
organization_user_file_id: number
🔄 Return
🌐 Endpoint
/delete_user_file_tags
POST
carbon.files.deleteMany
Delete Files Endpoint
🛠️ Usage
const deleteManyResponse = await carbon.files.deleteMany({
delete_non_synced_only: false,
send_webhook: false,
delete_child_files: false,
});
⚙️ Parameters
file_ids: number[]
sync_statuses: ExternalFileSyncStatuses[]
delete_non_synced_only: boolean
send_webhook: boolean
delete_child_files: boolean
🔄 Return
🌐 Endpoint
/delete_files
POST
carbon.files.deleteV2
Delete Files V2 Endpoint
🛠️ Usage
const deleteV2Response = await carbon.files.deleteV2({
send_webhook: false,
preserve_file_record: false,
});
⚙️ Parameters
filters: OrganizationUserFilesToSyncFilters
send_webhook: boolean
preserve_file_record: boolean
If true, all data related to the file is deleted from the database, BUT the file metadata is preserved, allowing for resyncs. By default, preserve_file_record is false, which means that all data related to the file, as well as its metadata, is deleted. Note that even if preserve_file_record is true, raw files uploaded via the /uploadfile endpoint still cannot be resynced.
🔄 Return
🌐 Endpoint
/delete_files_v2
POST
carbon.files.getParsedFile
This route is deprecated. Use /user_files_v2 (carbon.files.queryUserFiles) instead.
🛠️ Usage
const getParsedFileResponse = await carbon.files.getParsedFile({
fileId: 1,
});
⚙️ Parameters
fileId: number
🔄 Return
🌐 Endpoint
/parsed_file/{file_id}
GET
carbon.files.getRawFile
This route is deprecated. Use /user_files_v2 (carbon.files.queryUserFiles) instead.
🛠️ Usage
const getRawFileResponse = await carbon.files.getRawFile({
fileId: 1,
});
⚙️ Parameters
fileId: number
🔄 Return
🌐 Endpoint
/raw_file/{file_id}
GET
carbon.files.modifyColdStorageParameters
Modify Cold Storage Parameters
🛠️ Usage
const modifyColdStorageParametersResponse =
await carbon.files.modifyColdStorageParameters({});
⚙️ Parameters
filters: OrganizationUserFilesToSyncFilters
enable_cold_storage: boolean
hot_storage_time_to_live: number
🌐 Endpoint
/modify_cold_storage_parameters
POST
carbon.files.moveToHotStorage
Move To Hot Storage
🛠️ Usage
const moveToHotStorageResponse = await carbon.files.moveToHotStorage({});
⚙️ Parameters
filters: OrganizationUserFilesToSyncFilters
🌐 Endpoint
/move_to_hot_storage
POST
carbon.files.queryUserFiles
For pre-filtering files, use tags_v2 rather than the deprecated tags; if both are specified, tags is ignored. tags_v2 uses the same "AND"/"OR"/negation syntax, nesting limit, and tag-block typing rules described under carbon.embeddings.getDocuments.
🛠️ Usage
const queryUserFilesResponse = await carbon.files.queryUserFiles({
order_by: "created_at",
order_dir: "desc",
presigned_url_expiry_time_seconds: 3600,
});
⚙️ Parameters
pagination: Pagination
order_by: OrganizationUserFilesToSyncOrderByTypes
order_dir: OrderDir
filters: OrganizationUserFilesToSyncFilters
include_raw_file: boolean
If true, the query will return presigned URLs for the raw file. Only relevant for the /user_files_v2 endpoint.
include_parsed_text_file: boolean
If true, the query will return presigned URLs for the parsed text file. Only relevant for the /user_files_v2 endpoint.
include_additional_files: boolean
If true, the query will return presigned URLs for additional files. Only relevant for the /user_files_v2 endpoint.
presigned_url_expiry_time_seconds: number
The expiry time for the presigned URLs. Only relevant for the /user_files_v2 endpoint.
🔄 Return
🌐 Endpoint
/user_files_v2
POST
carbon.files.queryUserFilesDeprecated
This route is deprecated. Use /user_files_v2 (carbon.files.queryUserFiles) instead.
🛠️ Usage
const queryUserFilesDeprecatedResponse =
await carbon.files.queryUserFilesDeprecated({
order_by: "created_at",
order_dir: "desc",
presigned_url_expiry_time_seconds: 3600,
});
⚙️ Parameters
pagination: Pagination
order_by: OrganizationUserFilesToSyncOrderByTypes
order_dir: OrderDir
filters: OrganizationUserFilesToSyncFilters
include_raw_file: boolean
If true, the query will return presigned URLs for the raw file. Only relevant for the /user_files_v2 endpoint.
include_parsed_text_file: boolean
If true, the query will return presigned URLs for the parsed text file. Only relevant for the /user_files_v2 endpoint.
include_additional_files: boolean
If true, the query will return presigned URLs for additional files. Only relevant for the /user_files_v2 endpoint.
presigned_url_expiry_time_seconds: number
The expiry time for the presigned URLs. Only relevant for the /user_files_v2 endpoint.
🔄 Return
🌐 Endpoint
/user_files
POST
carbon.files.resync
Resync File
🛠️ Usage
const resyncResponse = await carbon.files.resync({
file_id: 1,
force_embedding_generation: false,
skip_file_processing: false,
});
⚙️ Parameters
file_id: number
chunk_size: number
chunk_overlap: number
force_embedding_generation: boolean
skip_file_processing: boolean
🔄 Return
🌐 Endpoint
/resync_file
POST
carbon.files.upload
This endpoint is used to upload local files directly to Carbon. The POST request should be a multipart form request.
Note that the set_page_as_boundary query parameter currently applies only to PDFs. When it is set, PDF chunks are at most one page long, and additional information can be retrieved for each chunk: namely, the coordinates of the bounding box around the chunk (useful for things like text highlighting). All possible query parameters are described below:
- chunk_size: the chunk size (in tokens) applied when splitting the document
- chunk_overlap: the chunk overlap (in tokens) applied when splitting the document
- skip_embedding_generation: whether to skip the generation of chunks and embeddings
- set_page_as_boundary: described above
- embedding_model: the model used to generate embeddings for the document chunks
- use_ocr: whether to use OCR as a preprocessing step prior to generating chunks. Valid for PDFs, JPEGs, and PNGs
- generate_sparse_vectors: whether to generate sparse vectors for the file. Required for hybrid search
- prepend_filename_to_chunks: whether to prepend the filename to the chunk text
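To see how chunk_size and chunk_overlap interact, here is a sliding-window sketch. It assumes a simple fixed-step window, which may differ from Carbon's actual splitter, and it splits on whitespace, whereas Carbon counts tiktoken tokens, so real chunk boundaries will not match exactly. chunkTokens is a hypothetical helper.

```typescript
// Splits a token list into windows of `chunkSize` tokens, where consecutive
// windows share `chunkOverlap` tokens (step = chunkSize - chunkOverlap).
function chunkTokens(tokens: string[], chunkSize: number, chunkOverlap: number): string[][] {
  if (chunkSize <= 0) throw new Error("chunkSize must be positive");
  if (chunkOverlap < 0 || chunkOverlap >= chunkSize) {
    throw new Error("chunkOverlap must be in [0, chunkSize)");
  }
  const step = chunkSize - chunkOverlap;
  const chunks: string[][] = [];
  for (let start = 0; start < tokens.length; start += step) {
    chunks.push(tokens.slice(start, start + chunkSize));
    // Stop once a window reaches the end, so no trailing fragment is emitted.
    if (start + chunkSize >= tokens.length) break;
  }
  return chunks;
}

const words = "t0 t1 t2 t3 t4 t5 t6 t7 t8 t9".split(" ");
const chunks = chunkTokens(words, 4, 1);
// [["t0","t1","t2","t3"], ["t3","t4","t5","t6"], ["t6","t7","t8","t9"]]
```

Larger overlaps keep more context across chunk boundaries at the cost of more chunks (and embeddings) per document.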
Embedding model selection works as described under carbon.embeddings.getDocuments: text defaults to OpenAI's text-embedding-ada-002 (OPENAI), Cohere's embed-multilingual-v3.0 (COHERE_MULTILINGUAL_V3) is also supported, and VERTEX_MULTIMODAL is applied automatically to image files and should not be set manually. Queries only consider files whose embeddings were generated with the query's model, so embed all files you want searched together with the same model.
🛠️ Usage
import * as fs from "fs";
const uploadResponse = await carbon.files.upload({
skipEmbeddingGeneration: false,
setPageAsBoundary: false,
useOcr: false,
generateSparseVectors: false,
prependFilenameToChunks: false,
parsePdfTablesWithOcr: false,
detectAudioLanguage: false,
transcriptionService: "assemblyai",
includeSpeakerLabels: false,
mediaType: "TEXT",
splitRows: false,
enableColdStorage: false,
generateChunksOnly: false,
storeFileOnly: false,
file: fs.readFileSync("/path/to/file"),
});
⚙️ Parameters
file: Uint8Array | File | buffer.File
chunkSize: number
Chunk size in tiktoken tokens to be used when processing file.
chunkOverlap: number
Chunk overlap in tiktoken tokens to be used when processing file.
skipEmbeddingGeneration: boolean
Flag to control whether or not embeddings should be generated and stored when processing file.
setPageAsBoundary: boolean
Flag to control whether or not to set a page's worth of content as the maximum amount of content that can appear in a chunk. Only valid for PDFs. See the route description for more information.
embeddingModel: EmbeddingModel
Embedding model that will be used to embed file chunks.
useOcr: boolean
Whether or not to use OCR when processing files. Valid for PDFs, JPEGs, and PNGs. Useful for documents with tables, images, and/or scanned text.
generateSparseVectors: boolean
Whether or not to generate sparse vectors for the file. This is required for the file to be a candidate for hybrid search.
prependFilenameToChunks: boolean
Whether or not to prepend the file's name to chunks.
maxItemsPerChunk: number
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
parsePdfTablesWithOcr: boolean
Whether to use rich table parsing when use_ocr is enabled.
detectAudioLanguage: boolean
Whether to automatically detect the language of the uploaded audio file.
transcriptionService: TranscriptionServiceNullable
The transcription service to use for audio files. If no service is specified, 'deepgram' will be used.
includeSpeakerLabels: boolean
Detect multiple speakers and label segments of speech by speaker for audio files.
mediaType: FileContentTypesNullable
The media type of the file. If not provided, it will be inferred from the file extension.
splitRows: boolean
Whether to split tabular rows into chunks. Currently only valid for CSV, TSV, and XLSX files.
enableColdStorage: boolean
Enable cold storage for the file. If set to true, the file will be moved to cold storage after a period of inactivity (see hotStorageTimeToLive). Default is false.
hotStorageTimeToLive: number
Time in days after which the file will be moved to cold storage. Must be one of [1, 3, 7, 14, 30].
generateChunksOnly: boolean
If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.
storeFileOnly: boolean
If this flag is enabled, the file will be stored with Carbon, but no processing will be done.
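hotStorageTimeToLive only accepts a fixed set of day values, so a value read from untyped input (a config file, an environment variable) is worth validating before upload. A minimal sketch with hypothetical helpers (coldStorageOptions, HOT_STORAGE_TTL_DAYS); turning enableColdStorage on whenever a TTL is supplied is an assumption of this sketch.

```typescript
// Allowed values for hotStorageTimeToLive, per the parameter description.
const HOT_STORAGE_TTL_DAYS = [1, 3, 7, 14, 30] as const;
type HotStorageTtlDays = (typeof HOT_STORAGE_TTL_DAYS)[number];

interface ColdStorageOptions {
  enableColdStorage: boolean;
  hotStorageTimeToLive?: HotStorageTtlDays;
}

// Validates a TTL in days and returns upload options that enable cold storage.
function coldStorageOptions(ttlDays: number): ColdStorageOptions {
  if (!(HOT_STORAGE_TTL_DAYS as readonly number[]).includes(ttlDays)) {
    throw new Error(
      `hotStorageTimeToLive must be one of ${HOT_STORAGE_TTL_DAYS.join(", ")}; got ${ttlDays}`,
    );
  }
  return { enableColdStorage: true, hotStorageTimeToLive: ttlDays as HotStorageTtlDays };
}

// Usage (sketch):
// const uploadResponse = await carbon.files.upload({
//   file: fs.readFileSync("/path/to/file"),
//   ...coldStorageOptions(7),
// });
```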
🔄 Return
🌐 Endpoint
/uploadfile
POST
carbon.files.uploadFromUrl
Create Upload File From Url
🛠️ Usage
const uploadFromUrlResponse = await carbon.files.uploadFromUrl({
url: "url_example",
skip_embedding_generation: false,
set_page_as_boundary: false,
embedding_model: "OPENAI",
generate_sparse_vectors: false,
use_textract: false,
prepend_filename_to_chunks: false,
parse_pdf_tables_with_ocr: false,
detect_audio_language: false,
transcription_service: "assemblyai",
include_speaker_labels: false,
media_type: "TEXT",
split_rows: false,
generate_chunks_only: false,
store_file_only: false,
});
⚙️ Parameters
url: string
file_name: string
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
set_page_as_boundary: boolean
embedding_model: EmbeddingGenerators
generate_sparse_vectors: boolean
use_textract: boolean
prepend_filename_to_chunks: boolean
max_items_per_chunk: number
Number of objects per chunk. For csv, tsv, xlsx, and json files only.
parse_pdf_tables_with_ocr: boolean
detect_audio_language: boolean
transcription_service: TranscriptionServiceNullable
include_speaker_labels: boolean
media_type: FileContentTypesNullable
split_rows: boolean
cold_storage_params: ColdStorageProps
generate_chunks_only: boolean
If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.
store_file_only: boolean
If this flag is enabled, the file will be stored with Carbon, but no processing will be done.
🔄 Return
🌐 Endpoint
/upload_file_from_url
POST
carbon.files.uploadText
Embedding model selection (embedding_model) follows the rules described under carbon.embeddings.getDocuments: OPENAI (text-embedding-ada-002) is the default, COHERE_MULTILINGUAL_V3 is also supported, and VERTEX_MULTIMODAL should not be set manually. Only files embedded with the same model are considered together in a query.
🛠️ Usage
const uploadTextResponse = await carbon.files.uploadText({
contents: "contents_example",
skip_embedding_generation: false,
embedding_model: "OPENAI",
generate_sparse_vectors: false,
generate_chunks_only: false,
store_file_only: false,
});
⚙️ Parameters
contents: string
name: string
chunk_size: number
chunk_overlap: number
skip_embedding_generation: boolean
overwrite_file_id: number
embedding_model: EmbeddingGeneratorsNullable
generate_sparse_vectors: boolean
cold_storage_params: ColdStorageProps
generate_chunks_only: boolean
If this flag is enabled, the file will be chunked and stored with Carbon, but no embeddings will be generated. This overrides the skip_embedding_generation flag.
store_file_only: boolean
If this flag is enabled, the file will be stored with Carbon, but no processing will be done.
🔄 Return
🌐 Endpoint
/upload_text
POST
carbon.github.getIssue
Issue
🛠️ Usage
const getIssueResponse = await carbon.github.getIssue({
issueNumber: 1,
includeRemoteData: false,
});
⚙️ Parameters
issueNumber: number
includeRemoteData: boolean
dataSourceId: number
repository: string
🔄 Return
🌐 Endpoint
/integrations/data/github/issues/{issue_number}
GET
carbon.github.getIssues
Issues
🛠️ Usage
const getIssuesResponse = await carbon.github.getIssues({
data_source_id: 1,
include_remote_data: false,
repository: "repository_example",
page: 1,
page_size: 30,
order_by: "created",
order_dir: "asc",
});
⚙️ Parameters
data_source_id: number
repository: string
Full name of the repository, denoted as {owner}/{repo}
include_remote_data: boolean
page: number
page_size: number
next_cursor: string
filters: IssuesFilter
order_by: IssuesOrderBy
order_dir: OrderDirV2Nullable
🔄 Return
🌐 Endpoint
/integrations/data/github/issues
POST
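The repository parameter expects the full name in {owner}/{repo} form. A small sketch of building and checking that string; repositoryFullName and parseRepositoryFullName are hypothetical helpers, and the only rule enforced is the documented two-segment shape.

```typescript
// Joins owner and repo into the "{owner}/{repo}" form, rejecting empty
// segments or segments that already contain a slash.
function repositoryFullName(owner: string, repo: string): string {
  for (const part of [owner, repo]) {
    if (part.length === 0 || part.includes("/")) {
      throw new Error(`invalid owner or repo segment: "${part}"`);
    }
  }
  return `${owner}/${repo}`;
}

// Splits a full name back into its parts, checking the two-segment shape.
function parseRepositoryFullName(fullName: string): { owner: string; repo: string } {
  const parts = fullName.split("/");
  if (parts.length !== 2 || parts.some((p) => p.length === 0)) {
    throw new Error(`expected "{owner}/{repo}", got "${fullName}"`);
  }
  return { owner: parts[0], repo: parts[1] };
}

// Usage (sketch):
// const issues = await carbon.github.getIssues({
//   data_source_id: 1,
//   repository: repositoryFullName("octocat", "hello-world"),
//   page_size: 30,
// });
```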
carbon.github.getPr
Get Pr
🛠️ Usage
const getPrResponse = await carbon.github.getPr({
pullNumber: 1,
includeRemoteData: false,
});
⚙️ Parameters
pullNumber: number
includeRemoteData: boolean
dataSourceId: number
repository: string
🔄 Return
🌐 Endpoint
/integrations/data/github/pull_requests/{pull_number}
GET