How can I upload larger files? How can I upload faster?
You can, and sometimes must, upload files in parts. Instead of opening a single HTTP connection to transfer the whole binary to the REST API, you open multiple connections, each of which ships one part of the file.
Multipart uploads enable the following features:
Increasing the upload speed by uploading the parts concurrently.
Pausing the upload.
Multipart upload is required when a single-request upload would keep the HTTP connection open longer than the REST API timeout allows. It is recommended for any file larger than 5 MB. 5 MB is also the minimum chunk size, and an upload may consist of at most 10,000 chunks. All chunks must be the same size, except the last one, which may be smaller. The chunk size is set by the first uploaded chunk, based on its Content-Range header.
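The limits above can be checked before the first request is sent. The following is a minimal sketch, not part of the API: `plan_chunks`, `content_range`, `MIN_CHUNK_SIZE` and `MAX_CHUNKS` are hypothetical names, and "5 MB" is assumed to mean 5 × 1024 × 1024 bytes. It grows the chunk size when needed to keep the chunk count at or below 10,000 and returns inclusive byte ranges in the form the Content-Range header uses.

```python
MIN_CHUNK_SIZE = 5 * 1024 * 1024  # assumption: "5 MB" means 5 * 1024 * 1024 bytes
MAX_CHUNKS = 10_000


def plan_chunks(file_size, chunk_size=MIN_CHUNK_SIZE):
    """Return (chunk_size, ranges); ranges are inclusive (start, end) byte offsets."""
    if file_size <= 0:
        raise ValueError('file_size must be positive')
    if chunk_size < MIN_CHUNK_SIZE:
        raise ValueError('chunk_size is below the 5 MB minimum')
    # Grow the chunk size if the file would otherwise need more than MAX_CHUNKS chunks
    # (integer ceiling division avoids float rounding on very large files)
    chunk_size = max(chunk_size, -(-file_size // MAX_CHUNKS))
    ranges = []
    for start in range(0, file_size, chunk_size):
        end = min(start + chunk_size, file_size) - 1  # offset of the last byte in this chunk
        ranges.append((start, end))
    return chunk_size, ranges


def content_range(start, end, file_size):
    """Format the Content-Range header value for one chunk."""
    return f'bytes {start}-{end}/{file_size}'
```

For a 12 × 1024 × 1024 byte file with the default chunk size, `plan_chunks` returns three ranges: two full 5 MB chunks and a smaller last chunk ending at the final byte of the file.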
Example 3. Multipart upload
import fs from 'fs';
import { arrayBuffer } from 'stream/consumers';

const FILE_PATH = './path/to/file';
// Example value: all chunks except the last must have this size (minimum 5 MB)
const CHUNK_SIZE = 5 * 1024 * 1024;
// Note: `tagsBase64Encoded` must be defined before this snippet runs

// Get total file size
const stats = await fs.promises.stat(FILE_PATH);
const fileSize = stats.size;

// Define function for uploading a single chunk of the file
// `start` and `end` are inclusive byte offsets, as in the Content-Range header
const uploadChunk = async (start: number, end: number, headers: Record<string, string> = {}) => {
  const chunkStream = fs.createReadStream(FILE_PATH, { start, end });
  const chunk = await arrayBuffer(chunkStream);
  const response = await fetch(`https://api.akord.com/files?tags=${tagsBase64Encoded}`, {
    method: 'POST',
    headers: {
      'Api-Key': 'your_api_key',
      'Content-Type': 'application/pdf',
      'Content-Range': `bytes ${start}-${end}/${fileSize}`,
      ...headers
    },
    body: chunk
  });
  if (response.status !== 202) {
    throw new Error('Failed to upload chunk of the file. Status code: ' + response.status);
  }
  return response;
};

// Upload first chunk of the file (bytes 0 .. CHUNK_SIZE - 1)
const response = await uploadChunk(0, CHUNK_SIZE - 1);

// Read location of the multipart upload
const contentLocation = response.headers.get('Content-Location');
if (!contentLocation) {
  throw new Error('Content-Location header is missing');
}

// Upload middle chunks of the file using 'Content-Location' & 'Content-Range' - can be done concurrently
let sourceOffset = CHUNK_SIZE;
const chunkUploadPromises = [];
while (sourceOffset + CHUNK_SIZE < fileSize) {
  const chunkUploadPromise = uploadChunk(sourceOffset, sourceOffset + CHUNK_SIZE - 1, { 'Content-Location': contentLocation });
  chunkUploadPromises.push(chunkUploadPromise);
  sourceOffset += CHUNK_SIZE;
}
await Promise.all(chunkUploadPromises);

// Upload last chunk of the file (up to the final byte) to complete the multipart upload
const res = await uploadChunk(sourceOffset, fileSize - 1, { 'Content-Location': contentLocation });
import os
import requests

# Get total file size
file_path = './tests/data/20mb.pdf'
file_size = os.path.getsize(file_path)

# Example value: all chunks except the last must have this size (minimum 5 MB)
CHUNK_SIZE = 5 * 1024 * 1024
# Note: `tags_base64_encoded` must be defined before this snippet runs


# Define function to upload a chunk
# `start` and `end` are inclusive byte offsets, as in the Content-Range header
def upload_chunk(start, end, headers=None):
    with open(file_path, 'rb') as file:
        file.seek(start)
        chunk = file.read(end - start + 1)
    headers = dict(headers or {})  # copy so the caller's dict is not modified
    headers['Content-Range'] = f'bytes {start}-{end}/{file_size}'
    headers['Content-Type'] = 'application/pdf'
    headers['Authorization'] = 'Bearer <your_access_token>'
    response = requests.post(f'{os.environ["BASE_URL"]}/files?tags={tags_base64_encoded}',
                             data=chunk,
                             headers=headers)
    print("Uploaded chunk:", f'bytes {start}-{end}/{file_size}')
    if response.status_code != 202:
        raise ValueError('Failed to upload chunk of the file. Status code: ' + str(response.status_code))
    return response


# Upload first chunk of the file (bytes 0 .. CHUNK_SIZE - 1)
response = upload_chunk(0, CHUNK_SIZE - 1)

# Read location of the multipart upload
content_location = response.headers.get('Content-Location')
if not content_location:
    raise ValueError('Content-Location header is missing')

# Upload middle chunks of the file using 'Content-Location' & 'Content-Range'
# (sequential here, but can be done concurrently)
source_offset = CHUNK_SIZE
while source_offset + CHUNK_SIZE < file_size:
    upload_chunk(source_offset, source_offset + CHUNK_SIZE - 1, {'Content-Location': content_location})
    source_offset += CHUNK_SIZE

# Upload last chunk of the file (up to the final byte) to complete the multipart upload
upload_chunk(source_offset, file_size - 1, {'Content-Location': content_location})
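The Python example sends the middle chunks one after another. As a hedged sketch of the concurrent variant, the helper below fans the middle ranges out over a thread pool from the standard library. `upload_middle_chunks` and `max_workers` are hypothetical names. The `upload_chunk` argument is assumed to behave like the function defined above: it takes inclusive `start` and `end` offsets plus a headers dict, and raises on failure.

```python
from concurrent.futures import ThreadPoolExecutor


def upload_middle_chunks(upload_chunk, content_location, file_size, chunk_size, max_workers=4):
    """Upload every full-size chunk between the first and the last one concurrently.

    Returns the offset at which the last chunk starts.
    """
    ranges = []
    offset = chunk_size  # the first chunk (0 .. chunk_size - 1) is assumed to be uploaded already
    while offset + chunk_size < file_size:
        ranges.append((offset, offset + chunk_size - 1))
        offset += chunk_size
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(upload_chunk, start, end, {'Content-Location': content_location})
            for start, end in ranges
        ]
        for future in futures:
            future.result()  # re-raises the exception of a failed chunk, if any
    return offset
```

Once the helper returns, the last chunk covers the bytes from the returned offset through the end of the file. A small `max_workers` value keeps the number of open connections modest; the right value depends on bandwidth and any rate limits, which are not specified here.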