Compressions

Deep Lake can read, compress, decompress and recompress data to different formats. The supported htype-compression configurations are given below.

Sample Type Htype Compressions
Image image
bmp, dib, gif, ico,
jpeg, jpeg2000, pcx,
png, ppm, sgi, tga,
tiff, webp, wmf, xbm,
eps, fli, im, msp,
mpo, apng
Video video mp4, mkv, avi
Audio audio flac, mp3, wav
Dicom dicom dcm
Point Cloud point_cloud las
Mesh mesh ply
Other bbox, text, list, json, generic, etc. lz4

Sample Compression

If sample compression is specified when creating tensors, samples will be compressed to the given format if possible. If given data is already compressed and matches the provided sample_compression, it will be stored as is. If left as None, given samples are uncompressed.

Note

For audio and video, we don’t support compressing raw frames but only reading compressed audio and video data.

Examples:

>>> ds.create_tensor("images", htype="image", sample_compression="jpg")

>>> ds.create_tensor("videos", htype="video", sample_compression="mp4")

>>> ds.create_tensor("point_clouds", htype="point_cloud", sample_compression="las")
_images/sample_compression.svg

Structure of sample-wise compressed tensor.

Chunk Compression

If chunk compression is specified when creating tensors, addded samples will be clubbed together and compressed to the given format chunk-wise. If given data is already compressed, it will be uncompressed and then recompressed chunk-wise.

Note

Chunk-wise compression is not supported for audio, video and point_cloud htypes.

Examples:

>>> ds.create_tensor("images", htype="image", chunk_compression="jpg")

>>> ds.create_tensor("boxes", htype="bbox", chunk_compression="lz4")
_images/chunk_compression.svg

Structure of chunk-wise compressed tensor.

Note

See deeplake.read() to learn how to read data from files and populate these tensors.