Working with HDF5 files in Python

Ranjit Kumar Pattnayak
2 min read · Oct 26, 2019

HDF5 (Hierarchical Data Format version 5) is seeing rapidly growing use in data science, and HDF5 files are becoming an increasingly common way of storing large datasets.

How large, you ask? These days it's common to work with datasets that are hundreds of gigabytes or even terabytes in size, and HDF5 can scale up to exabytes.

Here we will discuss how to work with HDF5 files: how to create them, how to save data in them, and how to read them back.

HDF5 File Structure:

The structure of an HDF5 file is similar to a file system directory tree. There are three basic types of items: File, Group and Dataset. A Group plays the role of a directory, a Dataset plays the role of a file that holds array data, and their names are used as access keys.
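To make the directory-tree analogy concrete, here is a minimal sketch using h5py. The file name structure_demo.h5, the group name measurements and the dataset name temperature are all made up for illustration and do not come from any real dataset.

```python
import os
import tempfile

import h5py
import numpy as np

# Hypothetical file in a temporary directory (name chosen for illustration).
path = os.path.join(tempfile.mkdtemp(), "structure_demo.h5")

with h5py.File(path, "w") as f:
    # A Group behaves like a directory; "measurements" is an assumed name.
    group = f.create_group("measurements")
    # A Dataset behaves like a file inside that directory and holds array data.
    group.create_dataset("temperature", data=np.arange(5))

with h5py.File(path, "r") as f:
    # Names directly under the file root.
    top_level_keys = list(f.keys())
    # Names inside the group.
    group_keys = list(f["measurements"].keys())
    # Names work as access keys and can be chained like a file system path.
    values = f["measurements/temperature"][:]
    is_group = isinstance(f["measurements"], h5py.Group)
    is_dataset = isinstance(f["measurements/temperature"], h5py.Dataset)

print(top_level_keys, group_keys, values)
```

The File object itself acts as the root group, which is why we can call keys() on it and index it by name in the same way as any other group.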

Creating an HDF5 File:

Let’s create an HDF5 file now. First we do the imports: numpy, and then the h5py package. Next we create two random matrices with numpy, matrix1 and matrix2, and write them to the file.
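A minimal sketch of these steps could look like the following. The matrix shape (100 x 100), the .h5 extension, the temporary directory and the dataset names dataset1 and dataset2 are assumptions made for this example.

```python
import os
import tempfile

import h5py
import numpy as np

# Two random matrices; the 100 x 100 shape is an assumption for this sketch.
matrix1 = np.random.random(size=(100, 100))
matrix2 = np.random.random(size=(100, 100))

# Assumed location and extension for the hdf5_data file.
file_path = os.path.join(tempfile.mkdtemp(), "hdf5_data.h5")

# Mode "w" creates the file (overwriting any existing one);
# the with block closes it automatically when we are done.
with h5py.File(file_path, "w") as hdf:
    # Dataset names are assumptions; each one stores one matrix.
    hdf.create_dataset("dataset1", data=matrix1)
    hdf.create_dataset("dataset2", data=matrix2)
```

Using the with statement means we do not have to remember to call close() on the file, even if an error occurs while writing.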

So here we have created two datasets, one for matrix1 and one for matrix2, inside a single file, i.e. the hdf5_data file.

Reading HDF5 files:

Reading the data back is just as simple. We open the file in read mode, r, and we can use the hdf.keys() method to list all the keys in the file. We then recover the data by directly addressing the dataset called dataset1.
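The reading steps can be sketched as follows. So that the example runs on its own, it first writes a small file; the file location, the matrix shape and the second dataset name dataset2 are assumptions for this sketch.

```python
import os
import tempfile

import h5py
import numpy as np

# Setup so the sketch is self-contained: write a small file first.
# The path, shapes and the name "dataset2" are assumptions.
file_path = os.path.join(tempfile.mkdtemp(), "hdf5_data.h5")
with h5py.File(file_path, "w") as hdf:
    hdf.create_dataset("dataset1", data=np.random.random(size=(100, 100)))
    hdf.create_dataset("dataset2", data=np.random.random(size=(100, 100)))

# Open the file in read mode, "r".
with h5py.File(file_path, "r") as hdf:
    # List the names of all items stored at the top level of the file.
    keys = list(hdf.keys())
    print("Keys in the file:", keys)

    # Address the dataset directly by name; [:] copies it into a NumPy array.
    dataset1 = hdf["dataset1"][:]

# The array stays usable after the file has been closed.
print(dataset1.shape)
```

Note that hdf["dataset1"] on its own only returns a handle to the data on disk. Slicing with [:] is what actually loads the values into memory, so for very large datasets we could slice out just the part we need instead.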

Reading HDF5 file.
