API - Files

A collection of helper functions for working with datasets: loading benchmark datasets, saving and restoring models, and saving and loading variables.

load_mnist_dataset([shape, path])

Load the original MNIST dataset.

load_fashion_mnist_dataset([shape, path])

Load the Fashion-MNIST dataset.

load_cifar10_dataset([shape, path, plotable])

Load CIFAR-10 dataset.

load_cropped_svhn([path, include_extra])

Load Cropped SVHN.

load_matt_mahoney_text8_dataset([path])

Load Matt Mahoney’s dataset.

load_imdb_dataset([path, nb_words, …])

Load IMDB dataset.

load_nietzsche_dataset([path])

Load Nietzsche dataset.

load_flickr25k_dataset([tag, path, …])

Load Flickr25K dataset.

load_flickr1M_dataset([tag, size, path, …])

Load Flickr1M dataset.

load_cyclegan_dataset([filename, path])

Load images from CycleGAN’s database.

load_celebA_dataset([path])

Load CelebA dataset.

load_mpii_pose_dataset([path, is_16_pos_only])

Load MPII Human Pose Dataset.

download_file_from_google_drive(ID, destination)

Download file from Google Drive.

save_npz([save_list, name])

Save the given parameters to an .npz file with the given file name.

load_npz([path, name])

Load the parameters of a Model saved by tlx.files.save_npz().

assign_weights(weights, network)

Assign the given parameters to the TensorLayer network.

load_and_assign_npz([name, network])

Load model from npz and assign to a network.

save_npz_dict([save_list, name])

Save the given parameters as a dictionary to an .npz file with the given file name.

load_and_assign_npz_dict([name, network, skip])

Restore the parameters saved by tlx.files.save_npz_dict().

save_weights_to_hdf5(save_list, filepath)

Save weights to a file in HDF5 format.

load_hdf5_to_weights_in_order(filepath, network)

Load weights sequentially from a file in HDF5 format.

load_hdf5_to_weights(filepath, network[, skip])

Load weights by name from a file in HDF5 format.

save_any_to_npy([save_dict, name])

Save variables to .npy file.

load_npy_to_any([path, name])

Load .npy file.

file_exists(filepath)

Check whether a file exists at the given file path.

folder_exists(folderpath)

Check whether a folder exists at the given folder path.

del_file(filepath)

Delete the file at the given file path.

del_folder(folderpath)

Delete the folder at the given folder path.

read_file(filepath)

Read a file and return a string.

load_file_list([path, regx, printable, …])

Return a list of the files in a folder whose names match a regular expression.

load_folder_list([path])

Return a list of the folders inside a given folder.

exists_or_mkdir(path[, verbose])

Check whether a folder exists. If it does not, create it and return False; if it already exists, return True.

maybe_download_and_extract(filename, …[, …])

Check whether the file exists in working_directory; if not, try to download it. Optionally also try to extract the file if its format is “.zip” or “.tar”.

natural_keys(text)

Sort a list of strings containing numbers in human order.

Load dataset functions

MNIST

tensorlayerx.files.load_mnist_dataset(shape=(-1, 784), path='data')

Load the original MNIST dataset.

Automatically downloads the MNIST dataset and returns the training, validation and test sets, with 50000, 10000 and 10000 digit images respectively.

Parameters
  • shape (tuple) – The shape of digit images (the default is (-1, 784), alternatively (-1, 28, 28, 1)).

  • path (str) – The path that the data is downloaded to.

Returns

X_train, y_train, X_val, y_val, X_test, y_test – The training, validation and test sets, in that order.

Return type

tuple

Examples

>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_mnist_dataset(shape=(-1,784), path='datasets')
>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_mnist_dataset(shape=(-1, 28, 28, 1))
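
The -1 in shape acts as a placeholder for the number of images, in the same spirit as NumPy's reshape. The following is a minimal plain-Python sketch of how that dimension can be inferred. The helper resolve_shape is hypothetical, written only for illustration, and is not part of tensorlayerx.

```python
from math import prod


def resolve_shape(total_values, shape):
    """Replace the single -1 in `shape` so the product equals `total_values`."""
    if list(shape).count(-1) != 1:
        raise ValueError("expected exactly one -1 in shape")
    # Product of all explicitly given dimensions.
    known = prod(d for d in shape if d != -1)
    if total_values % known != 0:
        raise ValueError("shape does not divide the data evenly")
    return tuple(total_values // known if d == -1 else d for d in shape)


# 50000 training images of 28 x 28 pixels each.
n_values = 50000 * 28 * 28
print(resolve_shape(n_values, (-1, 784)))        # (50000, 784)
print(resolve_shape(n_values, (-1, 28, 28, 1)))  # (50000, 28, 28, 1)
```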

Fashion-MNIST

tensorlayerx.files.load_fashion_mnist_dataset(shape=(-1, 784), path='data')

Load the Fashion-MNIST dataset.

Automatically downloads the Fashion-MNIST dataset and returns the training, validation and test sets, with 50000, 10000 and 10000 fashion images respectively.

Parameters
  • shape (tuple) – The shape of the images (the default is (-1, 784), alternatively (-1, 28, 28, 1)).

  • path (str) – The path that the data is downloaded to.

Returns

X_train, y_train, X_val, y_val, X_test, y_test – The training, validation and test sets, in that order.

Return type

tuple

Examples

>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_fashion_mnist_dataset(shape=(-1,784), path='datasets')
>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_fashion_mnist_dataset(shape=(-1, 28, 28, 1))

CIFAR-10

tensorlayerx.files.load_cifar10_dataset(shape=(-1, 32, 32, 3), path='data', plotable=False)

Load CIFAR-10 dataset.

It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

Parameters
  • shape (tuple) – The shape of the images, e.g. (-1, 3, 32, 32) or (-1, 32, 32, 3).

  • path (str) – The path that the data is downloaded to. Default is data/cifar10/.

  • plotable (boolean) – Whether to plot some image examples, False as default.

Examples

>>> X_train, y_train, X_test, y_test = tlx.files.load_cifar10_dataset(shape=(-1, 32, 32, 3))

SVHN

tensorlayerx.files.load_cropped_svhn(path='data', include_extra=True)

Load Cropped SVHN.

The Cropped Street View House Numbers (SVHN) Dataset contains 32x32x3 RGB images. Digit ‘1’ has label 1, ‘9’ has label 9 and ‘0’ has label 0 (the original dataset uses 10 to represent ‘0’), see ufldl website.

Parameters
  • path (str) – The path that the data is downloaded to.

  • include_extra (boolean) – If True (default), add extra images to the training set.

Returns

X_train, y_train, X_test, y_test – The training and test sets, in that order.

Return type

tuple

Examples

>>> X_train, y_train, X_test, y_test = tlx.files.load_cropped_svhn(include_extra=False)
>>> tlx.vis.save_images(X_train[0:100], [10, 10], 'svhn.png')
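
The label convention described above (the original files use 10 for the digit ‘0’, while the loader reports 0) can be sketched in plain Python. The helper remap_svhn_labels is hypothetical and shown only to make the mapping concrete; it is not the loader's actual code.

```python
def remap_svhn_labels(labels):
    """Map the original SVHN label 10 (digit '0') to 0; labels 1-9 are unchanged."""
    return [0 if y == 10 else y for y in labels]


print(remap_svhn_labels([1, 9, 10, 3]))  # [1, 9, 0, 3]
```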

Matt Mahoney’s text8

tensorlayerx.files.load_matt_mahoney_text8_dataset(path='data')

Load Matt Mahoney’s dataset.

Download a text file from Matt Mahoney’s website if not present, and make sure it’s the right size. Extract the first file enclosed in a zip file as a list of words. This dataset can be used for Word Embedding.

Parameters

path (str) – The path that the data is downloaded to. Default is data/mm_test8/.

Returns

The raw text data, e.g. […, ‘their’, ‘families’, ‘who’, ‘were’, ‘expelled’, ‘from’, ‘jerusalem’, …]

Return type

list of str

Examples

>>> words = tlx.files.load_matt_mahoney_text8_dataset()
>>> print('Data size', len(words))
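
The step "extract the first file enclosed in a zip file as a list of words" can be sketched with the standard library alone. This is an illustration under assumptions, not the loader's own code: read_words_from_zip is a hypothetical helper, and the tiny in-memory archive stands in for the real download.

```python
import io
import zipfile


def read_words_from_zip(zip_bytes):
    """Return the whitespace-separated words of the first file in a zip archive."""
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as zf:
        first_name = zf.namelist()[0]
        text = zf.read(first_name).decode("utf-8")
    return text.split()


# Build a small stand-in archive in memory instead of downloading anything.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as zf:
    zf.writestr("text8", "their families who were expelled from jerusalem")

words = read_words_from_zip(buffer.getvalue())
print("Data size", len(words))  # Data size 7
```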

IMDB

tensorlayerx.files.load_imdb_dataset(path='data', nb_words=None, skip_top=0, maxlen=None, test_split=0.2, seed=113, start_char=1, oov_char=2, index_from=3)

Load IMDB dataset.

Parameters
  • path (str) – The path that the data is downloaded to. Default is data/imdb/.

  • nb_words (int) – Number of words to get.

  • skip_top (int) – Top most frequent words to ignore (they will appear as oov_char value in the sequence data).

  • maxlen (int) – Maximum sequence length. Any longer sequence will be truncated.

  • seed (int) – Seed for reproducible data shuffling.

  • start_char (int) – The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character.

  • oov_char (int) – Words that were cut out because of the nb_words or skip_top limit will be replaced with this character.

  • index_from (int) – Index actual words with this index and higher.

Examples

>>> X_train, y_train, X_test, y_test = tlx.files.load_imdb_dataset(
...                                 nb_words=20000, test_split=0.2)
>>> print('X_train.shape', X_train.shape)
(20000,)  [[1, 62, 74, ... 1033, 507, 27],[1, 60, 33, ... 13, 1053, 7]..]
>>> print('y_train.shape', y_train.shape)
(20000,)  [1 0 0 ..., 1 0 1]
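
How start_char, oov_char, index_from, nb_words and skip_top interact is easier to see on a single review. The sketch below is a simplified illustration that assumes the common convention of shifting every word index by index_from and then replacing out-of-range indices with oov_char. The helper encode_sequence is hypothetical and is not the loader's actual code.

```python
def encode_sequence(word_ranks, nb_words=None, skip_top=0,
                    start_char=1, oov_char=2, index_from=3):
    """Encode one review given as word frequency ranks (1 = most frequent)."""
    encoded = [start_char]  # mark the start of the sequence
    for rank in word_ranks:
        idx = rank + index_from  # actual words start at index_from and above
        too_rare = nb_words is not None and idx >= nb_words
        too_common = idx < skip_top
        encoded.append(oov_char if (too_rare or too_common) else idx)
    return encoded


print(encode_sequence([1, 5, 30], nb_words=20))              # [1, 4, 8, 2]
print(encode_sequence([1, 5, 30], nb_words=20, skip_top=5))  # [1, 2, 8, 2]
```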

Nietzsche

tensorlayerx.files.load_nietzsche_dataset(path='data')

Load Nietzsche dataset.

Parameters

path (str) – The path that the data is downloaded to. Default is data/nietzsche/.

Returns

The content.

Return type

str

Examples

>>> # see tutorial_generate_text.py
>>> words = tlx.files.load_nietzsche_dataset()
>>> words = basic_clean_str(words)
>>> words = words.split()

Flickr25k

tensorlayerx.files.load_flickr25k_dataset(tag='sky', path='data', n_threads=50, printable=False)

Load Flickr25K dataset.

Returns a list of images with a given tag from the Flickr25k dataset. The dataset is downloaded from the official website the first time you use it.

Parameters
  • tag (str or None) –

    What images to return.
    • If you want to get images with tag, use string like ‘dog’, ‘red’, see Flickr Search.

    • If you want to get all images, set to None.

  • path (str) – The path that the data is downloaded to. Default is data/flickr25k/.

  • n_threads (int) – The number of threads used to read images.

  • printable (boolean) – Whether to print information when reading images. Default is False.

Examples

Get images with tag of sky

>>> images = tlx.files.load_flickr25k_dataset(tag='sky')

Get all images

>>> images = tlx.files.load_flickr25k_dataset(tag=None, n_threads=100, printable=True)

Flickr1M

tensorlayerx.files.load_flickr1M_dataset(tag='sky', size=10, path='data', n_threads=50, printable=False)

Load Flickr1M dataset.

Returns a list of images with a given tag from the Flickr1M dataset. The dataset is downloaded from the official website the first time you use it.

Parameters
  • tag (str or None) –

    What images to return.
    • If you want to get images with tag, use string like ‘dog’, ‘red’, see Flickr Search.

    • If you want to get all images, set to None.

  • size (int) – An integer between 1 and 10. 1 means 100k images … 5 means 500k images, 10 means all 1 million images. Default is 10.

  • path (str) – The path that the data is downloaded to. Default is data/flickr25k/.

  • n_threads (int) – The number of threads used to read images.

  • printable (boolean) – Whether to print information when reading images. Default is False.

Examples

Use 200k images

>>> images = tlx.files.load_flickr1M_dataset(tag='zebra', size=2)

Use 1 Million images

>>> images = tlx.files.load_flickr1M_dataset(tag='zebra')

CycleGAN

tensorlayerx.files.load_cyclegan_dataset(filename='summer2winter_yosemite', path='data')

Load images from CycleGAN’s database.

Parameters
  • filename (str) – The name of the dataset you want.

  • path (str) – The path that the data is downloaded to. Default is data/cyclegan.

Examples

>>> im_train_A, im_train_B, im_test_A, im_test_B = tlx.files.load_cyclegan_dataset(filename='summer2winter_yosemite')

CelebA

tensorlayerx.files.load_celebA_dataset(path='data')

Load CelebA dataset.

Returns a list of image paths.

Parameters

path (str) – The path that the data is downloaded to. Default is data/celebA/.

MPII

tensorlayerx.files.load_mpii_pose_dataset(path='data', is_16_pos_only=False)

Load MPII Human Pose Dataset.

Parameters
  • path (str) – The path that the data is downloaded to.

  • is_16_pos_only (boolean) – If True, only return the people that have all 16 pose keypoints (usually used for single-person pose estimation).

Returns

  • img_train_list (list of str) – The image file paths of the training data.

  • ann_train_list (list of dict) – The annotations of training data.

  • img_test_list (list of str) – The image file paths of the testing data.

  • ann_test_list (list of dict) – The annotations of testing data.

Examples

>>> import pprint
>>> import tensorlayerx as tlx
>>> img_train_list, ann_train_list, img_test_list, ann_test_list = tlx.files.load_mpii_pose_dataset()
>>> image = tlx.vis.read_image(img_train_list[0])
>>> tlx.vis.draw_mpii_pose_to_image(image, ann_train_list[0], 'image.png')
>>> pprint.pprint(ann_train_list[0])

Google Drive

tensorlayerx.files.download_file_from_google_drive(ID, destination)

Download file from Google Drive.

See tlx.files.load_celebA_dataset for example.

Parameters
  • ID (str) – The Google Drive file ID.

  • destination (str) – The destination path for the saved file.

Load and save network

TensorFlow provides the .ckpt file format for saving and restoring models, but we suggest saving models in the standard HDF5 file format for cross-platform compatibility. Other file formats such as .npz are also available.

# save model as .h5
tlx.files.save_weights_to_hdf5(network.all_weights, 'model.h5')
# restore model from .h5 (in order)
tlx.files.load_hdf5_to_weights_in_order('model.h5', network)
# restore model from .h5 (by name)
tlx.files.load_hdf5_to_weights('model.h5', network)

# save model as .npz
tlx.files.save_npz(network.all_weights, name='model.npz')
# restore model from .npz (method 1)
load_params = tlx.files.load_npz(name='model.npz')
tlx.files.assign_weights(load_params, network)
# restore model from .npz (method 2)
tlx.files.load_and_assign_npz(name='model.npz', network=network)

# you can also assign only some of the pre-trained parameters, as follows
# the first parameter
tlx.files.assign_weights([load_params[0]], network)
# the first three parameters
tlx.files.assign_weights(load_params[:3], network)

Save network into list (npz)

tensorlayerx.files.save_npz(save_list=None, name='model.npz')

Save the given parameters to an .npz file with the given file name. Use tlx.files.load_npz() to restore.

Parameters
  • save_list (list of tensor) – A list of parameters (tensor) to be saved.

  • name (str) – The name of the .npz file.

Examples

Save model to npz

>>> tlx.files.save_npz(network.all_weights, name='model.npz')

Load model from npz (Method 1)

>>> load_params = tlx.files.load_npz(name='model.npz')
>>> tlx.files.assign_weights(load_params, network)

Load model from npz (Method 2)

>>> tlx.files.load_and_assign_npz(name='model.npz', network=network)

Load network from list (npz)

tensorlayerx.files.load_npz(path='', name='model.npz')

Load the parameters of a Model saved by tlx.files.save_npz().

Parameters
  • path (str) – Folder path to .npz file.

  • name (str) – The name of the .npz file.

Returns

A list of parameters in order.

Return type

list of array

Examples

  • See tlx.files.save_npz

Assign a list of parameters to network

tensorlayerx.files.assign_weights(weights, network)

Assign the given parameters to the TensorLayer network.

Parameters
  • weights (list of array) – A list of model weights (array) in order.

  • network (Layer) – The network to be assigned.

Returns

  • 1) list of operations if in graph mode – A list of tf ops in order that assign weights. Support sess.run(ops) manually.

  • 2) list of tf variables if in eager mode – A list of tf variables (assigned weights) in order.

Load and assign a list of parameters to network

tensorlayerx.files.load_and_assign_npz(name=None, network=None)

Load model from npz and assign to a network.

Parameters
  • name (str) – The name of the .npz file.

  • network (Model) – The network to be assigned.

Examples

  • See tlx.files.save_npz

Save network into dict (npz)

tensorlayerx.files.save_npz_dict(save_list=None, name='model.npz')

Save the given parameters as a dictionary to an .npz file with the given file name.

Use tlx.files.load_and_assign_npz_dict() to restore.

Parameters
  • save_list (list of parameters) – A list of parameters (tensor) to be saved.

  • name (str) – The name of the .npz file.

Load network from dict (npz)

tensorlayerx.files.load_and_assign_npz_dict(name='model.npz', network=None, skip=False)

Restore the parameters saved by tlx.files.save_npz_dict().

Parameters
  • name (str) – The name of the .npz file.

  • network (Model) – The network to be assigned.

  • skip (boolean) – If True, loaded weights whose names are not found in the network’s weights are skipped. If False, an error is raised when a mismatch is found. Default is False.
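
The effect of skip when restoring by name can be sketched with plain dictionaries standing in for the saved file and the network's weights. The helper assign_by_name and the weight names are hypothetical, chosen for illustration only; the real function operates on a network object.

```python
def assign_by_name(saved, target, skip=False):
    """Copy values from `saved` into `target`, matching entries by name."""
    for name, value in saved.items():
        if name not in target:
            if skip:
                continue  # silently ignore weights the network does not have
            raise KeyError(f"weight {name!r} not found in the network")
        target[name] = value
    return target


saved = {"dense/W": 1.0, "old_layer/W": 9.0}
network_weights = {"dense/W": 0.0, "dense/b": 0.0}

# With skip=True the unknown 'old_layer/W' entry is ignored.
print(assign_by_name(saved, dict(network_weights), skip=True))
# {'dense/W': 1.0, 'dense/b': 0.0}
```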

Save network into OrderedDict (hdf5)

tensorlayerx.files.save_weights_to_hdf5(save_list, filepath)

Save weights to a file in HDF5 format.

Parameters
  • save_list (list of tensor) – A list of weights (tensor) to be saved.

  • filepath (str) – Filename to which the weights will be saved.

Load network from hdf5 in order

tensorlayerx.files.load_hdf5_to_weights_in_order(filepath, network, skip=False)

Load weights sequentially from a file in HDF5 format.

Parameters
  • filepath (str) – Filename from which the weights will be loaded; should be in HDF5 format.

  • network (Model) – TL model.

  • Notes – If the file contains more weights than the network needs, the redundant ones are ignored, provided all previous weights match perfectly.
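
The note on redundant weights can be illustrated with plain lists. In this sketch each weight is a flat list whose length stands in for its shape. The helper assign_in_order is hypothetical and only mirrors the described behaviour; it is not the library's implementation.

```python
def assign_in_order(saved, target):
    """Fill `target` position by position from `saved`; extra saved weights are ignored."""
    if len(saved) < len(target):
        raise ValueError("the file holds fewer weights than the network needs")
    for i in range(len(target)):
        # Length is used here as a stand-in for the weight's shape.
        if len(saved[i]) != len(target[i]):
            raise ValueError(f"shape mismatch at position {i}")
        target[i] = list(saved[i])
    return target


saved = [[1, 2], [3], [4, 5, 6]]  # three weights in the file
network_weights = [[0, 0], [0]]   # the network only has two
print(assign_in_order(saved, network_weights))  # [[1, 2], [3]]
```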

Load network from hdf5 by name

tensorlayerx.files.load_hdf5_to_weights(filepath, network, skip=False)

Load weights by name from a file in HDF5 format.

Parameters
  • filepath (str) – Filename from which the weights will be loaded; should be in HDF5 format.

  • network (Model) – TL model.

  • skip (bool) – If True, loaded weights whose names are not found in the network’s weights are skipped. If False, an error is raised when a mismatch is found. Default is False.

Load and save variables

Save variables as .npy

tensorlayerx.files.save_any_to_npy(save_dict=None, name='file.npy')

Save variables to .npy file.

Parameters
  • save_dict (dictionary) – The variables to be saved.

  • name (str) – File name.

Examples

>>> tlx.files.save_any_to_npy(save_dict={'data': ['a','b']}, name='test.npy')
>>> data = tlx.files.load_npy_to_any(name='test.npy')
>>> print(data)
{'data': ['a', 'b']}

Load variables from .npy

tensorlayerx.files.load_npy_to_any(path='', name='file.npy')

Load .npy file.

Parameters
  • path (str) – Path to the file (optional).

  • name (str) – File name.

Examples

  • see tlx.files.save_any_to_npy()

Folder/File functions

Check file exists

tensorlayerx.files.file_exists(filepath)

Check whether a file exists at the given file path.

Check folder exists

tensorlayerx.files.folder_exists(folderpath)

Check whether a folder exists at the given folder path.

Delete file

tensorlayerx.files.del_file(filepath)

Delete the file at the given file path.

Delete folder

tensorlayerx.files.del_folder(folderpath)

Delete the folder at the given folder path.

Read file

tensorlayerx.files.read_file(filepath)

Read a file and return a string.

Examples

>>> data = tlx.files.read_file('data.txt')

Load file list from folder

tensorlayerx.files.load_file_list(path=None, regx='\\.jpg', printable=True, keep_prefix=False)

Return a list of the files in a folder whose names match a regular expression.

Parameters
  • path (str or None) – A folder path. If None, the current directory is used.

  • regx (str) – The regular expression used to match file names.

  • printable (boolean) – Whether to print information about the files.

  • keep_prefix (boolean) – Whether to keep the path in the file name.

Examples

>>> file_list = tlx.files.load_file_list(path=None, regx=r'w1pre_[0-9]+\.(npz)')
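
The matching behaviour can be approximated with os.listdir and re.search. This is a sketch under assumptions rather than the library's code: list_matching_files is a hypothetical helper, and it sorts the result for reproducibility, which the real function may not do.

```python
import os
import re
import tempfile


def list_matching_files(path=None, regx=r"\.jpg", keep_prefix=False):
    """Return the names in `path` that match `regx`, optionally with the folder prefix."""
    if path is None:
        path = os.getcwd()
    names = sorted(f for f in os.listdir(path) if re.search(regx, f))
    if keep_prefix:
        names = [os.path.join(path, f) for f in names]
    return names


# Try it on a temporary folder with three empty files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("w1pre_1.npz", "w1pre_20.npz", "notes.txt"):
        open(os.path.join(tmp, name), "w").close()
    print(list_matching_files(tmp, regx=r"w1pre_[0-9]+\.(npz)"))
    # ['w1pre_1.npz', 'w1pre_20.npz']
```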

Load folder list from folder

tensorlayerx.files.load_folder_list(path='')

Return a list of the folders inside a given folder.

Parameters

path (str) – A folder path.

Check and Create folder

tensorlayerx.files.exists_or_mkdir(path, verbose=True)

Check whether a folder exists. If it does not, create it and return False; if it already exists, return True.

Parameters
  • path (str) – A folder path.

  • verbose (boolean) – If True (default), prints results.

Returns

True if the folder already exists; otherwise the folder is created and False is returned.

Return type

boolean

Examples

>>> tlx.files.exists_or_mkdir("checkpoints/train")
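
The return convention (False when the folder had to be created, True when it was already there) can be sketched with the standard library. The helper exists_or_mkdir_sketch is a hypothetical stand-in that mirrors the documented behaviour; it is not the library's own implementation.

```python
import os
import tempfile


def exists_or_mkdir_sketch(path, verbose=True):
    """Return True if `path` already exists; otherwise create it and return False."""
    if os.path.isdir(path):
        if verbose:
            print(f"[!] {path} exists ...")
        return True
    os.makedirs(path)
    if verbose:
        print(f"[*] creates {path} ...")
    return False


with tempfile.TemporaryDirectory() as tmp:
    target = os.path.join(tmp, "checkpoints", "train")
    print(exists_or_mkdir_sketch(target, verbose=False))  # False: the folder was just created
    print(exists_or_mkdir_sketch(target, verbose=False))  # True: the folder already exists
```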

Download or extract

tensorlayerx.files.maybe_download_and_extract(filename, working_directory, url_source, extract=False, expected_bytes=None)

Check whether the file exists in working_directory; if not, try to download it. Optionally also try to extract the file if its format is “.zip” or “.tar”.

Parameters
  • filename (str) – The name of the (to be) downloaded file.

  • working_directory (str) – A folder path to search for the file in and to download the file to.

  • url_source (str) – The URL to download the file from.

  • extract (boolean) – If True, tries to uncompress the downloaded file if it is a “.tar.gz/.tar.bz2” or “.zip” file. Default is False.

  • expected_bytes (int or None) – If set, tries to verify that the downloaded file is of the specified size, otherwise raises an Exception. Default is None, which corresponds to no check being performed.

Returns

File path of the downloaded (uncompressed) file.

Return type

str

Examples

>>> down_file = tlx.files.maybe_download_and_extract(filename='train-images-idx3-ubyte.gz',
...                                            working_directory='data/',
...                                            url_source='http://yann.lecun.com/exdb/mnist/')
>>> tlx.files.maybe_download_and_extract(filename='ADEChallengeData2016.zip',
...                                             working_directory='data/',
...                                             url_source='http://sceneparsing.csail.mit.edu/data/',
...                                             extract=True)
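
The overall flow (reuse the file if present, otherwise download, optionally verify the size and extract) can be sketched with the standard library. This is a simplified illustration, not the library's implementation: maybe_download_sketch is a hypothetical helper, and the demo below deliberately uses a file that already exists so that no network access takes place.

```python
import os
import tarfile
import tempfile
import zipfile
from urllib.request import urlretrieve


def maybe_download_sketch(filename, working_directory, url_source,
                          extract=False, expected_bytes=None):
    """Download `filename` into `working_directory` unless it is already there."""
    os.makedirs(working_directory, exist_ok=True)
    filepath = os.path.join(working_directory, filename)
    if not os.path.exists(filepath):
        urlretrieve(url_source + filename, filepath)
    # Optional size check; simplified here to run even for a cached file.
    if expected_bytes is not None and os.path.getsize(filepath) != expected_bytes:
        raise Exception(f"failed to verify {filename}: unexpected file size")
    if extract:
        if tarfile.is_tarfile(filepath):
            with tarfile.open(filepath) as tar:
                tar.extractall(working_directory)
        elif zipfile.is_zipfile(filepath):
            with zipfile.ZipFile(filepath) as zf:
                zf.extractall(working_directory)
    return filepath


with tempfile.TemporaryDirectory() as tmp:
    with open(os.path.join(tmp, "data.bin"), "wb") as f:
        f.write(b"12345")
    # The file is already present, so the URL is never contacted.
    path = maybe_download_sketch("data.bin", tmp, "http://example.invalid/", expected_bytes=5)
    print(os.path.basename(path))  # data.bin
```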

Sort

List of strings with numbers in human order

tensorlayerx.files.natural_keys(text)

Sort a list of strings containing numbers in human order.

Examples

>>> l = ['im1.jpg', 'im31.jpg', 'im11.jpg', 'im21.jpg', 'im03.jpg', 'im05.jpg']
>>> l.sort(key=tlx.files.natural_keys)
>>> l
['im1.jpg', 'im03.jpg', 'im05.jpg', 'im11.jpg', 'im21.jpg', 'im31.jpg']
>>> l.sort()  # plain lexicographic order, which is not what we want
>>> l
['im03.jpg', 'im05.jpg', 'im1.jpg', 'im11.jpg', 'im21.jpg', 'im31.jpg']
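
A sort key of this kind is commonly built by splitting the string into digit and non-digit runs and converting the digit runs to integers. The sketch below assumes that approach. The name natural_keys_sketch is hypothetical and chosen to avoid implying it is the library's own function.

```python
import re


def natural_keys_sketch(text):
    """Split `text` into digit and non-digit runs, turning the digit runs into ints."""
    return [int(chunk) if chunk.isdigit() else chunk
            for chunk in re.split(r"(\d+)", text)]


names = ['im1.jpg', 'im31.jpg', 'im11.jpg', 'im21.jpg', 'im03.jpg', 'im05.jpg']
print(sorted(names, key=natural_keys_sketch))
# ['im1.jpg', 'im03.jpg', 'im05.jpg', 'im11.jpg', 'im21.jpg', 'im31.jpg']
```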

Visualizing npz file

tensorlayerx.files.npz_to_W_pdf(path=None, regx='w1pre_[0-9]+\\.(npz)')

Convert the first weight matrix of .npz file to .pdf by using tlx.visualize.W().

Parameters
  • path (str) – A folder path to npz files.

  • regx (str) – The regular expression used to match file names.

Examples

Convert the first weight matrix of w1_pre…npz file to w1_pre…pdf.

>>> tlx.files.npz_to_W_pdf(path='/Users/.../npz_file/', regx=r'w1pre_[0-9]+\.(npz)')