API - Files¶

A collection of helper functions for working with datasets: load benchmark datasets, save and restore models, and save and load variables.

`load_mnist_dataset`([shape, path])	Load the original MNIST dataset.
`load_fashion_mnist_dataset`([shape, path])	Load the Fashion-MNIST dataset.
`load_cifar10_dataset`([shape, path, plotable])	Load CIFAR-10 dataset.
`load_cropped_svhn`([path, include_extra])	Load Cropped SVHN.
`load_matt_mahoney_text8_dataset`([path])	Load Matt Mahoney’s dataset.
`load_imdb_dataset`([path, nb_words, …])	Load IMDB dataset.
`load_nietzsche_dataset`([path])	Load Nietzsche dataset.
`load_flickr25k_dataset`([tag, path, …])	Load Flickr25K dataset.
`load_flickr1M_dataset`([tag, size, path, …])	Load Flickr1M dataset.
`load_cyclegan_dataset`([filename, path])	Load images from CycleGAN’s database, see this link.
`load_celebA_dataset`([path])	Load CelebA dataset.
`load_mpii_pose_dataset`([path, is_16_pos_only])	Load MPII Human Pose Dataset.
`download_file_from_google_drive`(ID, destination)	Download file from Google Drive.
`save_npz`([save_list, name])	Save the given parameters into an .npz file with the given name.
`load_npz`([path, name])	Load the parameters of a Model saved by tlx.files.save_npz().
`assign_weights`(weights, network)	Assign the given parameters to the TensorLayer network.
`load_and_assign_npz`([name, network])	Load model from npz and assign to a network.
`save_npz_dict`([save_list, name])	Save the given parameters as a dictionary into an .npz file with the given name.
`load_and_assign_npz_dict`([name, network, skip])	Restore the parameters saved by `tlx.files.save_npz_dict()`.
`save_weights_to_hdf5`(save_list, filepath)	Save weights to the given file path in hdf5 format.
`load_hdf5_to_weights_in_order`(filepath, network)	Load weights sequentially from a given file of hdf5 format.
`load_hdf5_to_weights`(filepath, network[, skip])	Load weights by name from a given file of hdf5 format.
`save_any_to_npy`([save_dict, name])	Save variables to .npy file.
`load_npy_to_any`([path, name])	Load .npy file.
`file_exists`(filepath)	Check whether a file exists at the given file path.
`folder_exists`(folderpath)	Check whether a folder exists at the given folder path.
`del_file`(filepath)	Delete the file at the given file path.
`del_folder`(folderpath)	Delete the folder at the given folder path.
`read_file`(filepath)	Read a file and return a string.
`load_file_list`([path, regx, printable, …])	Return a list of files in a folder, given a path and a regular expression.
`load_folder_list`([path])	Return a list of folders in a folder, given a folder path.
`exists_or_mkdir`(path[, verbose])	Check whether a folder exists; if not, create it and return False; if it already exists, return True.
`maybe_download_and_extract`(filename, …[, …])	Checks whether the file exists in working_directory, otherwise tries to download it, and optionally extracts it if its format is “.zip” or “.tar”.
`natural_keys`(text)	Sort a list of strings containing numbers in human (natural) order.

Load dataset functions¶

MNIST¶

tensorlayerx.files.load_mnist_dataset(shape=(-1, 784), path='data')[source]¶

Load the original MNIST dataset.

Automatically downloads the MNIST dataset and returns the training, validation, and test sets with 50000, 10000, and 10000 digit images respectively.

Parameters:

shape (tuple) – The shape of digit images (the default is (-1, 784), alternatively (-1, 28, 28, 1)).
path (str) – The path that the data is downloaded to.

Returns:

X_train, y_train, X_val, y_val, X_test, y_test – The training, validation, and test sets, respectively.

Return type:

tuple

Examples

>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_mnist_dataset(shape=(-1,784), path='datasets')
>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_mnist_dataset(shape=(-1, 28, 28, 1))

Fashion-MNIST¶

tensorlayerx.files.load_fashion_mnist_dataset(shape=(-1, 784), path='data')[source]¶

Load the Fashion-MNIST dataset.

Automatically downloads the Fashion-MNIST dataset and returns the training, validation, and test sets with 50000, 10000, and 10000 fashion images respectively.

Parameters:

shape (tuple) – The shape of the images (the default is (-1, 784), alternatively (-1, 28, 28, 1)).
path (str) – The path that the data is downloaded to.

Returns:

X_train, y_train, X_val, y_val, X_test, y_test – The training, validation, and test sets, respectively.

Return type:

tuple

Examples

>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_fashion_mnist_dataset(shape=(-1,784), path='datasets')
>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_fashion_mnist_dataset(shape=(-1, 28, 28, 1))

CIFAR-10¶

tensorlayerx.files.load_cifar10_dataset(shape=(-1, 32, 32, 3), path='data', plotable=False)[source]¶

Load CIFAR-10 dataset.

It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.

The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.

Parameters:

shape (tuple) – The shape of the images, e.g. (-1, 3, 32, 32) or (-1, 32, 32, 3).
path (str) – The path that the data is downloaded to, default is data/cifar10/.
plotable (boolean) – Whether to plot some image examples, False as default.

Examples

>>> X_train, y_train, X_test, y_test = tlx.files.load_cifar10_dataset(shape=(-1, 32, 32, 3))

SVHN¶

tensorlayerx.files.load_cropped_svhn(path='data', include_extra=True)[source]¶

Load Cropped SVHN.

The Cropped Street View House Numbers (SVHN) Dataset contains 32x32x3 RGB images. Digit ‘1’ has label 1, ‘9’ has label 9 and ‘0’ has label 0 (the original dataset uses 10 to represent ‘0’), see ufldl website.

Parameters:

path (str) – The path that the data is downloaded to.
include_extra (boolean) – If True (default), add extra images to the training set.

Returns:

X_train, y_train, X_test, y_test – The training and test sets, respectively.

Return type:

tuple

Examples

>>> X_train, y_train, X_test, y_test = tlx.files.load_cropped_svhn(include_extra=False)
>>> tlx.vis.save_images(X_train[0:100], [10, 10], 'svhn.png')
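
The label convention described above (the original files store the digit ‘0’ as class 10, while the loader reports it as 0) can be illustrated with a small pure-Python sketch. The helper name `remap_svhn_labels` is hypothetical and is not part of tlx.files; it only shows the mapping, which the loader is described as applying for you.

```python
def remap_svhn_labels(raw_labels):
    """Map the original SVHN label 10 (digit '0') to 0; keep 1..9 unchanged."""
    return [0 if label == 10 else label for label in raw_labels]


raw = [1, 9, 10, 3, 10]
remapped = remap_svhn_labels(raw)
print(remapped)  # [1, 9, 0, 3, 0]
```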

Matt Mahoney’s text8¶

tensorlayerx.files.load_matt_mahoney_text8_dataset(path='data')[source]¶

Load Matt Mahoney’s dataset.

Download a text file from Matt Mahoney’s website if not present, and make sure it’s the right size. Extract the first file enclosed in a zip file as a list of words. This dataset can be used for Word Embedding.

Parameters:

path (str) – The path that the data is downloaded to, default is data/mm_test8/.

Returns:

The raw text data, e.g. […, ‘their’, ‘families’, ‘who’, ‘were’, ‘expelled’, ‘from’, ‘jerusalem’, …]

Return type:

list of str

Examples

>>> words = tlx.files.load_matt_mahoney_text8_dataset()
>>> print('Data size', len(words))
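
The step "extract the first file enclosed in a zip file as a list of words" can be sketched with the standard library alone. This is a minimal illustration rather than the loader's actual code: the helper name `read_first_file_as_words` is hypothetical, and the tiny in-memory archive stands in for the real download.

```python
import io
import zipfile


def read_first_file_as_words(zip_source):
    """Return the whitespace-separated words of the first member of a zip archive."""
    with zipfile.ZipFile(zip_source) as archive:
        first_name = archive.namelist()[0]
        text = archive.read(first_name).decode("utf-8")
    return text.split()


# Build a tiny in-memory archive so the sketch runs without any download.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("text8", "their families who were expelled from jerusalem")
buffer.seek(0)

words = read_first_file_as_words(buffer)
print("Data size", len(words))
```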

IMDB¶

tensorlayerx.files.load_imdb_dataset(path='data', nb_words=None, skip_top=0, maxlen=None, test_split=0.2, seed=113, start_char=1, oov_char=2, index_from=3)[source]¶

Load IMDB dataset.

Parameters:

path (str) – The path that the data is downloaded to, default is data/imdb/.
nb_words (int) – Number of words to get.
skip_top (int) – Top most frequent words to ignore (they will appear as oov_char value in the sequence data).
maxlen (int) – Maximum sequence length. Any longer sequence will be truncated.
seed (int) – Seed for reproducible data shuffling.
start_char (int) – The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character.
oov_char (int) – Words that were cut out because of the nb_words or skip_top limit will be replaced with this character.
index_from (int) – Index actual words with this index and higher.

Examples

>>> X_train, y_train, X_test, y_test = tlx.files.load_imdb_dataset(
...                                 nb_words=20000, test_split=0.2)
>>> print('X_train.shape', X_train.shape)
(20000,)  [[1, 62, 74, ... 1033, 507, 27],[1, 60, 33, ... 13, 1053, 7]..]
>>> print('y_train.shape', y_train.shape)
(20000,)  [1 0 0 ..., 1 0 1]
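
How start_char, oov_char, index_from, nb_words and skip_top interact is easier to see on a toy input. The sketch below follows the Keras-style scheme that the parameter descriptions suggest; it is an assumption that the loader applies exactly these steps in this order. `encode_reviews` is a hypothetical helper, not a tlx.files function.

```python
def encode_reviews(sequences, nb_words=None, skip_top=0,
                   start_char=1, oov_char=2, index_from=3):
    """Encode lists of word ranks as described by the parameters above."""
    # Shift every word index by index_from and prepend the start marker.
    shifted = [[start_char] + [w + index_from for w in seq] for seq in sequences]
    if nb_words is None:
        # No vocabulary limit: keep every index that occurs.
        nb_words = max(max(seq) for seq in shifted) + 1
    # Replace indices outside [skip_top, nb_words) with the out-of-vocabulary marker.
    return [[w if skip_top <= w < nb_words else oov_char for w in seq]
            for seq in shifted]


encoded = encode_reviews([[1, 5, 30], [2, 17]], nb_words=20)
print(encoded)  # [[1, 4, 8, 2], [1, 5, 2]]
```

In this toy run, the shifted indices 33 and 20 fall outside the 20-word vocabulary and are therefore replaced by oov_char.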

References

Modified from keras.

Nietzsche¶

tensorlayerx.files.load_nietzsche_dataset(path='data')[source]¶

Load Nietzsche dataset.

Parameters:

path (str) – The path that the data is downloaded to, default is data/nietzsche/.

Returns:

The content.

Return type:

str

Examples

>>> # see tutorial_generate_text.py
>>> words = tlx.files.load_nietzsche_dataset()
>>> words = basic_clean_str(words)
>>> words = words.split()

Flickr25k¶

tensorlayerx.files.load_flickr25k_dataset(tag='sky', path='data', n_threads=50, printable=False)[source]¶

Load Flickr25K dataset.

Returns a list of images for a given tag from the Flickr25k dataset. It downloads Flickr25k from the official website the first time you use it.

Parameters:

tag (str or None) –
What images to return.
- If you want to get images with tag, use string like ‘dog’, ‘red’, see Flickr Search.
- If you want to get all images, set to None.
path (str) – The path that the data is downloaded to, default is data/flickr25k/.
n_threads (int) – The number of threads used to read images.
printable (boolean) – Whether to print information when reading images, default is False.

Examples

Get images with tag of sky

>>> images = tlx.files.load_flickr25k_dataset(tag='sky')

Get all images

>>> images = tlx.files.load_flickr25k_dataset(tag=None, n_threads=100, printable=True)

Flickr1M¶

tensorlayerx.files.load_flickr1M_dataset(tag='sky', size=10, path='data', n_threads=50, printable=False)[source]¶

Load Flickr1M dataset.

Returns a list of images for a given tag from the Flickr1M dataset. It downloads Flickr1M from the official website the first time you use it.

Parameters:

tag (str or None) –
What images to return.
- If you want to get images with tag, use string like ‘dog’, ‘red’, see Flickr Search.
- If you want to get all images, set to None.
size (int) – An integer from 1 to 10. 1 means 100k images, 5 means 500k images, and 10 means all 1 million images. Default is 10.
path (str) – The path that the data is downloaded to, default is data/flickr1M/.
n_threads (int) – The number of threads used to read images.
printable (boolean) – Whether to print information when reading images, default is False.

Examples

Use 200k images

>>> images = tlx.files.load_flickr1M_dataset(tag='zebra', size=2)

Use 1 Million images

>>> images = tlx.files.load_flickr1M_dataset(tag='zebra')

CycleGAN¶

tensorlayerx.files.load_cyclegan_dataset(filename='summer2winter_yosemite', path='data')[source]¶

Load images from CycleGAN’s database, see this link.

Parameters:

filename (str) – The dataset you want, see this link.
path (str) – The path that the data is downloaded to, default is data/cyclegan.

Examples

>>> im_train_A, im_train_B, im_test_A, im_test_B = tlx.files.load_cyclegan_dataset(filename='summer2winter_yosemite')

CelebA¶

tensorlayerx.files.load_celebA_dataset(path='data')[source]¶

Load CelebA dataset.

Returns a list of image paths.

Parameters:

path (str) – The path that the data is downloaded to, default is data/celebA/.

MPII¶

tensorlayerx.files.load_mpii_pose_dataset(path='data', is_16_pos_only=False)[source]¶

Load MPII Human Pose Dataset.

Parameters:

path (str) – The path that the data is downloaded to.
is_16_pos_only (boolean) – If True, only return the people with 16 pose keypoints (usually used for single-person pose estimation).

Returns:

img_train_list (list of str) – The image directories of training data.
ann_train_list (list of dict) – The annotations of training data.
img_test_list (list of str) – The image directories of testing data.
ann_test_list (list of dict) – The annotations of testing data.

Examples

>>> import pprint
>>> import tensorlayerx as tlx
>>> img_train_list, ann_train_list, img_test_list, ann_test_list = tlx.files.load_mpii_pose_dataset()
>>> image = tlx.vis.read_image(img_train_list[0])
>>> tlx.vis.draw_mpii_pose_to_image(image, ann_train_list[0], 'image.png')
>>> pprint.pprint(ann_train_list[0])

Google Drive¶

tensorlayerx.files.download_file_from_google_drive(ID, destination)[source]¶

Download file from Google Drive.

See tlx.files.load_celebA_dataset for example.

Parameters:

ID (str) – The Google Drive file ID.
destination (str) – The destination path to save the file to.

Load and save network¶

TensorFlow provides the .ckpt file format to save and restore models, but we suggest using the standard hdf5 file format to save models for cross-platform compatibility. Other file formats such as .npz are also available.

import tensorlayerx as tlx
# `network` below is an already-built model

## save model as .h5
tlx.files.save_weights_to_hdf5(network.all_weights, 'model.h5')
# restore model from .h5 (in order)
tlx.files.load_hdf5_to_weights_in_order('model.h5', network)
# restore model from .h5 (by name)
tlx.files.load_hdf5_to_weights('model.h5', network)

## save model as .npz
tlx.files.save_npz(network.all_weights, name='model.npz')
# restore model from .npz (method 1)
load_params = tlx.files.load_npz(name='model.npz')
tlx.files.assign_weights(load_params, network)
# restore model from .npz (method 2)
tlx.files.load_and_assign_npz(name='model.npz', network=network)

## you can assign the pre-trained parameters as follows
# 1st parameter
tlx.files.assign_weights([load_params[0]], network)
# the first three parameters
tlx.files.assign_weights(load_params[:3], network)
Save network into list (npz)¶

tensorlayerx.files.save_npz(save_list=None, name='model.npz')[source]¶

Save the given parameters into an .npz file with the given name. Use tlx.files.load_npz() to restore.

Parameters:

save_list (list of tensor) – A list of parameters (tensor) to be saved.
name (str) – The name of the .npz file.

Examples

Save model to npz

>>> tlx.files.save_npz(network.all_weights, name='model.npz')

Load model from npz (Method 1)

>>> load_params = tlx.files.load_npz(name='model.npz')
>>> tlx.files.assign_weights(load_params, network)

Load model from npz (Method 2)

>>> tlx.files.load_and_assign_npz(name='model.npz', network=network)

References

Saving dictionary using numpy

Load network from list (npz)¶

tensorlayerx.files.load_npz(path='', name='model.npz')[source]¶

Load the parameters of a Model saved by tlx.files.save_npz().

Parameters:

path (str) – Folder path to .npz file.
name (str) – The name of the .npz file.

Returns:

A list of parameters in order.

Return type:

list of array

Examples

See tlx.files.save_npz

References

Saving dictionary using numpy

Assign a list of parameters to network¶

tensorlayerx.files.assign_weights(weights, network)[source]¶

Assign the given parameters to the TensorLayer network.

Parameters:

weights (list of array) – A list of model weights (array) in order.
network (Layer) – The network to be assigned.

Returns:

1) list of operations if in graph mode – A list of tf ops in order that assign weights. Support sess.run(ops) manually.
2) list of tf variables if in eager mode – A list of tf variables (assigned weights) in order.

References

Assign value to a TensorFlow variable

Load and assign a list of parameters to network¶

tensorlayerx.files.load_and_assign_npz(name=None, network=None)[source]¶

Load model from npz and assign to a network.

Parameters:

name (str) – The name of the .npz file.
network (Model) – The network to be assigned.

Examples

See tlx.files.save_npz

Save network into dict (npz)¶

tensorlayerx.files.save_npz_dict(save_list=None, name='model.npz')[source]¶

Save the given parameters as a dictionary into an .npz file with the given name.

Use tlx.files.load_and_assign_npz_dict() to restore.

Parameters:

save_list (list of parameters) – A list of parameters (tensor) to be saved.
name (str) – The name of the .npz file.

Load network from dict (npz)¶

tensorlayerx.files.load_and_assign_npz_dict(name='model.npz', network=None, skip=False)[source]¶

Restore the parameters saved by tlx.files.save_npz_dict().

Parameters:

name (str) – The name of the .npz file.
network (Model) – The network to be assigned.
skip (boolean) – If ‘skip’ == True, loaded weights whose name is not found in network’s weights will be skipped. If ‘skip’ is False, error will be raised when mismatch is found. Default False.

Save network into OrderedDict (hdf5)¶

tensorlayerx.files.save_weights_to_hdf5(save_list, filepath)[source]¶

Save weights to the given file path in hdf5 format.

Parameters:

save_list (list) – A list of weights to be saved.
filepath (str) – Filename to which the weights will be saved.

Load network from hdf5 in order¶

tensorlayerx.files.load_hdf5_to_weights_in_order(filepath, network, skip=False)[source]¶

Load weights sequentially from a given file of hdf5 format.

Parameters:

filepath (str) – Filename from which the weights will be loaded; should be in hdf5 format.
network (Model) – TL model.

Notes

If the file contains more weights than the network expects, the redundant ones are ignored, provided all previous weights match perfectly.
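
The positional behaviour in the note above can be pictured with plain lists. This is only a sketch of the idea, not the library's code: `assign_in_order` is a hypothetical helper, strings stand in for weight arrays, and raising an error when the file holds too few weights is an assumption.

```python
def assign_in_order(saved_values, network_slots):
    """Fill network slots by position; surplus saved values are ignored."""
    if len(saved_values) < len(network_slots):
        # Assumption: too few saved weights is treated as an error.
        raise ValueError("file holds fewer weights than the network expects")
    return list(saved_values[:len(network_slots)])


saved = ["W1", "b1", "W2", "b2"]   # weights stored in the file, in order
slots = [None, None]               # the network only has two weights
loaded = assign_in_order(saved, slots)
print(loaded)  # ['W1', 'b1']
```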

Load network from hdf5 by name¶

tensorlayerx.files.load_hdf5_to_weights(filepath, network, skip=False)[source]¶

Load weights by name from a given file of hdf5 format.

Parameters:

filepath (str) – Filename from which the weights will be loaded; should be in hdf5 format.
network (Model) – TL model.
skip (bool) – If ‘skip’ == True, loaded weights whose name is not found in ‘weights’ will be skipped. If ‘skip’ is False, error will be raised when mismatch is found. Default False.
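
The effect of skip when loading by name can be sketched with dictionaries keyed by weight name. The helper `assign_by_name` is hypothetical and the KeyError is an assumption; the source only says that an error is raised on a mismatch when skip is False.

```python
def assign_by_name(saved, network_weights, skip=False):
    """Copy saved values into network_weights (a dict) by matching names."""
    for name, value in saved.items():
        if name not in network_weights:
            if skip:
                continue  # skip=True: ignore weights the network does not have
            raise KeyError(f"weight {name!r} not found in the network")
        network_weights[name] = value
    return network_weights


saved = {"dense1/W": 1.0, "dense1/b": 0.5, "old_layer/W": 9.9}
network_weights = {"dense1/W": 0.0, "dense1/b": 0.0}

result = assign_by_name(saved, dict(network_weights), skip=True)
print(result)  # {'dense1/W': 1.0, 'dense1/b': 0.5}
```

Calling the same helper with skip=False on this input would raise, because ‘old_layer/W’ has no counterpart in the network.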

Load and save variables¶

Save variables as .npy¶

tensorlayerx.files.save_any_to_npy(save_dict=None, name='file.npy')[source]¶

Save variables to .npy file.

Parameters:

save_dict (dictionary) – The variables to be saved.
name (str) – File name.

Examples

>>> tlx.files.save_any_to_npy(save_dict={'data': ['a','b']}, name='test.npy')
>>> data = tlx.files.load_npy_to_any(name='test.npy')
>>> print(data)
{'data': ['a','b']}

Load variables from .npy¶

tensorlayerx.files.load_npy_to_any(path='', name='file.npy')[source]¶

Load .npy file.

Parameters:

path (str) – Path to the file (optional).
name (str) – File name.

Examples

see tlx.files.save_any_to_npy()

Folder/File functions¶

Check file exists¶

tensorlayerx.files.file_exists(filepath)[source]¶: Check whether a file exists at the given file path.

Check folder exists¶

tensorlayerx.files.folder_exists(folderpath)[source]¶: Check whether a folder exists at the given folder path.

Delete file¶

tensorlayerx.files.del_file(filepath)[source]¶: Delete the file at the given file path.

Delete folder¶

tensorlayerx.files.del_folder(folderpath)[source]¶: Delete the folder at the given folder path.

Read file¶

tensorlayerx.files.read_file(filepath)[source]¶

Read a file and return a string.

Examples

>>> data = tlx.files.read_file('data.txt')

Load file list from folder¶

tensorlayerx.files.load_file_list(path=None, regx='\\.jpg', printable=True, keep_prefix=False)[source]¶

Return a list of files in a folder, given a path and a regular expression.

Parameters:

path (str or None) – A folder path, if None, use the current directory.
regx (str) – A regular expression matching the file names.
printable (boolean) – Whether to print the file information.
keep_prefix (boolean) – Whether to keep path in the file name.

Examples

>>> file_list = tlx.files.load_file_list(path=None, regx=r'w1pre_[0-9]+\.(npz)')
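
To show what the regx filter selects, here is a standard-library sketch of the same idea run against a temporary folder. `list_matching_files` is a hypothetical stand-in, and the use of `re.search` and sorted output are assumptions about matching and ordering, not documented behaviour of load_file_list.

```python
import os
import re
import tempfile


def list_matching_files(path=None, regx=r"\.jpg", keep_prefix=False):
    """Return the names in `path` that match `regx`, optionally with the folder prefix."""
    if path is None:
        path = os.getcwd()
    names = sorted(name for name in os.listdir(path) if re.search(regx, name))
    if keep_prefix:
        names = [os.path.join(path, name) for name in names]
    return names


with tempfile.TemporaryDirectory() as folder:
    # Create three empty files; only two match the pattern.
    for name in ("w1pre_1.npz", "w1pre_20.npz", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    matches = list_matching_files(folder, regx=r"w1pre_[0-9]+\.(npz)")

print(matches)  # ['w1pre_1.npz', 'w1pre_20.npz']
```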

Load folder list from folder¶

tensorlayerx.files.load_folder_list(path='')[source]¶

Return a list of folders in a folder, given a folder path.

Parameters:

path (str) – A folder path.

Check and Create folder¶

tensorlayerx.files.exists_or_mkdir(path, verbose=True)[source]¶

Check whether a folder exists; if not, create it and return False; if it already exists, return True.

Parameters:

path (str) – A folder path.
verbose (boolean) – If True (default), prints results.

Returns:

True if the folder already exists; otherwise creates the folder and returns False.

Return type:

boolean

Examples

>>> tlx.files.exists_or_mkdir("checkpoints/train")
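
The return value is easy to misread, since False means the folder was just created. The following standard-library sketch mimics the documented behaviour in a temporary directory; `exists_or_mkdir_sketch` is a hypothetical stand-in and its printed messages are made up for illustration.

```python
import os
import tempfile


def exists_or_mkdir_sketch(path, verbose=True):
    """Return True if `path` already exists; otherwise create it and return False."""
    if os.path.isdir(path):
        if verbose:
            print("folder exists:", path)
        return True
    os.makedirs(path)
    if verbose:
        print("folder created:", path)
    return False


with tempfile.TemporaryDirectory() as root:
    target = os.path.join(root, "checkpoints", "train")
    first = exists_or_mkdir_sketch(target, verbose=False)   # folder is created
    second = exists_or_mkdir_sketch(target, verbose=False)  # folder is found

print(first, second)  # False True
```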

Download or extract¶

tensorlayerx.files.maybe_download_and_extract(filename, working_directory, url_source, extract=False, expected_bytes=None)[source]¶

Checks whether the file exists in working_directory, otherwise tries to download it, and optionally extracts it if its format is “.zip” or “.tar”.

Parameters:

filename (str) – The name of the (to be) downloaded file.
working_directory (str) – A folder path to search for the file in and download the file to.
url_source (str) – The URL to download the file from.
extract (boolean) – If True, tries to uncompress the downloaded file if it is a “.tar.gz/.tar.bz2” or “.zip” file, default is False.
expected_bytes (int or None) – If set, tries to verify that the downloaded file is of the specified size, otherwise raises an Exception; default is None, which corresponds to no check being performed.

Returns:

File path of the downloaded (uncompressed) file.

Return type:

str

Examples

>>> down_file = tlx.files.maybe_download_and_extract(filename='train-images-idx3-ubyte.gz',
...                                            working_directory='data/',
...                                            url_source='http://yann.lecun.com/exdb/mnist/')
>>> tlx.files.maybe_download_and_extract(filename='ADEChallengeData2016.zip',
...                                             working_directory='data/',
...                                             url_source='http://sceneparsing.csail.mit.edu/data/',
...                                             extract=True)
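
The check, download, verify, extract sequence described above might look roughly like the standard-library sketch below. It is not the library's implementation: `fetch_and_extract` is a hypothetical name, and building the URL as url_source plus filename is an assumption suggested by the examples. The demo places an archive in the folder first, so no network access happens.

```python
import os
import tarfile
import tempfile
import urllib.request
import zipfile


def fetch_and_extract(filename, working_directory, url_source,
                      extract=False, expected_bytes=None):
    """Download `filename` unless it is already present, then optionally extract it."""
    os.makedirs(working_directory, exist_ok=True)
    filepath = os.path.join(working_directory, filename)
    if not os.path.exists(filepath):
        # Assumption: the download URL is url_source followed by the file name.
        urllib.request.urlretrieve(url_source + filename, filepath)
    if expected_bytes is not None and os.path.getsize(filepath) != expected_bytes:
        raise Exception(f"{filename} does not have the expected size")
    if extract:
        if tarfile.is_tarfile(filepath):
            with tarfile.open(filepath, "r") as archive:
                archive.extractall(working_directory)
        elif zipfile.is_zipfile(filepath):
            with zipfile.ZipFile(filepath) as archive:
                archive.extractall(working_directory)
    return filepath


with tempfile.TemporaryDirectory() as folder:
    # Pre-create a small zip so the download branch is skipped.
    archive_path = os.path.join(folder, "demo.zip")
    with zipfile.ZipFile(archive_path, "w") as archive:
        archive.writestr("hello.txt", "hi")
    size = os.path.getsize(archive_path)
    result = fetch_and_extract("demo.zip", folder, "http://example.invalid/",
                               extract=True, expected_bytes=size)
    extracted = os.path.exists(os.path.join(folder, "hello.txt"))

print(os.path.basename(result), extracted)  # demo.zip True
```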

Sort¶

List of string with number in human order¶

tensorlayerx.files.natural_keys(text)[source]¶

Sort a list of strings containing numbers in human (natural) order.

Examples

>>> l = ['im1.jpg', 'im31.jpg', 'im11.jpg', 'im21.jpg', 'im03.jpg', 'im05.jpg']
>>> sorted(l, key=tlx.files.natural_keys)
['im1.jpg', 'im03.jpg', 'im05.jpg', 'im11.jpg', 'im21.jpg', 'im31.jpg']
>>> sorted(l)  # this is what we don't want
['im03.jpg', 'im05.jpg', 'im1.jpg', 'im11.jpg', 'im21.jpg', 'im31.jpg']
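
For readers curious how such a key can work, one common way to build a natural-order key is to split each string into digit and non-digit runs and compare the digit runs as integers. The sketch below uses that approach; `natural_keys_sketch` is a hypothetical name and may differ from the library's implementation.

```python
import re


def natural_keys_sketch(text):
    """Split text into digit and non-digit runs; digit runs compare as integers."""
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", text)]


names = ['im1.jpg', 'im31.jpg', 'im11.jpg', 'im21.jpg', 'im03.jpg', 'im05.jpg']
ordered = sorted(names, key=natural_keys_sketch)
print(ordered)
# ['im1.jpg', 'im03.jpg', 'im05.jpg', 'im11.jpg', 'im21.jpg', 'im31.jpg']
```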

Visualizing npz file¶

tensorlayerx.files.npz_to_W_pdf(path=None, regx='w1pre_[0-9]+\\.(npz)')[source]¶

Convert the first weight matrix of .npz file to .pdf by using tlx.visualize.W().

Parameters:

path (str) – A folder path to npz files.
regx (str) – Regx for the file name.

Examples

Convert the first weight matrix of w1_pre…npz file to w1_pre…pdf.

>>> tlx.files.npz_to_W_pdf(path='/Users/.../npz_file/', regx=r'w1pre_[0-9]+\.(npz)')