API - Files
A collection of helper functions for working with datasets: load benchmark datasets, save and restore models, and save and load variables.
TensorLayer provides rich layer implementations tailored for
various benchmarks and domain-specific problems. In addition, we also
support transparent access to native TensorFlow parameters.
For example, we provide not only layers for local response normalization, but also
layers that allow users to apply tf.nn.lrn on network.outputs.
More functions can be found in TensorFlow API.
Load the original MNIST dataset.
Load the Fashion-MNIST dataset.
Load the CIFAR-10 dataset.
Load the Cropped SVHN dataset.
Load Matt Mahoney’s dataset.
Load the IMDB dataset.
Load the Nietzsche dataset.
Load the Flickr25K dataset.
Load the Flickr1M dataset.
Load images from CycleGAN’s database, see this link.
Load the CelebA dataset.
Load the MPII Human Pose Dataset.
Download a file from Google Drive.
Input parameters and the file name, save parameters into .npz file.
Load the parameters of a Model saved by tlx.files.save_npz().
Assign the given parameters to the TensorLayer network.
Load model from npz and assign to a network.
Input parameters and the file name, save parameters as a dictionary into .npz file.
Restore the parameters saved by tlx.files.save_npz_dict().
Input filepath and save weights in hdf5 format.
Load weights sequentially from a given file of hdf5 format.
Load weights by name from a given file of hdf5 format.
Save variables to .npy file.
Load .npy file.
Check whether a file exists at the given file path.
Check whether a folder exists at the given folder path.
Delete a file at the given file path.
Delete a folder at the given folder path.
Read a file and return a string.
Return a file list in a folder, given a path and a regular expression.
Return a folder list in a folder, given a folder path.
Check a folder by given name. If it does not exist, create the folder and return False; if it exists, return True.
Checks if the file exists in working_directory, otherwise tries to download it, and optionally also tries to extract the file if its format is “.zip” or “.tar”.
Sort list of string with number in human order.
Load dataset functions
MNIST
tensorlayerx.files.load_mnist_dataset(shape=(-1, 784), path='data')
Load the original MNIST dataset.
Automatically download MNIST dataset and return the training, validation and test set with 50000, 10000 and 10000 digit images respectively.
- Parameters
shape (tuple) – The shape of digit images (the default is (-1, 784), alternatively (-1, 28, 28, 1)).
path (str) – The path that the data is downloaded to.
- Returns
X_train, y_train, X_val, y_val, X_test, y_test – The training, validation and test sets, respectively.
- Return type
tuple
Examples
>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_mnist_dataset(shape=(-1,784), path='datasets')
>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_mnist_dataset(shape=(-1, 28, 28, 1))
Fashion-MNIST
tensorlayerx.files.load_fashion_mnist_dataset(shape=(-1, 784), path='data')
Load the Fashion-MNIST dataset.
Automatically download the Fashion-MNIST dataset and return the training, validation and test sets with 50000, 10000 and 10000 fashion images respectively.
- Parameters
shape (tuple) – The shape of the fashion images (the default is (-1, 784), alternatively (-1, 28, 28, 1)).
path (str) – The path that the data is downloaded to.
- Returns
X_train, y_train, X_val, y_val, X_test, y_test – The training, validation and test sets, respectively.
- Return type
tuple
Examples
>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_fashion_mnist_dataset(shape=(-1,784), path='datasets')
>>> X_train, y_train, X_val, y_val, X_test, y_test = tlx.files.load_fashion_mnist_dataset(shape=(-1, 28, 28, 1))
CIFAR-10
tensorlayerx.files.load_cifar10_dataset(shape=(-1, 32, 32, 3), path='data', plotable=False)
Load CIFAR-10 dataset.
It consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images.
The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.
- Parameters
shape (tuple) – The shape of the images, e.g. (-1, 3, 32, 32) or (-1, 32, 32, 3).
path (str) – The path that the data is downloaded to, defaults to data/cifar10/.
plotable (boolean) – Whether to plot some image examples, False by default.
Examples
>>> X_train, y_train, X_test, y_test = tlx.files.load_cifar10_dataset(shape=(-1, 32, 32, 3))
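The two shapes mentioned above differ only in where the channel axis sits. In the original CIFAR-10 files each image is stored as a flat row of 3072 values: 1024 red, then 1024 green, then 1024 blue, each plane in row-major order. The following is a minimal pure-Python sketch of turning such a row into a channel-last (32, 32, 3) image; `row_to_hwc` is a hypothetical helper written for illustration and is not part of tensorlayerx.

```python
def row_to_hwc(row, height=32, width=32, channels=3):
    """Convert one flat, channel-first CIFAR-10 row into [height][width][channel].

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    plane = height * width  # number of values per colour plane (1024)
    return [
        [[row[k * plane + r * width + c] for k in range(channels)]
         for c in range(width)]
        for r in range(height)
    ]


# A fake image whose values equal their position in the flat row.
row = list(range(3 * 32 * 32))
image = row_to_hwc(row)

# The first pixel collects index 0 of each colour plane.
print(image[0][0])
```

In practice an array library would do the same with a reshape followed by a transpose; the nested lists here only make the index arithmetic visible.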
SVHN
tensorlayerx.files.load_cropped_svhn(path='data', include_extra=True)
Load Cropped SVHN.
The Cropped Street View House Numbers (SVHN) Dataset contains 32x32x3 RGB images. Digit ‘1’ has label 1, ‘9’ has label 9 and ‘0’ has label 0 (the original dataset uses 10 to represent ‘0’), see ufldl website.
- Parameters
path (str) – The path that the data is downloaded to.
include_extra (boolean) – If True (default), add extra images to the training set.
- Returns
X_train, y_train, X_test, y_test – The training and test sets, respectively.
- Return type
tuple
Examples
>>> X_train, y_train, X_test, y_test = tlx.files.load_cropped_svhn(include_extra=False)
>>> tlx.vis.save_images(X_train[0:100], [10, 10], 'svhn.png')
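The label convention described above (the original dataset stores the digit ‘0’ as 10, while the loader returns it as 0) can be illustrated with a short sketch. `remap_svhn_labels` is a hypothetical name used only here, and the input labels are made up.

```python
def remap_svhn_labels(labels):
    """Map the original SVHN label 10 (digit '0') to 0; labels 1-9 stay unchanged.

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    return [0 if label == 10 else label for label in labels]


raw_labels = [1, 9, 10, 3, 10]  # made-up labels in the original encoding
clean_labels = remap_svhn_labels(raw_labels)
print(clean_labels)
```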
Matt Mahoney’s text8
tensorlayerx.files.load_matt_mahoney_text8_dataset(path='data')
Load Matt Mahoney’s dataset.
Download a text file from Matt Mahoney’s website if not present, and make sure it’s the right size. Extract the first file enclosed in a zip file as a list of words. This dataset can be used for Word Embedding.
- Parameters
path (str) – The path that the data is downloaded to, defaults to data/mm_test8/.
- Returns
The raw text data e.g. […. ‘their’, ‘families’, ‘who’, ‘were’, ‘expelled’, ‘from’, ‘jerusalem’, …]
- Return type
list of str
Examples
>>> words = tlx.files.load_matt_mahoney_text8_dataset()
>>> print('Data size', len(words))
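The step "extract the first file enclosed in a zip file as a list of words" can be sketched with the standard library alone. The helper name `read_words_from_zip` and the tiny in-memory archive are assumptions made for illustration; the real loader also handles the download and the size check.

```python
import io
import zipfile


def read_words_from_zip(zip_source):
    """Return the first file inside a zip archive as a list of words.

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    with zipfile.ZipFile(zip_source) as archive:
        first_name = archive.namelist()[0]
        text = archive.read(first_name).decode("utf-8")
    return text.split()


# Build a tiny in-memory archive so the sketch runs without downloading anything.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("text8", "their families who were expelled from jerusalem")
buffer.seek(0)

words = read_words_from_zip(buffer)
print("Data size", len(words))
```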
IMDB
tensorlayerx.files.load_imdb_dataset(path='data', nb_words=None, skip_top=0, maxlen=None, test_split=0.2, seed=113, start_char=1, oov_char=2, index_from=3)
Load IMDB dataset.
- Parameters
path (str) – The path that the data is downloaded to, defaults to data/imdb/.
nb_words (int) – Number of words to get.
skip_top (int) – Top most frequent words to ignore (they will appear as oov_char value in the sequence data).
maxlen (int) – Maximum sequence length. Any longer sequence will be truncated.
seed (int) – Seed for reproducible data shuffling.
start_char (int) – The start of a sequence will be marked with this character. Set to 1 because 0 is usually the padding character.
oov_char (int) – Words that were cut out because of the nb_words or skip_top limit will be replaced with this character.
index_from (int) – Index actual words with this index and higher.
Examples
>>> X_train, y_train, X_test, y_test = tlx.files.load_imdb_dataset(
...     nb_words=20000, test_split=0.2)
>>> print('X_train.shape', X_train.shape)
(20000,) [[1, 62, 74, ... 1033, 507, 27],[1, 60, 33, ... 13, 1053, 7]..]
>>> print('y_train.shape', y_train.shape)
(20000,) [1 0 0 ..., 1 0 1]
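How start_char, oov_char, index_from, skip_top and nb_words interact is easier to see on a toy review. The sketch below assumes the common convention that a review arrives as a list of word frequency ranks, that every rank is shifted by index_from, and that shifted indices outside [skip_top, nb_words) are replaced by oov_char. `encode_review` is a hypothetical helper, and the library's actual implementation may differ in detail.

```python
def encode_review(word_ranks, nb_words=None, skip_top=0,
                  start_char=1, oov_char=2, index_from=3):
    """Encode one review given as frequency ranks (1 = most frequent word).

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    # Shift every rank so that the small indices stay free for special markers.
    sequence = [start_char] + [rank + index_from for rank in word_ranks]
    if nb_words is None:
        nb_words = max(sequence) + 1  # keep every word
    # Replace indices outside [skip_top, nb_words) with the out-of-vocabulary marker.
    return [idx if skip_top <= idx < nb_words else oov_char for idx in sequence]


review = [1, 59, 7, 20000]  # made-up frequency ranks
encoded = encode_review(review, nb_words=100)
print(encoded)
```

With nb_words=100 the rare word (rank 20000) falls outside the vocabulary and is replaced by oov_char, while the sequence still begins with start_char.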
Nietzsche
tensorlayerx.files.load_nietzsche_dataset(path='data')
Load Nietzsche dataset.
- Parameters
path (str) – The path that the data is downloaded to, defaults to data/nietzsche/.
- Returns
The content.
- Return type
str
Examples
>>> # see tutorial_generate_text.py
>>> words = tlx.files.load_nietzsche_dataset()
>>> words = basic_clean_str(words)
>>> words = words.split()
Flickr25k
tensorlayerx.files.load_flickr25k_dataset(tag='sky', path='data', n_threads=50, printable=False)
Load Flickr25K dataset.
Returns a list of images with a given tag from the Flickr25k dataset. The dataset is downloaded from the official website the first time you use it.
- Parameters
tag (str or None) – What images to return.
If you want to get images with a tag, use a string like ‘dog’ or ‘red’, see Flickr Search.
If you want to get all images, set to None.
path (str) – The path that the data is downloaded to, defaults to data/flickr25k/.
n_threads (int) – The number of threads used to read images.
printable (boolean) – Whether to print information when reading images, default is False.
Examples
Get images with tag of sky
>>> images = tlx.files.load_flickr25k_dataset(tag='sky')
Get all images
>>> images = tlx.files.load_flickr25k_dataset(tag=None, n_threads=100, printable=True)
Flickr1M
tensorlayerx.files.load_flickr1M_dataset(tag='sky', size=10, path='data', n_threads=50, printable=False)
Load Flickr1M dataset.
Returns a list of images with a given tag from the Flickr1M dataset. The dataset is downloaded from the official website the first time you use it.
- Parameters
tag (str or None) – What images to return.
If you want to get images with a tag, use a string like ‘dog’ or ‘red’, see Flickr Search.
If you want to get all images, set to None.
size (int) – Integer between 1 and 10. 1 means 100k images … 5 means 500k images, 10 means all 1 million images. Default is 10.
path (str) – The path that the data is downloaded to, defaults to data/flickr25k/.
n_threads (int) – The number of threads used to read images.
printable (boolean) – Whether to print information when reading images, default is False.
Examples
Use 200k images
>>> images = tlx.files.load_flickr1M_dataset(tag='zebra', size=2)
Use 1 Million images
>>> images = tlx.files.load_flickr1M_dataset(tag='zebra')
CycleGAN
tensorlayerx.files.load_cyclegan_dataset(filename='summer2winter_yosemite', path='data')
Load images from CycleGAN’s database, see this link.
- Parameters
filename (str) – The dataset you want, see this link.
path (str) – The path that the data is downloaded to, defaults to data/cyclegan.
Examples
>>> im_train_A, im_train_B, im_test_A, im_test_B = tlx.files.load_cyclegan_dataset(filename='summer2winter_yosemite')
CelebA
MPII
tensorlayerx.files.load_mpii_pose_dataset(path='data', is_16_pos_only=False)
Load MPII Human Pose Dataset.
- Parameters
path (str) – The path that the data is downloaded to.
is_16_pos_only (boolean) – If True, only return people that have all 16 pose keypoints (usually used for single-person pose estimation).
- Returns
img_train_list (list of str) – The image directories of training data.
ann_train_list (list of dict) – The annotations of training data.
img_test_list (list of str) – The image directories of testing data.
ann_test_list (list of dict) – The annotations of testing data.
Examples
>>> import pprint
>>> import tensorlayerx as tlx
>>> img_train_list, ann_train_list, img_test_list, ann_test_list = tlx.files.load_mpii_pose_dataset()
>>> image = tlx.vis.read_image(img_train_list[0])
>>> tlx.vis.draw_mpii_pose_to_image(image, ann_train_list[0], 'image.png')
>>> pprint.pprint(ann_train_list[0])
Google Drive
Load and save network
TensorFlow provides the .ckpt file format to save and restore models, while
we suggest using the standard hdf5 file format to save models for the
sake of cross-platform compatibility. Other file formats such as .npz
are also available.
## save model as .h5
tlx.files.save_weights_to_hdf5('model.h5', network.all_weights)
# restore model from .h5 (in order)
tlx.files.load_hdf5_to_weights_in_order('model.h5', network)
# restore model from .h5 (by name)
tlx.files.load_hdf5_to_weights('model.h5', network)
## save model as .npz
tlx.files.save_npz(network.all_weights, name='model.npz')
# restore model from .npz (method 1)
load_params = tlx.files.load_npz(name='model.npz')
tlx.files.assign_weights(load_params, network)
# restore model from .npz (method 2)
tlx.files.load_and_assign_npz(name='model.npz', network=network)
## you can assign the pre-trained parameters as follows
# 1st parameter
tlx.files.assign_weights([load_params[0]], network)
# the first three parameters
tlx.files.assign_weights(load_params[:3], network)
Save network into list (npz)
tensorlayerx.files.save_npz(save_list=None, name='model.npz')
Input parameters and the file name, save parameters into .npz file. Use tlx.files.load_npz() to restore.
- Parameters
save_list (list of tensor) – A list of parameters (tensor) to be saved.
name (str) – The name of the .npz file.
Examples
Save model to npz
>>> tlx.files.save_npz(network.all_weights, name='model.npz')
Load model from npz (Method 1)
>>> load_params = tlx.files.load_npz(name='model.npz')
>>> tlx.files.assign_weights(load_params, network)
Load model from npz (Method 2)
>>> tlx.files.load_and_assign_npz(name='model.npz', network=network)
Load network from list (npz)
tensorlayerx.files.load_npz(path='', name='model.npz')
Load the parameters of a Model saved by tlx.files.save_npz().
- Parameters
path (str) – Folder path to .npz file.
name (str) – The name of the .npz file.
- Returns
A list of parameters in order.
- Return type
list of array
Examples
See tlx.files.save_npz().
Assign a list of parameters to network
tensorlayerx.files.assign_weights(weights, network)
Assign the given parameters to the TensorLayer network.
- Parameters
weights (list of array) – A list of model weights (array) in order.
network (Layer) – The network to be assigned.
- Returns
1) list of operations if in graph mode – A list of tf ops in order that assign weights. Support sess.run(ops) manually.
2) list of tf variables if in eager mode – A list of tf variables (assigned weights) in order.
Load and assign a list of parameters to network
Save network into dict (npz)
tensorlayerx.files.save_npz_dict(save_list=None, name='model.npz')
Input parameters and the file name, save parameters as a dictionary into .npz file.
Use tlx.files.load_and_assign_npz_dict() to restore.
- Parameters
save_list (list of parameters) – A list of parameters (tensor) to be saved.
name (str) – The name of the .npz file.
Load network from dict (npz)
tensorlayerx.files.load_and_assign_npz_dict(name='model.npz', network=None, skip=False)
Restore the parameters saved by tlx.files.save_npz_dict().
- Parameters
name (str) – The name of the .npz file.
network (Model) – The network to be assigned.
skip (boolean) – If skip is True, loaded weights whose names are not found in the network’s weights are skipped. If skip is False, an error is raised when a mismatch is found. Default is False.
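The effect of the skip flag when restoring by name can be sketched with plain dictionaries standing in for the saved file and the network's weights. `assign_by_name` and the weight names below are hypothetical and only illustrate the documented behaviour; the real function works on the weights of a Model.

```python
def assign_by_name(saved, network_weights, skip=False):
    """Copy values from `saved` into `network_weights`, matching on name.

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    for name, value in saved.items():
        if name not in network_weights:
            if skip:
                continue  # ignore weights the network does not have
            raise KeyError(f"Weight '{name}' not found in the network")
        network_weights[name] = value
    return network_weights


# Made-up names: the file holds one weight the network no longer has.
saved = {"dense1/W": [1.0, 2.0], "dense1/b": [0.5], "old_head/W": [9.0]}
loaded = assign_by_name(saved, {"dense1/W": None, "dense1/b": None}, skip=True)
print(loaded)
```

Calling the same helper with skip=False on this input raises a KeyError for "old_head/W", which mirrors the "error will be raised when mismatch is found" case.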
Save network into OrderedDict (hdf5)
Load network from hdf5 in order
tensorlayerx.files.load_hdf5_to_weights_in_order(filepath, network, skip=False)
Load weights sequentially from a given file of hdf5 format.
- Parameters
filepath (str) – Name of the file from which the weights will be loaded; it should be in hdf5 format.
network (Model) – TL model.
Notes
If the file contains more weights than the network, the redundant ones will be ignored as long as all previous weights match perfectly.
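Positional loading, including the rule that surplus weights in the file are ignored, can be sketched with nested lists. `assign_in_order` is a hypothetical helper, and comparing list lengths is only a stand-in for the shape check a real implementation would perform.

```python
def assign_in_order(saved, network_weights):
    """Fill `network_weights` positionally; extra saved entries are ignored.

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    if len(saved) < len(network_weights):
        raise ValueError("file holds fewer weights than the network expects")
    # zip stops at the shorter sequence, so surplus saved weights are dropped.
    for i, (target, value) in enumerate(zip(network_weights, saved)):
        if len(target) != len(value):  # stand-in for a shape check
            raise ValueError(f"weight {i} has a mismatched shape")
        target[:] = value  # overwrite in place
    return network_weights


network_weights = [[0.0, 0.0], [0.0]]          # two weights in the network
saved = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]   # three weights in the file
restored = assign_in_order(saved, network_weights)
print(restored)
```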
Load network from hdf5 by name
tensorlayerx.files.load_hdf5_to_weights(filepath, network, skip=False)
Load weights by name from a given file of hdf5 format.
- Parameters
filepath (str) – Name of the file from which the weights will be loaded; it should be in hdf5 format.
network (Model) – TL model.
skip (bool) – If skip is True, loaded weights whose names are not found in the network’s weights are skipped. If skip is False, an error is raised when a mismatch is found. Default is False.
Load and save variables
Save variables as .npy
tensorlayerx.files.save_any_to_npy(save_dict=None, name='file.npy')
Save variables to .npy file.
- Parameters
save_dict (dict) – The variables to be saved.
name (str) – File name.
Examples
>>> tlx.files.save_any_to_npy(save_dict={'data': ['a','b']}, name='test.npy')
>>> data = tlx.files.load_npy_to_any(name='test.npy')
>>> print(data)
{'data': ['a','b']}
Load variables from .npy
Folder/File functions
Check file exists
Check folder exists
Delete file
Delete folder
Read file
Load file list from folder
tensorlayerx.files.load_file_list(path=None, regx='\\.jpg', printable=True, keep_prefix=False)
Return a file list in a folder, given a path and a regular expression.
- Parameters
path (str or None) – A folder path, if None, use the current directory.
regx (str) – The regular expression that file names must match.
printable (boolean) – Whether to print the file information.
keep_prefix (boolean) – Whether to keep path in the file name.
Examples
>>> file_list = tlx.files.load_file_list(path=None, regx=r'w1pre_[0-9]+\.(npz)')
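Filtering a folder by regular expression can be sketched with `os.listdir` and `re.search`. `list_files` is a hypothetical helper; whether the library matches with `search` or `match`, and whether it sorts the result, are assumptions here.

```python
import os
import re
import tempfile


def list_files(path=None, regx=r"\.jpg", keep_prefix=False):
    """Return sorted file names in `path` that match the regular expression.

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    if path is None:
        path = os.getcwd()
    names = sorted(n for n in os.listdir(path) if re.search(regx, n))
    if keep_prefix:
        names = [os.path.join(path, n) for n in names]
    return names


# Create a throwaway folder with two matching files and one that does not match.
with tempfile.TemporaryDirectory() as folder:
    for name in ("w1pre_1.npz", "w1pre_20.npz", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    matches = list_files(folder, regx=r"w1pre_[0-9]+\.(npz)")

print(matches)
```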
Load folder list from folder
Check and Create folder
tensorlayerx.files.exists_or_mkdir(path, verbose=True)
Check a folder by given name. If it does not exist, create the folder and return False; if it exists, return True.
- Parameters
path (str) – A folder path.
verbose (boolean) – If True (default), prints results.
- Returns
True if the folder already exists; otherwise the folder is created and False is returned.
- Return type
boolean
Examples
>>> tlx.files.exists_or_mkdir("checkpoints/train")
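The return convention (False on the call that creates the folder, True afterwards) can be sketched with the standard library. `exists_or_mkdir_sketch` is a hypothetical stand-in, not the library function, and it omits the verbose printing.

```python
import os
import tempfile


def exists_or_mkdir_sketch(path):
    """Return True if `path` already exists; otherwise create it and return False.

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    if os.path.isdir(path):
        return True
    os.makedirs(path)  # also creates missing parent folders
    return False


with tempfile.TemporaryDirectory() as base:
    target = os.path.join(base, "checkpoints", "train")
    first_call = exists_or_mkdir_sketch(target)   # folder is created here
    second_call = exists_or_mkdir_sketch(target)  # folder now exists

print(first_call, second_call)
```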
Download or extract
tensorlayerx.files.maybe_download_and_extract(filename, working_directory, url_source, extract=False, expected_bytes=None)
Checks if the file exists in working_directory, otherwise tries to download it, and optionally also tries to extract the file if its format is “.zip” or “.tar”.
- Parameters
filename (str) – The name of the (to be) downloaded file.
working_directory (str) – A folder path to search for the file in and download the file to.
url_source (str) – The URL to download the file from.
extract (boolean) – If True, tries to uncompress the downloaded file if it is a “.tar.gz/.tar.bz2” or “.zip” file, default is False.
expected_bytes (int or None) – If set, tries to verify that the downloaded file is of the specified size, otherwise raises an Exception. Defaults to None, which corresponds to no check being performed.
- Returns
File path of the downloaded (uncompressed) file.
- Return type
str
Examples
>>> down_file = tlx.files.maybe_download_and_extract(filename='train-images-idx3-ubyte.gz',
...                                                  working_directory='data/',
...                                                  url_source='http://yann.lecun.com/exdb/mnist/')
>>> tlx.files.maybe_download_and_extract(filename='ADEChallengeData2016.zip',
...                                      working_directory='data/',
...                                      url_source='http://sceneparsing.csail.mit.edu/data/',
...                                      extract=True)
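The check, download, verify and extract sequence described above might look roughly like the following standard-library sketch. `fetch_and_extract` is a hypothetical helper and not the library's implementation; in particular, checking the size on every call (rather than only after a download) is an assumption. The demo places a local zip file in the working directory first, so no network access is needed.

```python
import os
import tarfile
import tempfile
import urllib.request
import zipfile


def fetch_and_extract(filename, working_directory, url_source,
                      extract=False, expected_bytes=None):
    """Download `filename` into `working_directory` unless it is already there.

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    os.makedirs(working_directory, exist_ok=True)
    filepath = os.path.join(working_directory, filename)
    if not os.path.exists(filepath):
        urllib.request.urlretrieve(url_source + filename, filepath)
    if expected_bytes is not None and os.path.getsize(filepath) != expected_bytes:
        raise Exception(f"Unexpected size for {filepath}")
    if extract:
        if zipfile.is_zipfile(filepath):
            with zipfile.ZipFile(filepath) as archive:
                archive.extractall(working_directory)
        elif tarfile.is_tarfile(filepath):
            with tarfile.open(filepath) as archive:
                archive.extractall(working_directory)
    return filepath


with tempfile.TemporaryDirectory() as folder:
    # The archive already exists locally, so the download step is skipped.
    with zipfile.ZipFile(os.path.join(folder, "demo.zip"), "w") as archive:
        archive.writestr("readme.txt", "hello")
    fetch_and_extract("demo.zip", folder, "http://example.invalid/", extract=True)
    extracted = os.path.exists(os.path.join(folder, "readme.txt"))

print(extracted)
```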
Sort
List of string with number in human order
tensorlayerx.files.natural_keys(text)
Sort list of string with number in human order.
Examples
>>> l = ['im1.jpg', 'im31.jpg', 'im11.jpg', 'im21.jpg', 'im03.jpg', 'im05.jpg']
>>> l.sort(key=tlx.files.natural_keys)
>>> print(l)
['im1.jpg', 'im03.jpg', 'im05.jpg', 'im11.jpg', 'im21.jpg', 'im31.jpg']
>>> l.sort() # that is what we don't want
>>> print(l)
['im03.jpg', 'im05.jpg', 'im1.jpg', 'im11.jpg', 'im21.jpg', 'im31.jpg']
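A sort key of this kind is commonly built by splitting the string into digit and non-digit runs and converting the digit runs to integers, so that 'im11' compares after 'im3'. The sketch below shows that idea with a hypothetical `natural_keys_sketch`; it is not claimed to be the library's exact code.

```python
import re


def natural_keys_sketch(text):
    """Split `text` into digit and non-digit runs; digit runs become integers.

    Hypothetical helper for illustration only; not a tensorlayerx function.
    """
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", text)]


names = ['im1.jpg', 'im31.jpg', 'im11.jpg', 'im21.jpg', 'im03.jpg', 'im05.jpg']
human_order = sorted(names, key=natural_keys_sketch)
print(human_order)
```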
Visualizing npz file
tensorlayerx.files.npz_to_W_pdf(path=None, regx='w1pre_[0-9]+\\.(npz)')
Convert the first weight matrix of .npz file to .pdf by using tlx.visualize.W().
- Parameters
path (str) – A folder path to npz files.
regx (str) – Regx for the file name.
Examples
Convert the first weight matrix of w1_pre…npz file to w1_pre…pdf.
>>> tlx.files.npz_to_W_pdf(path='/Users/.../npz_file/', regx=r'w1pre_[0-9]+\.(npz)')