API - Dataflow

Dataflow list

DataLoader(dataset[, batch_size, shuffle, …])

Data loader.

Dataset()

An abstract class to encapsulate methods and behaviors of datasets.

IterableDataset()

An abstract class to encapsulate methods and behaviors of iterable datasets.

TensorDataset(*tensors)

Generate a dataset from a list of tensors.

ChainDataset(datasets)

A Dataset which chains multiple iterable-style datasets.

ConcatDataset(datasets)

Concatenates multiple datasets into a new dataset.

Subset(dataset, indices)

Subset of a dataset at specified indices.

random_split(dataset, lengths)

Randomly split a dataset into non-overlapping new datasets of given lengths.

Sampler()

Base class for all Samplers.

BatchSampler([sampler, batch_size, drop_last])

Wraps another sampler to yield a mini-batch of indices.

RandomSampler(data[, replacement, …])

Samples elements randomly.

SequentialSampler(data)

Samples elements sequentially, always in the same order.

WeightedRandomSampler(weights, num_samples)

Samples elements from [0,..,len(weights)-1] with given probabilities (weights).

SubsetRandomSampler(indices)

Samples elements randomly from a given list of indices, without replacement.

Dataflow

DataLoader

class tensorlayerx.dataflow.DataLoader(dataset, batch_size=1, shuffle=False, drop_last=False, sampler=None, batch_sampler=None, num_workers=0, collate_fn=None, time_out=0, worker_init_fn=None, prefetch_factor=2, persistent_workers=False)[source]

Data loader. Combines a dataset and a sampler, and provides an iterable over the given dataset.

The tensorlayerx.dataflow.DataLoader supports both map-style and iterable-style datasets, single- or multi-process loading, a customizable loading order, and optional automatic batching.

Parameters
  • dataset (Dataset) – dataset from which to load the data.

  • batch_size (int) – how many samples per batch to load, default is 1.

  • shuffle (bool) – set to True to have the data reshuffled at every epoch, default is False.

  • drop_last (bool) – set to True to drop the last incomplete batch, if the dataset size is not divisible by the batch size. If False and the size of dataset is not divisible by the batch size, then the last batch will be smaller. default is False.

  • sampler (Sampler) – defines the strategy to draw samples from the dataset. If specified, shuffle must not be specified.

  • batch_sampler (Sampler) – returns a batch of indices at a time. If specified, shuffle, batch_size, drop_last, sampler must not be specified.

  • num_workers (int) – how many subprocesses to use for data loading. 0 means that the data will be loaded in single process. default is 0.

  • collate_fn (callable) – merges a list of samples to form a mini-batch of Tensor(s). Used when using batched loading from a map-style dataset.

  • time_out (numeric) – if positive, the timeout value for collecting a batch from workers. Should always be non-negative. default is 0.

  • worker_init_fn (callable) – If not None, this will be called on each worker subprocess with the worker id (an int in [0, num_workers - 1]) as input, after seeding and before data loading. default is None.

  • prefetch_factor (int) – Number of samples loaded in advance by each worker. 2 means there will be a total of 2 * num_workers samples prefetched across all workers. default is 2.

  • persistent_workers (bool) – If True, the data loader will not shut down the worker processes after a dataset has been consumed once. This keeps the workers' Dataset instances alive. default is False.
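
The interaction between batch_size, shuffle and drop_last can be sketched in plain Python. This is only an illustration of the batching behaviour described in the parameter list, not the library's implementation: the helper name iterate_batches is hypothetical, and samples are gathered into plain lists instead of being merged by a collate_fn.

```python
import random


def iterate_batches(dataset, batch_size=1, shuffle=False, drop_last=False):
    """Yield lists of samples from a map-style dataset (hypothetical helper)."""
    indices = list(range(len(dataset)))
    if shuffle:
        # A fresh order is drawn on every call, i.e. on every epoch.
        random.shuffle(indices)
    for start in range(0, len(indices), batch_size):
        batch_indices = indices[start:start + batch_size]
        if drop_last and len(batch_indices) < batch_size:
            break  # discard the trailing incomplete batch
        yield [dataset[i] for i in batch_indices]


data = list(range(10))
# 10 samples with batch_size=4: the last batch holds the 2 leftover samples.
print([len(b) for b in iterate_batches(data, batch_size=4)])                  # prints [4, 4, 2]
print([len(b) for b in iterate_batches(data, batch_size=4, drop_last=True)])  # prints [4, 4]
```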

Dataset

class tensorlayerx.dataflow.Dataset[source]

An abstract class to encapsulate methods and behaviors of datasets. All map-style datasets (datasets whose samples can be retrieved by a given key) should be subclasses of tensorlayerx.dataflow.Dataset. All subclasses should implement the following methods:
  • __getitem__ – get a sample from the dataset at a given index.

  • __len__ – return the number of samples in the dataset.

  • __add__ – concatenate two datasets.

Examples

With TensorLayerx

>>> from tensorlayerx.dataflow import Dataset
>>> class mnistdataset(Dataset):
>>>     def __init__(self, data, label,transform):
>>>         self.data = data
>>>         self.label = label
>>>         self.transform = transform
>>>     def __getitem__(self, index):
>>>         data = self.data[index].astype('float32')
>>>         data = self.transform(data)
>>>         label = self.label[index].astype('int64')
>>>         return data, label
>>>     def __len__(self):
>>>         return len(self.data)
>>> train_dataset = mnistdataset(data = X_train, label = y_train ,transform = transform)

IterableDataset

class tensorlayerx.dataflow.IterableDataset[source]

An abstract class to encapsulate methods and behaviors of iterable datasets. All iterable-style datasets (datasets whose samples can only be read one by one in sequence, like a Python iterator) should be subclasses of tensorlayerx.dataflow.IterableDataset. All subclasses should implement the following method: __iter__, which yields samples sequentially.

Examples

With TensorLayerx

>>> # example 1:
>>> from tensorlayerx.dataflow import IterableDataset
>>> class mnistdataset(IterableDataset):
>>>     def __init__(self, data, label,transform):
>>>         self.data = data
>>>         self.label = label
>>>         self.transform = transform
>>>     def __iter__(self):
>>>         for i in range(len(self.data)):
>>>             data = self.data[i].astype('float32')
>>>             data = self.transform(data)
>>>             label = self.label[i].astype('int64')
>>>             yield data, label
>>> train_dataset = mnistdataset(data = X_train, label = y_train ,transform = transform)
>>> # example 2:
>>> iterable_dataset_1 = mnistdataset(data_1, label_1, transform_1)
>>> iterable_dataset_2 = mnistdataset(data_2, label_2, transform_2)
>>> new_iterable_dataset = iterable_dataset_1 + iterable_dataset_2

TensorDataset

class tensorlayerx.dataflow.TensorDataset(*tensors)[source]

Generate a dataset from a list of tensors. Each sample will be retrieved by indexing tensors along the first dimension.

Parameters

*tensors (tensors) – tensors that all have the same size in the first dimension.

Examples

With TensorLayerx

>>> import numpy as np
>>> import tensorlayerx as tlx
>>> data = np.random.random([10,224,224,3]).astype(np.float32)
>>> label = np.random.random((10,)).astype(np.int32)
>>> data = tlx.convert_to_tensor(data)
>>> label = tlx.convert_to_tensor(label)
>>> dataset = tlx.dataflow.TensorDataset(data, label)
>>> for i in range(len(dataset)):
>>>     x, y = dataset[i]

ChainDataset

class tensorlayerx.dataflow.ChainDataset(datasets)[source]

A Dataset which chains multiple iterable-style datasets.

Parameters

datasets (list or tuple) – sequence of datasets to be chained.

Examples

With TensorLayerx

>>> import numpy as np
>>> from tensorlayerx.dataflow import IterableDataset, ChainDataset
>>> class mnistdataset(IterableDataset):
>>>     def __init__(self, data, label):
>>>         self.data = data
>>>         self.label = label
>>>     def __iter__(self):
>>>         for i in range(len(self.data)):
>>>             yield self.data[i], self.label[i]
>>> train_dataset1 = mnistdataset(data = X_train1, label = y_train1)
>>> train_dataset2 = mnistdataset(data = X_train2, label = y_train2)
>>> train_dataset = ChainDataset([train_dataset1, train_dataset2])
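
Chaining means that the samples of the first dataset are exhausted before the next one starts. The plain-Python sketch below assumes this behaviour is analogous to itertools.chain; the generator squares is a hypothetical iterable-style source, and nothing here calls tensorlayerx.

```python
from itertools import chain


def squares(n):
    """Hypothetical iterable-style source: yields (value, label) pairs."""
    for i in range(n):
        yield i * i, i


first = squares(3)
second = squares(2)

# All samples of `first` come out before any sample of `second`.
chained = list(chain(first, second))
print(chained)  # prints [(0, 0), (1, 1), (4, 2), (0, 0), (1, 1)]
```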

ConcatDataset

class tensorlayerx.dataflow.ConcatDataset(datasets)[source]

Concatenates multiple datasets into a new dataset.

Parameters

datasets (list or tuple) – sequence of datasets to be concatenated

Examples

With TensorLayerx

>>> import numpy as np
>>> from tensorlayerx.dataflow import Dataset, ConcatDataset
>>> class mnistdataset(Dataset):
>>>     def __init__(self, data, label,transform):
>>>         self.data = data
>>>         self.label = label
>>>         self.transform = transform
>>>     def __getitem__(self, index):
>>>         data = self.data[index].astype('float32')
>>>         data = self.transform(data)
>>>         label = self.label[index].astype('int64')
>>>         return data, label
>>>     def __len__(self):
>>>         return len(self.data)
>>> train_dataset1 = mnistdataset(data = X_train1, label = y_train1 ,transform = transform1)
>>> train_dataset2 = mnistdataset(data = X_train2, label = y_train2 ,transform = transform2)
>>> train_dataset = ConcatDataset([train_dataset1, train_dataset2])
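
One common way to serve a global index from several map-style datasets is to keep running totals of their lengths and look up the owning dataset with a binary search. The sketch below shows that technique in plain Python; the class name ConcatSketch is hypothetical, and whether tensorlayerx uses the same bookkeeping internally is an assumption.

```python
from bisect import bisect_right
from itertools import accumulate


class ConcatSketch:
    """Hypothetical map-style concatenation of several indexable datasets."""

    def __init__(self, datasets):
        self.datasets = list(datasets)
        # Running totals of the lengths, e.g. [3, 5] for sizes 3 and 2.
        self.cumulative_sizes = list(accumulate(len(d) for d in self.datasets))

    def __len__(self):
        return self.cumulative_sizes[-1] if self.cumulative_sizes else 0

    def __getitem__(self, index):
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Find the first dataset whose running total exceeds the index.
        dataset_idx = bisect_right(self.cumulative_sizes, index)
        offset = self.cumulative_sizes[dataset_idx - 1] if dataset_idx > 0 else 0
        return self.datasets[dataset_idx][index - offset]


combined = ConcatSketch([["a", "b", "c"], ["d", "e"]])
print(len(combined), combined[2], combined[3])  # prints 5 c d
```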

Subset

class tensorlayerx.dataflow.Subset(dataset, indices)[source]

Subset of a dataset at specified indices.

Parameters
  • dataset (Dataset) – The whole Dataset

  • indices (list or tuple) – Indices in the whole set selected for subset

Examples

With TensorLayerx

>>> import numpy as np
>>> from tensorlayerx.dataflow import Dataset, Subset
>>> class mnistdataset(Dataset):
>>>     def __init__(self, data, label):
>>>         self.data = data
>>>         self.label = label
>>>     def __getitem__(self, index):
>>>         return self.data[index], self.label[index]
>>>     def __len__(self):
>>>         return len(self.data)
>>> train_dataset = mnistdataset(data = X_train, label = y_train)
>>> sub_dataset = Subset(train_dataset, indices=[1,2,3])
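
A subset only needs to remap its own positions onto the selected indices of the whole dataset. The following plain-Python sketch illustrates that remapping; SubsetSketch is a hypothetical name and the code is not taken from tensorlayerx.

```python
class SubsetSketch:
    """Hypothetical view of a map-style dataset restricted to some indices."""

    def __init__(self, dataset, indices):
        self.dataset = dataset
        self.indices = list(indices)

    def __getitem__(self, index):
        # Position `index` in the subset maps to `indices[index]` in the whole set.
        return self.dataset[self.indices[index]]

    def __len__(self):
        return len(self.indices)


letters = ["a", "b", "c", "d", "e"]
sub = SubsetSketch(letters, indices=[1, 2, 3])
print(len(sub), [sub[i] for i in range(len(sub))])  # prints 3 ['b', 'c', 'd']
```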

random_split

tensorlayerx.dataflow.random_split(dataset, lengths)[source]

Randomly split a dataset into non-overlapping new datasets of given lengths.

Parameters
  • dataset (Dataset) – dataset to be split

  • lengths (list or tuple) – lengths of splits to be produced

Examples

With TensorLayerx

>>> import numpy as np
>>> from tensorlayerx.dataflow import random_split
>>> random_split(range(10), [3, 7])
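
A non-overlapping random split can be obtained by shuffling the indices once and cutting the shuffled list into consecutive slices. The sketch below illustrates this in plain Python. It makes several assumptions that are not confirmed by the documentation above: the function name random_split_sketch and the seed parameter are hypothetical, the lengths are required to sum to the dataset size, and plain lists are returned instead of dataset objects.

```python
import random


def random_split_sketch(dataset, lengths, seed=None):
    """Split an indexable dataset into disjoint random parts (hypothetical helper)."""
    if sum(lengths) != len(dataset):
        raise ValueError("lengths must sum to the size of the dataset")
    indices = list(range(len(dataset)))
    random.Random(seed).shuffle(indices)  # one shuffle shared by all parts
    splits, start = [], 0
    for length in lengths:
        chosen = indices[start:start + length]  # consecutive, hence disjoint
        splits.append([dataset[i] for i in chosen])
        start += length
    return splits


small, large = random_split_sketch(range(10), [3, 7], seed=0)
print(len(small), len(large))  # prints 3 7
```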

Sampler

class tensorlayerx.dataflow.Sampler[source]

Base class for all Samplers. All subclasses should implement the following methods:
  • __iter__ – provides a way to iterate over the indices of the dataset elements.

  • __len__ – returns the length of the returned iterator.

Examples

With TensorLayerx

>>> from tensorlayerx.dataflow import Sampler
>>> class MySampler(Sampler):
>>>     def __init__(self, data):
>>>         self.data = data
>>>     def __iter__(self):
>>>         return iter(range(len(self.data)))
>>>     def __len__(self):
>>>         return len(self.data)

BatchSampler

class tensorlayerx.dataflow.BatchSampler(sampler=None, batch_size=1, drop_last=False)[source]

Wraps another sampler to yield a mini-batch of indices.

Parameters
  • sampler (Sampler) – Base sampler.

  • batch_size (int) – Size of mini-batch

  • drop_last (bool) – If True, the sampler will drop the last batch if its size would be less than batch_size

Examples

With TensorLayerx

>>> from tensorlayerx.dataflow import BatchSampler, SequentialSampler
>>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=False))
>>> #[[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
>>> list(BatchSampler(SequentialSampler(range(10)), batch_size=3, drop_last=True))
>>> #[[0, 1, 2], [3, 4, 5], [6, 7, 8]]
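
The grouping performed on the indices can be written as a small generator. This is a plain-Python sketch of the behaviour shown in the example, with the hypothetical name batch_indices; it accepts any iterable of indices in place of a sampler.

```python
def batch_indices(sampler, batch_size=1, drop_last=False):
    """Group indices from `sampler` into lists of `batch_size` (hypothetical helper)."""
    batch = []
    for index in sampler:
        batch.append(index)
        if len(batch) == batch_size:
            yield batch
            batch = []
    # Whatever is left over forms a smaller final batch unless drop_last is set.
    if batch and not drop_last:
        yield batch


print(list(batch_indices(range(10), batch_size=3)))
# prints [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9]]
print(list(batch_indices(range(10), batch_size=3, drop_last=True)))
# prints [[0, 1, 2], [3, 4, 5], [6, 7, 8]]
```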

RandomSampler

class tensorlayerx.dataflow.RandomSampler(data, replacement=False, num_samples=None, generator=None)[source]

Samples elements randomly. If without replacement, then sample from a shuffled dataset. If with replacement, then the user can specify num_samples to draw.

Parameters
  • data (Dataset) – dataset to sample

  • replacement (bool) – samples are drawn on-demand with replacement if True, default is False.

  • num_samples (int) – number of samples to draw, default is len(dataset). This argument is supposed to be specified only when replacement is True.

  • generator (Generator) – Generator used in sampling. Default is None.

Examples

With TensorLayerx

>>> from tensorlayerx.dataflow import RandomSampler, Dataset
>>> import numpy as np
>>> class mydataset(Dataset):
>>>     def __init__(self):
>>>         self.data = [np.random.random((224,224,3)) for i in range(100)]
>>>         self.label = [np.random.randint(1, 10, (1,)) for i in range(100)]
>>>     def __getitem__(self, item):
>>>         x = self.data[item]
>>>         y = self.label[item]
>>>         return x, y
>>>     def __len__(self):
>>>         return len(self.data)
>>> sampler = RandomSampler(data = mydataset())
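
The difference between the two sampling modes can be sketched with the standard random module. The helper random_indices and its seed parameter are hypothetical and only mirror the description above: without replacement the result is a permutation of all indices, with replacement each index is drawn independently.

```python
import random


def random_indices(n, replacement=False, num_samples=None, seed=None):
    """Return random indices into a dataset of size n (hypothetical helper)."""
    rng = random.Random(seed)
    if replacement:
        # Independent draws: indices may repeat, and the count is free to choose.
        count = n if num_samples is None else num_samples
        return [rng.randrange(n) for _ in range(count)]
    # Without replacement: every index appears exactly once, in shuffled order.
    order = list(range(n))
    rng.shuffle(order)
    return order


permutation = random_indices(5, seed=1)
repeated = random_indices(5, replacement=True, num_samples=8, seed=1)
print(sorted(permutation), len(repeated))  # prints [0, 1, 2, 3, 4] 8
```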

SequentialSampler

class tensorlayerx.dataflow.SequentialSampler(data)[source]

Samples elements sequentially, always in the same order.

Parameters

data (Dataset) – dataset to sample

Examples

With TensorLayerx

>>> from tensorlayerx.dataflow import SequentialSampler, Dataset
>>> import numpy as np
>>> class mydataset(Dataset):
>>>     def __init__(self):
>>>         self.data = [np.random.random((224,224,3)) for i in range(100)]
>>>         self.label = [np.random.randint(1, 10, (1,)) for i in range(100)]
>>>     def __getitem__(self, item):
>>>         x = self.data[item]
>>>         y = self.label[item]
>>>         return x, y
>>>     def __len__(self):
>>>         return len(self.data)
>>> sampler = SequentialSampler(data = mydataset())

WeightedRandomSampler

class tensorlayerx.dataflow.WeightedRandomSampler(weights, num_samples, replacement=True)[source]

Samples elements from [0,..,len(weights)-1] with given probabilities (weights).

Parameters
  • weights (list or tuple) – a sequence of weights, not necessarily summing to one

  • num_samples (int) – number of samples to draw

  • replacement (bool) – if True, samples are drawn with replacement. If not, they are drawn without replacement, which means that when a sample index is drawn for a row, it cannot be drawn again for that row.

Examples

With TensorLayerx

>>> from tensorlayerx.dataflow import WeightedRandomSampler, Dataset
>>> import numpy as np
>>> sampler = list(WeightedRandomSampler(weights=[0.2,0.3,0.4,0.5,4.0], num_samples=5, replacement=True))
>>> # e.g. [4, 4, 1, 4, 4] (the output is random)
>>> sampler = list(WeightedRandomSampler(weights=[0.2,0.3,0.4,0.5,0.6], num_samples=5, replacement=False))
>>> # e.g. [4, 1, 3, 0, 2] (the output is random)
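
Weighted sampling of indices can be sketched with random.choices from the standard library. The helper weighted_indices and its seed parameter are hypothetical; the branch without replacement removes each drawn index from the pool before the next draw, which is one simple way to realise the behaviour described above and not necessarily what tensorlayerx does internally.

```python
import random


def weighted_indices(weights, num_samples, replacement=True, seed=None):
    """Draw indices 0..len(weights)-1 proportionally to weights (hypothetical helper)."""
    rng = random.Random(seed)
    population = list(range(len(weights)))
    if replacement:
        # The weights do not need to sum to one; random.choices normalises them.
        return rng.choices(population, weights=weights, k=num_samples)
    if num_samples > len(population):
        raise ValueError("cannot draw more samples than indices without replacement")
    remaining = list(population)
    remaining_weights = list(weights)
    drawn = []
    for _ in range(num_samples):
        pick = rng.choices(range(len(remaining)), weights=remaining_weights, k=1)[0]
        drawn.append(remaining.pop(pick))  # a drawn index leaves the pool
        remaining_weights.pop(pick)
    return drawn


# All of the weight sits on index 2, so only index 2 can be drawn.
print(weighted_indices([0.0, 0.0, 1.0], num_samples=4))  # prints [2, 2, 2, 2]
```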

SubsetRandomSampler

class tensorlayerx.dataflow.SubsetRandomSampler(indices)[source]

Samples elements randomly from a given list of indices, without replacement.

Parameters

indices (list or tuple) – sequence of indices
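
Sampling from a fixed list of indices without replacement amounts to returning those indices in a random order. A minimal plain-Python sketch follows, using the hypothetical helper subset_random_indices with a hypothetical seed parameter.

```python
import random


def subset_random_indices(indices, seed=None):
    """Return the given indices in random order, each exactly once (hypothetical helper)."""
    rng = random.Random(seed)
    # random.sample draws without replacement; k=len(indices) yields a permutation.
    return rng.sample(list(indices), k=len(indices))


shuffled = subset_random_indices([3, 5, 8, 13], seed=0)
print(sorted(shuffled))  # prints [3, 5, 8, 13]
```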