API Reference

class openTSNE.TSNE(n_components=2, perplexity=30, learning_rate='auto', early_exaggeration_iter=250, early_exaggeration=12, n_iter=500, exaggeration=None, dof=1, theta=0.5, n_interpolation_points=3, min_num_intervals=50, ints_in_interval=1, initialization='pca', metric='euclidean', metric_params=None, initial_momentum=0.5, final_momentum=0.8, max_grad_norm=None, n_jobs=1, neighbors='auto', negative_gradient_method='fft', callbacks=None, callbacks_every_iters=50, random_state=None, verbose=False)

t-Distributed Stochastic Neighbor Embedding.

Please see the Parameter guide for more information.

Parameters:
  • n_components (int) – The dimension of the embedding space. This defaults to 2 for easy visualization, but 1 is sometimes used for t-SNE heatmaps. t-SNE is not designed to embed into higher dimensions; note that the acceleration schemes break down there and are not fully implemented.
  • perplexity (float) – Perplexity can be thought of as the continuous \(k\) number of nearest neighbors, for which t-SNE will attempt to preserve distances.
  • learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto" the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019.
  • early_exaggeration_iter (int) – The number of iterations to run in the early exaggeration phase.
  • early_exaggeration (float) – The exaggeration factor to use during the early exaggeration phase. Typical values range from 12 to 32.
  • n_iter (int) – The number of iterations to run in the normal optimization regime.
  • exaggeration (float) – The exaggeration factor to use during the normal optimization phase. This can be used to form more densely packed clusters and is useful for large data sets.
  • dof (float) – Degrees of freedom as described in Kobak et al. “Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations”, 2019.
  • theta (float) – Only used when negative_gradient_method="bh" or its other aliases. This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. A value of 0 disables the approximation and produces exact results at the cost of a longer runtime.
  • n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. Leaving this value at the default of 3 is highly recommended.
  • min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
  • ints_in_interval (float) – Only used when negative_gradient_method="fft" or its other aliases. Indicates how large a grid cell should be e.g. a value of 3 indicates a grid side length of 3. Lower values provide more accurate gradient estimations.
  • initialization (Union[np.ndarray, str]) – The initial point positions to be used in the embedding space. Can be a precomputed numpy array, pca, spectral or random. Note that when passing in precomputed positions, it is highly recommended that the point positions have small variance (std(Y) < 0.0001); otherwise you may get poor embeddings.
  • metric (Union[str, Callable]) – The metric to be used to compute affinities between points in the original space.
  • metric_params (dict) – Additional keyword arguments for the metric function.
  • initial_momentum (float) – The momentum to use during the early exaggeration phase.
  • final_momentum (float) – The momentum to use during the normal optimization phase.
  • max_grad_norm (float) – Maximum gradient norm. If the norm exceeds this value, it will be clipped. Clipping is most beneficial when adding points into an existing embedding and the new points overlap with the reference points, leading to large gradients. Such gradients can make points “shoot off” from the embedding, which forces the interpolation method to compute a very large grid and leads to worse results.
  • n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
  • neighbors (str) – Specifies the nearest neighbor method to use. Can be exact, annoy, pynndescent, approx, or auto (default). approx uses Annoy if the input data matrix is not a sparse object and if Annoy supports the given metric. Otherwise it uses Pynndescent. auto uses exact nearest neighbors for N<1000 and the same heuristic as approx for N>=1000.
  • negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: bh, BH or barnes-hut. For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: fft, FFT or interpolation.
  • callbacks (Union[Callable, List[Callable]]) – Callbacks, which will be run every callbacks_every_iters iterations.
  • callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
  • random_state (Union[int, RandomState]) – If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
  • verbose (bool) –
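
The two auto settings described in the list above can be written out as plain Python. This is a minimal sketch of the documented rules only, not openTSNE's internal code: the function names are hypothetical, and the is_sparse and annoy_supports_metric arguments stand in for checks that the library performs itself.

```python
def auto_learning_rate(n_samples):
    """Learning rate chosen by learning_rate="auto": max(200, N / 12)."""
    return max(200, n_samples / 12)


def resolve_neighbors(method, n_samples, is_sparse, annoy_supports_metric):
    """Resolve the "auto" and "approx" values of the neighbors parameter.

    is_sparse and annoy_supports_metric are hypothetical flags supplied by
    the caller; they are assumptions made for this illustration.
    """
    if method == "auto":
        # Exact nearest neighbors for N < 1000, the approx heuristic otherwise.
        method = "exact" if n_samples < 1000 else "approx"
    if method == "approx":
        # Annoy for dense input with a supported metric, Pynndescent otherwise.
        if not is_sparse and annoy_supports_metric:
            method = "annoy"
        else:
            method = "pynndescent"
    return method
```

For a data set of 120,000 points the rule gives a learning rate of 10,000, while any data set with fewer than 2,400 points falls back to 200.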
fit(X)

Fit a t-SNE embedding for a given data set.

Runs the standard t-SNE optimization, consisting of the early exaggeration phase and a normal optimization phase.

Parameters: X (np.ndarray) – The data matrix to be embedded.
Returns: A fully optimized t-SNE embedding.
Return type: TSNEEmbedding
prepare_initial(X)

Prepare the initial embedding which can be optimized as needed.

Parameters: X (np.ndarray) – The data matrix to be embedded.
Returns: An unoptimized TSNEEmbedding object, prepared for optimization.
Return type: TSNEEmbedding
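
As a rough illustration of how prepare_initial combines with TSNEEmbedding.optimize, the sketch below runs an early exaggeration phase followed by a normal phase, using the default values from the TSNE signature (250 iterations at exaggeration 12 and momentum 0.5, then 500 iterations at momentum 0.8). The helper name fit_in_two_phases is hypothetical, the tsne argument is assumed to be an openTSNE.TSNE instance, and the result is not guaranteed to match fit exactly.

```python
def fit_in_two_phases(tsne, X):
    """Prepare an initial embedding and optimize it in two explicit phases.

    tsne is assumed to expose prepare_initial(X), and the returned embedding
    is assumed to expose optimize(n_iter, **params) returning a new embedding.
    """
    embedding = tsne.prepare_initial(X)
    # Early exaggeration phase: stronger attractive forces, lower momentum.
    embedding = embedding.optimize(n_iter=250, exaggeration=12, momentum=0.5)
    # Normal optimization phase: no exaggeration, higher momentum.
    embedding = embedding.optimize(n_iter=500, momentum=0.8)
    return embedding
```

Splitting the phases this way is mainly useful when the intermediate embedding needs to be inspected or when the phases need different settings from those that fit applies.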
class openTSNE.TSNEEmbedding

A t-SNE embedding.

Please see the Parameter guide for more information.

Parameters:
  • embedding (np.ndarray) – Initial positions for each data point.
  • affinities (Affinities) – An affinity index which can be used to compute the affinities of new points to the points in the existing embedding. The affinity index also contains the affinity matrix \(P\) used during optimization.
  • learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto" the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019.
  • exaggeration (float) – The exaggeration factor is used to increase the attractive forces of nearby points, producing more compact clusters.
  • dof (float) – Degrees of freedom as described in Kobak et al. “Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations”, 2019.
  • momentum (float) – Momentum accounts for gradient directions from previous iterations, resulting in faster convergence.
  • negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: bh, BH or barnes-hut. For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: fft, FFT or interpolation.
  • theta (float) – This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. A value of 0 disables the approximation and produces exact results at the cost of a longer runtime.
  • n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. Leaving this value at the default of 3 is highly recommended.
  • min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
  • random_state (Union[int, RandomState]) – The random state parameter follows the convention used in scikit-learn. If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
  • n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
  • callbacks (Callable[[int, float, np.ndarray] -> bool]) – Callbacks, which will be run every callbacks_every_iters iterations.
  • callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
  • optimizer (gradient_descent) – Optionally, an existing optimizer can be used for optimization. This is useful for keeping momentum gains between different calls to optimize().
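
The n_jobs convention mentioned in the list above can be made concrete with a small helper. This is a sketch of the scikit-learn style rule as described (-1 for all processors, -2 for all but one), not the code openTSNE runs; the name resolve_n_jobs and the n_cpus argument are hypothetical.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate an n_jobs value into a concrete thread count."""
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs must be a non-zero integer")
    if n_jobs < 0:
        # -1 -> all processors, -2 -> all but one, and so on; never below 1.
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs
```

On a machine with 8 processors, n_jobs=-1 resolves to 8 threads and n_jobs=-2 to 7.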
kl_divergence

The KL divergence or error of the embedding.

Type: float
optimize(n_iter, inplace=False, propagate_exception=False, **gradient_descent_params)

Run optimization on the embedding for a given number of steps.

Please see the Parameter guide for more information.

Parameters:
  • n_iter (int) – The number of optimization iterations.
  • learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto" the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019.
  • exaggeration (float) – The exaggeration factor is used to increase the attractive forces of nearby points, producing more compact clusters.
  • dof (float) – Degrees of freedom as described in Kobak et al. “Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations”, 2019.
  • momentum (float) – Momentum accounts for gradient directions from previous iterations, resulting in faster convergence.
  • negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: bh, BH or barnes-hut. For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: fft, FFT or interpolation.
  • theta (float) – This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. A value of 0 disables the approximation and produces exact results at the cost of a longer runtime.
  • n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. Leaving this value at the default of 3 is highly recommended.
  • min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
  • inplace (bool) – Whether to perform updates in place or to create a copy of the embedding.
  • propagate_exception (bool) – The optimization process can be interrupted using callbacks. This flag indicates whether to propagate the resulting exception or to simply stop optimization and return the resulting embedding.
  • max_grad_norm (float) – Maximum gradient norm. If the norm exceeds this value, it will be clipped. Clipping is most beneficial when adding points into an existing embedding and the new points overlap with the reference points, leading to large gradients. Such gradients can make points “shoot off” from the embedding, which forces the interpolation method to compute a very large grid and leads to worse results.
  • random_state (Union[int, RandomState]) – The random state parameter follows the convention used in scikit-learn. If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
  • n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
  • callbacks (Callable[[int, float, np.ndarray] -> bool]) – Callbacks, which will be run every callbacks_every_iters iterations.
  • callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
Returns:

An optimized t-SNE embedding.

Return type:

TSNEEmbedding

Raises:

OptimizationInterrupt – If a callback stops the optimization and the propagate_exception flag is set, then an exception is raised.
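
A callback matching the documented signature Callable[[int, float, np.ndarray] -> bool] might look like the sketch below. It assumes that a truthy return value is what asks the optimizer to stop, which the text above does not state explicitly; the class name StopOnPlateau and its tolerance argument are hypothetical.

```python
class StopOnPlateau:
    """Request a stop once the error improves by less than a tolerance."""

    def __init__(self, tolerance=1e-4):
        self.tolerance = tolerance
        self.previous_error = None

    def __call__(self, iteration, error, embedding):
        # Called with the iteration number, the current error (KL divergence)
        # and the current embedding; the embedding is not needed here.
        if self.previous_error is not None:
            if self.previous_error - error < self.tolerance:
                return True  # assumed to mean "stop the optimization"
        self.previous_error = error
        return False
```

Because callbacks run only every callbacks_every_iters iterations, the improvement is measured between consecutive invocations rather than between consecutive iterations.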

prepare_interpolation_grid(padding=0.25)

Evaluate and save the interpolation grid coefficients.

Parameters: padding (float) – During standard optimization, the grid hugs the embedding points as closely as possible, but this is not what we want when performing transform. This parameter specifies how much empty space should be appended in each dimension. The values are given in percentages.
prepare_partial(X, initialization='median', k=25, **affinity_params)

Prepare a partial embedding which can be optimized.

Parameters:
  • X (np.ndarray) – The data matrix to be added to the existing embedding.
  • initialization (Union[np.ndarray, str]) – The initial point positions to be used in the embedding space. Can be a precomputed numpy array, median, weighted or random. In all cases, median or weighted should be preferred.
  • k (int) – The number of nearest neighbors to consider when initially placing the point onto the embedding. This is different from perplexity because perplexity affects optimization while this only affects the initial point positions.
  • **affinity_params (dict) – Additional params to be passed to the Affinities.to_new method. Please see individual Affinities implementations as the parameters differ between implementations.
Returns:

An unoptimized PartialTSNEEmbedding object, prepared for optimization.

Return type:

PartialTSNEEmbedding
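To make the role of k in the median initialization more tangible, the following brute-force sketch places each new point at the coordinate-wise median of the embedding positions of its k nearest reference points. It is an illustration under assumptions, not openTSNE's implementation: the function name is hypothetical, Euclidean distance is assumed, and the library would use its neighbor index instead of an exhaustive search.

```python
import math
from statistics import median


def median_initialization(x_new, x_ref, y_ref, k=25):
    """Place new points at the median position of their k nearest neighbors.

    x_new: new data points, x_ref: reference data points,
    y_ref: embedding positions of the reference points (same order as x_ref).
    """
    n_dims = len(y_ref[0])
    positions = []
    for point in x_new:
        # Indices of the k reference points closest to the new point.
        nearest = sorted(
            range(len(x_ref)), key=lambda i: math.dist(point, x_ref[i])
        )[:k]
        # Coordinate-wise median of the neighbors' embedding positions.
        positions.append(
            [median(y_ref[i][d] for i in nearest) for d in range(n_dims)]
        )
    return positions
```

The sketch only determines starting positions; perplexity plays no part in it, which matches the distinction drawn in the description of k.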

transform(X, perplexity=5, initialization='median', k=25, learning_rate=0.1, early_exaggeration=4, early_exaggeration_iter=0, exaggeration=1.5, n_iter=250, initial_momentum=0.5, final_momentum=0.8, max_grad_norm=0.25)

Embed new points into the existing embedding.

This procedure optimizes each point only with respect to the existing embedding i.e. it ignores any interactions between the points in X among themselves.

Please see the Parameter guide for more information.

Parameters:
  • X (np.ndarray) – The data matrix to be added to the existing embedding.
  • perplexity (float) – Perplexity can be thought of as the continuous \(k\) number of nearest neighbors, for which t-SNE will attempt to preserve distances. However, when transforming, we only consider neighbors in the existing embedding i.e. each data point is placed into the embedding, independently of other new data points.
  • initialization (Union[np.ndarray, str]) – The initial point positions to be used in the embedding space. Can be a precomputed numpy array, median, weighted or random. In all cases, median or weighted should be preferred.
  • k (int) – The number of nearest neighbors to consider when initially placing the point onto the embedding. This is different from perplexity because perplexity affects optimization while this only affects the initial point positions.
  • learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto" the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019. Note that this should not be used when adding samples into existing embeddings, where the learning rate often needs to be much lower to obtain convergence.
  • early_exaggeration_iter (int) – The number of iterations to run in the early exaggeration phase.
  • early_exaggeration (float) – The exaggeration factor to use during the early exaggeration phase. Typical values range from 12 to 32.
  • n_iter (int) – The number of iterations to run in the normal optimization regime.
  • exaggeration (float) – The exaggeration factor to use during the normal optimization phase. This can be used to form more densely packed clusters and is useful for large data sets.
  • initial_momentum (float) – The momentum to use during the early exaggeration phase.
  • final_momentum (float) – The momentum to use during the normal optimization phase.
  • max_grad_norm (float) – Maximum gradient norm. If the norm exceeds this value, it will be clipped. Clipping is most beneficial when adding points into an existing embedding and the new points overlap with the reference points, leading to large gradients. Such gradients can make points “shoot off” from the embedding, which forces the interpolation method to compute a very large grid and leads to worse results.
Returns:

The positions of the new points in the embedding space.

Return type:

PartialTSNEEmbedding
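The effect of max_grad_norm can be sketched as a rescaling step applied to the gradient before each update. The version below clips each point's gradient separately, which is an assumption made for illustration; the text above does not say whether the norm is taken per point or over the whole gradient. The function name is hypothetical.

```python
import math


def clip_gradient_rows(gradient, max_grad_norm):
    """Rescale each point's gradient so its norm is at most max_grad_norm."""
    clipped = []
    for row in gradient:
        norm = math.sqrt(sum(g * g for g in row))
        # Rows within the limit are left unchanged; longer rows are shrunk
        # while keeping their direction.
        scale = max_grad_norm / norm if norm > max_grad_norm else 1.0
        clipped.append([g * scale for g in row])
    return clipped
```

With the transform default of max_grad_norm=0.25, a gradient of (3, 4) with norm 5 is scaled down to norm 0.25, while a gradient of (0.1, 0) passes through unchanged.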

class openTSNE.PartialTSNEEmbedding

A partial t-SNE embedding.

A partial embedding is created when we take an existing TSNEEmbedding and embed new samples into the embedding space. Unlike a typical embedding, a partial embedding does not support adding new samples, which would generally be a bad idea in any case.

Please see the Parameter guide for more information.

Parameters:
  • embedding (np.ndarray) – Initial positions for each data point.
  • reference_embedding (TSNEEmbedding) – The embedding into which the new samples are to be added.
  • P (array_like) – An \(N \times M\) affinity matrix containing the affinities from each new data point \(n\) to each data point in the existing embedding \(m\).
  • learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto" the appropriate learning rate is selected according to max(200, N / 12) as determined in Belkina et al. (2019), Nature Communications. Note that this should not be used when adding samples into existing embeddings, where the learning rate often needs to be much lower to obtain convergence.
  • exaggeration (float) – The exaggeration factor is used to increase the attractive forces of nearby points, producing more compact clusters.
  • momentum (float) – Momentum accounts for gradient directions from previous iterations, resulting in faster convergence.
  • negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: bh, BH or barnes-hut. For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: fft, FFT or interpolation.
  • theta (float) – This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. A value of 0 disables the approximation and produces exact results at the cost of a longer runtime.
  • n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. Leaving this value at the default of 3 is highly recommended.
  • min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
  • random_state (Union[int, RandomState]) – The random state parameter follows the convention used in scikit-learn. If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
  • n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
  • callbacks (Callable[[int, float, np.ndarray] -> bool]) – Callbacks, which will be run every callbacks_every_iters iterations.
  • callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
  • optimizer (gradient_descent) – Optionally, an existing optimizer can be used for optimization. This is useful for keeping momentum gains between different calls to optimize().
kl_divergence

The KL divergence or error of the embedding.

Type: float
optimize(n_iter, inplace=False, propagate_exception=False, **gradient_descent_params)

Run optimization on the embedding for a given number of steps.

Parameters:
  • n_iter (int) – The number of optimization iterations.
  • learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto" the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019. Note that this should not be used when adding samples into existing embeddings, where the learning rate often needs to be much lower to obtain convergence.
  • exaggeration (float) – The exaggeration factor is used to increase the attractive forces of nearby points, producing more compact clusters.
  • momentum (float) – Momentum accounts for gradient directions from previous iterations, resulting in faster convergence.
  • negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: bh, BH or barnes-hut. For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: fft, FFT or interpolation.
  • theta (float) – This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. A value of 0 disables the approximation and produces exact results at the cost of a longer runtime.
  • n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. Leaving this value at the default of 3 is highly recommended.
  • min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
  • inplace (bool) – Whether to perform updates in place or to create a copy of the embedding.
  • propagate_exception (bool) – The optimization process can be interrupted using callbacks. This flag indicates whether to propagate the resulting exception or to simply stop optimization and return the resulting embedding.
  • random_state (Union[int, RandomState]) – The random state parameter follows the convention used in scikit-learn. If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
  • n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
  • callbacks (Callable[[int, float, np.ndarray] -> bool]) – Callbacks, which will be run every callbacks_every_iters iterations.
  • callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
Returns:

An optimized partial t-SNE embedding.

Return type:

PartialTSNEEmbedding

Raises:

OptimizationInterrupt – If a callback stops the optimization and the propagate_exception flag is set, then an exception is raised.

exception openTSNE.OptimizationInterrupt(error, final_embedding)

Optimization was interrupted by a callback.

Parameters:
  • error (float) – The KL divergence of the embedding.
  • final_embedding (Union[TSNEEmbedding, PartialTSNEEmbedding]) – Is either a partial or full embedding, depending on where the error was raised.