API Reference

class openTSNE.TSNE(n_components=2, perplexity=30, learning_rate='auto', early_exaggeration_iter=250, early_exaggeration=12, n_iter=500, exaggeration=None, dof=1, theta=0.5, n_interpolation_points=3, min_num_intervals=50, ints_in_interval=1, initialization='pca', metric='euclidean', metric_params=None, initial_momentum=0.5, final_momentum=0.8, max_grad_norm=None, max_step_norm=5, n_jobs=1, neighbors='auto', negative_gradient_method='auto', callbacks=None, callbacks_every_iters=50, random_state=None, verbose=False)
t-Distributed Stochastic Neighbor Embedding.
Please see the Parameter guide for more information.
Parameters:
 n_components (int) – The dimension of the embedding space. This defaults to 2 for easy visualization, but sometimes 1 is used for t-SNE heatmaps. t-SNE is not designed to embed into higher dimensions; please note that the acceleration schemes break down there and are not fully implemented.
 perplexity (float) – Perplexity can be thought of as a continuous analogue of the number \(k\) of nearest neighbors for which t-SNE will attempt to preserve distances.
 learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto", the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019.
 early_exaggeration_iter (int) – The number of iterations to run in the early exaggeration phase.
 early_exaggeration (float) – The exaggeration factor to use during the early exaggeration phase. Typical values range from 12 to 32.
 n_iter (int) – The number of iterations to run in the normal optimization regime.
 exaggeration (float) – The exaggeration factor to use during the normal optimization phase. This can be used to form more densely packed clusters and is useful for large data sets.
 dof (float) – Degrees of freedom as described in Kobak et al. “Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations”, 2019.
 theta (float) – Only used when negative_gradient_method="bh" or its other aliases. This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. The value 0 indicates that no approximation is to be made and produces exact results, at the cost of longer runtime. Alternatively, you can use "auto" to approximately select the faster method.
 n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. It is highly recommended to leave this value at the default of 3.
 min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
 ints_in_interval (float) – Only used when negative_gradient_method="fft" or its other aliases. Indicates how large a grid cell should be, e.g. a value of 3 indicates a grid side length of 3. Lower values provide more accurate gradient estimations.
 initialization (Union[np.ndarray, str]) – The initial point positions to be used in the embedding space. Can be a precomputed numpy array, "pca", "spectral" or "random". Please note that when passing in precomputed positions, it is highly recommended that the point positions have small variance (std(Y) < 0.0001), otherwise you may get poor embeddings.
 metric (Union[str, Callable]) – The metric to be used to compute affinities between points in the original space.
 metric_params (dict) – Additional keyword arguments for the metric function.
 initial_momentum (float) – The momentum to use during the early exaggeration phase.
 final_momentum (float) – The momentum to use during the normal optimization phase.
 max_grad_norm (float) – Maximum gradient norm. If the norm exceeds this value, it will be clipped. This is most beneficial when adding points into an existing embedding and the new points overlap with the reference points, leading to large gradients. This can make points “shoot off” from the embedding, causing the interpolation method to compute a very large grid, and leads to worse results.
 max_step_norm (float) – Maximum update norm. If the norm exceeds this value, it will be clipped. This prevents points from “shooting off” from the embedding.
 n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
 neighbors (str) – Specifies the nearest neighbor method to use. Can be "exact", "annoy", "pynndescent", "hnsw", "approx", or "auto" (default). "approx" uses Annoy if the input data matrix is not a sparse object and if Annoy supports the given metric. Otherwise it uses Pynndescent. "auto" uses exact nearest neighbors for N<1000 and the same heuristic as "approx" for N>=1000.
 negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: "bh", "BH" or "barnes-hut". For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: "fft", "FFT" or "interpolation". Alternatively, you can use "auto" to approximately select the faster method.
 callbacks (Union[Callable, List[Callable]]) – Callbacks, which will be run every callbacks_every_iters iterations.
 callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
 random_state (Union[int, RandomState]) – If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
 verbose (bool) –
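
The description of perplexity as a continuous number of nearest neighbors can be made concrete with a small worked example. The sketch below is not part of openTSNE, and the helper name perplexity_of is hypothetical. It assumes the standard definition of perplexity as 2 raised to the Shannon entropy (in bits) of a point's neighbor distribution, under which a uniform distribution over k neighbors has a perplexity of k.

```python
import math


def perplexity_of(probabilities):
    """Return 2 ** H(p), where H is the Shannon entropy in bits.

    Hypothetical helper for illustration; not an openTSNE function.
    """
    entropy = -sum(p * math.log2(p) for p in probabilities if p > 0)
    return 2 ** entropy


# A point that spreads its affinity evenly over 30 neighbors
uniform = [1 / 30] * 30
# A point that concentrates most of its affinity on three neighbors
peaked = [0.5, 0.3, 0.1] + [0.1 / 27] * 27

print(perplexity_of(uniform))  # effective neighbor count for the even spread
print(perplexity_of(peaked))   # smaller, since few neighbors dominate
```

Read this way, perplexity=30 asks for neighbor distributions that behave roughly like an even spread over 30 neighbors.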

fit(X=None, affinities=None, initialization=None)
Fit a t-SNE embedding for a given data set.
Runs the standard t-SNE optimization, consisting of the early exaggeration phase and a normal optimization phase.
This function can be called in two ways:
1. We can call it in the standard way using a np.array. This will compute the affinity matrix and initialization, and run the optimization as usual.
2. We can also pass in a precomputed affinity object, which will override the affinity-related parameters specified in the constructor. This is useful when you wish to use custom affinity objects.
Please note that some initialization schemes require X to be specified, e.g. PCA. If the initialization cannot be computed, we default to using spectral initialization calculated from the affinity matrix.
Parameters:
 X (Optional[np.ndarray]) – The data matrix to be embedded.
 affinities (Optional[openTSNE.affinity.Affinities]) – A precomputed affinity object. If specified, other affinity-related parameters are ignored, e.g. perplexity and anything related to nearest-neighbor search.
 initialization (Optional[np.ndarray]) – The initial point positions to be used in the embedding space. Can be a precomputed numpy array, "pca", "spectral" or "random". Please note that when passing in precomputed positions, it is highly recommended that the point positions have small variance (std(Y) < 0.0001), otherwise you may get poor embeddings.
Returns: A fully optimized t-SNE embedding.
Return type: TSNEEmbedding

prepare_initial(X=None, affinities=None, initialization=None)
Prepare the initial embedding which can be optimized as needed.
This function can be called in two ways:
1. We can call it in the standard way using a np.array. This will compute the affinity matrix and initialization as usual.
2. We can also pass in a precomputed affinity object, which will override the affinity-related parameters specified in the constructor. This is useful when you wish to use custom affinity objects.
Please note that some initialization schemes require X to be specified, e.g. PCA. If the initialization cannot be computed, we default to using spectral initialization calculated from the affinity matrix.
Parameters:
 X (Optional[np.ndarray]) – The data matrix to be embedded.
 affinities (Optional[openTSNE.affinity.Affinities]) – A precomputed affinity object. If specified, other affinity-related parameters are ignored, e.g. perplexity and anything related to nearest-neighbor search.
 initialization (Optional[np.ndarray]) – The initial point positions to be used in the embedding space. Can be a precomputed numpy array, "pca", "spectral" or "random". Please note that when passing in precomputed positions, it is highly recommended that the point positions have small variance (std(Y) < 0.0001), otherwise you may get poor embeddings.
Returns: An unoptimized TSNEEmbedding object, prepared for optimization.
Return type: TSNEEmbedding

class openTSNE.TSNEEmbedding
A t-SNE embedding.
Please see the Parameter guide for more information.
Parameters:
 embedding (np.ndarray) – Initial positions for each data point.
 affinities (Affinities) – An affinity index which can be used to compute the affinities of new points to the points in the existing embedding. The affinity index also contains the affinity matrix \(P\) used during optimization.
 learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto", the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for T-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019.
 exaggeration (float) – The exaggeration factor is used to increase the attractive forces of nearby points, producing more compact clusters.
 dof (float) – Degrees of freedom as described in Kobak et al. “Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations”, 2019.
 momentum (float) – Momentum accounts for gradient directions from previous iterations, resulting in faster convergence.
 negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: "bh", "BH" or "barnes-hut". For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: "fft", "FFT" or "interpolation". Alternatively, you can use "auto" to approximately select the faster method.
 theta (float) – This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. The value 0 indicates that no approximation is to be made and produces exact results, at the cost of longer runtime.
 n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. It is highly recommended to leave this value at the default of 3.
 min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
 random_state (Union[int, RandomState]) – The random state parameter follows the convention used in scikit-learn. If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
 n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
 callbacks (Callable[[int, float, np.ndarray] -> bool]) – Callbacks, which will be run every callbacks_every_iters iterations.
 callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
 optimizer (gradient_descent) – Optionally, an existing optimizer can be used for optimization. This is useful for keeping momentum gains between different calls to optimize().
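
The learning_rate="auto" rule quoted in the parameter list is simple enough to evaluate by hand. The sketch below only restates the max(200, N / 12) formula from the text; the function name auto_learning_rate is hypothetical and is not an openTSNE call.

```python
def auto_learning_rate(n_samples):
    """Learning rate given by the max(200, N / 12) rule quoted in the text.

    Hypothetical helper for illustration; not an openTSNE function.
    """
    return max(200, n_samples / 12)


# Small data sets stay at the floor of 200; larger ones scale with N
for n_samples in (1_000, 12_000, 120_000):
    print(n_samples, auto_learning_rate(n_samples))
```

Under this rule the floor of 200 applies to any data set with fewer than 2400 samples.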

kl_divergence
The KL divergence or error of the embedding.
Type: float

optimize(n_iter, inplace=False, propagate_exception=False, **gradient_descent_params)
Run optimization on the embedding for a given number of steps.
Please see the Parameter guide for more information.
Parameters:
 n_iter (int) – The number of optimization iterations.
 learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto", the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019.
 exaggeration (float) – The exaggeration factor is used to increase the attractive forces of nearby points, producing more compact clusters.
 dof (float) – Degrees of freedom as described in Kobak et al. “Heavy-tailed kernels reveal a finer cluster structure in t-SNE visualisations”, 2019.
 momentum (float) – Momentum accounts for gradient directions from previous iterations, resulting in faster convergence.
 negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: "bh", "BH" or "barnes-hut". For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: "fft", "FFT" or "interpolation". Alternatively, you can use "auto" to approximately select the faster method.
 theta (float) – This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. The value 0 indicates that no approximation is to be made and produces exact results, at the cost of longer runtime.
 n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. It is highly recommended to leave this value at the default of 3.
 min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
 inplace (bool) – Whether to perform updates in place or to create a copy of the embedding.
 propagate_exception (bool) – The optimization process can be interrupted using callbacks. This flag indicates whether we should propagate that exception or simply stop optimization and return the resulting embedding.
 max_grad_norm (float) – Maximum gradient norm. If the norm exceeds this value, it will be clipped. This is most beneficial when adding points into an existing embedding and the new points overlap with the reference points, leading to large gradients. This can make points “shoot off” from the embedding, causing the interpolation method to compute a very large grid, and leads to worse results.
 max_step_norm (float) – Maximum update norm. If the norm exceeds this value, it will be clipped. This prevents points from “shooting off” from the embedding.
 random_state (Union[int, RandomState]) – The random state parameter follows the convention used in scikit-learn. If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
 n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
 callbacks (Callable[[int, float, np.ndarray] -> bool]) – Callbacks, which will be run every callbacks_every_iters iterations.
 callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
Returns: An optimized t-SNE embedding.
Return type: TSNEEmbedding
Raises: OptimizationInterrupt – If a callback stops the optimization and the propagate_exception flag is set, then an exception is raised.
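
How callbacks, callbacks_every_iters and propagate_exception interact can be sketched with a toy loop. Nothing below is openTSNE code: StopRequested is a hypothetical stand-in for OptimizationInterrupt, toy_optimize uses a made-up error curve instead of a real KL divergence, and the sketch assumes that a callback returning True is a request to stop.

```python
class StopRequested(Exception):
    """Hypothetical stand-in for openTSNE.OptimizationInterrupt."""

    def __init__(self, error, final_embedding):
        super().__init__("Optimization was interrupted by a callback.")
        self.error = error
        self.final_embedding = final_embedding


class ErrorThreshold:
    """Callback that asks to stop once the error drops below a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.seen = []

    def __call__(self, iteration, error, embedding):
        self.seen.append((iteration, error))
        return error < self.threshold  # True is assumed to mean "stop"


def toy_optimize(n_iter, callbacks, callbacks_every_iters=50,
                 propagate_exception=False):
    """Toy loop with a made-up error curve; not the real optimizer."""
    embedding = [[0.0, 0.0]]
    for iteration in range(1, n_iter + 1):
        error = 10.0 / iteration  # placeholder for the KL divergence
        if iteration % callbacks_every_iters == 0:
            # Run every callback, then stop if any of them asked to
            if any([cb(iteration, error, embedding) for cb in callbacks]):
                if propagate_exception:
                    raise StopRequested(error, embedding)
                return embedding, iteration
    return embedding, n_iter


monitor = ErrorThreshold(threshold=0.08)
_, stopped_at = toy_optimize(500, [monitor])
print(stopped_at, len(monitor.seen))
```

With propagate_exception=False the loop returns the embedding reached so far; with propagate_exception=True the same stop request surfaces as an exception that carries the error and the final embedding.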

prepare_interpolation_grid(padding=0.25)
Evaluate and save the interpolation grid coefficients.
Parameters:
 padding (float) – During standard optimization, the grid hugs the embedding points as closely as possible, but this is not what we want when performing transform. This parameter specifies how much empty space should be appended in each dimension. The values are given in percentages.

prepare_partial(X, initialization='median', k=25, **affinity_params)
Prepare a partial embedding which can be optimized.
Parameters:
 X (np.ndarray) – The data matrix to be added to the existing embedding.
 initialization (Union[np.ndarray, str]) – The initial point positions to be used in the embedding space. Can be a precomputed numpy array, "median", "weighted" or "random". In all cases, "median" or "weighted" should be preferred.
 k (int) – The number of nearest neighbors to consider when initially placing the point onto the embedding. This is different from perplexity because perplexity affects optimization while this only affects the initial point positions.
 **affinity_params (dict) – Additional params to be passed to the Affinities.to_new method. Please see individual Affinities implementations as the parameters differ between implementations.
Returns: An unoptimized PartialTSNEEmbedding object, prepared for optimization.
Return type: PartialTSNEEmbedding
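
The role of k in the "median" initialization can be sketched as follows. This is an assumption about what the option does, written in plain Python rather than taken from openTSNE: each new point is placed at the coordinate-wise median of the embedding positions of its k nearest reference points. The function name median_initialization and the toy data are hypothetical.

```python
import math
from statistics import median


def median_initialization(new_points, reference_points, reference_embedding, k=25):
    """Place each new point at the coordinate-wise median of the embedding
    positions of its k nearest reference points (Euclidean distance).

    Hypothetical sketch; not the openTSNE implementation.
    """
    positions = []
    for point in new_points:
        # Indices of reference points sorted by distance to the new point
        order = sorted(
            range(len(reference_points)),
            key=lambda i: math.dist(point, reference_points[i]),
        )
        nearest = [reference_embedding[i] for i in order[:k]]
        # Take the median separately along each embedding dimension
        positions.append([median(coords) for coords in zip(*nearest)])
    return positions


# Toy data: four 1-D reference points and their 2-D embedding positions
reference_points = [[0.0], [1.0], [2.0], [10.0]]
reference_embedding = [[0.0, 0.0], [1.0, 1.0], [2.0, 5.0], [9.0, 9.0]]

initial = median_initialization([[1.2]], reference_points, reference_embedding, k=3)
print(initial)
```

In this sketch k only decides where a new point starts; any later optimization is driven by the affinities.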

transform(X, perplexity=5, initialization='median', k=25, learning_rate=0.1, early_exaggeration=4, early_exaggeration_iter=0, exaggeration=1.5, n_iter=250, initial_momentum=0.5, final_momentum=0.8, max_grad_norm=0.25, max_step_norm=None)
Embed new points into the existing embedding.
This procedure optimizes each point only with respect to the existing embedding, i.e. it ignores any interactions between the points in X among themselves.
Please see the Parameter guide for more information.
Parameters:
 X (np.ndarray) – The data matrix to be added to the existing embedding.
 perplexity (float) – Perplexity can be thought of as a continuous analogue of the number \(k\) of nearest neighbors for which t-SNE will attempt to preserve distances. However, when transforming, we only consider neighbors in the existing embedding, i.e. each data point is placed into the embedding independently of other new data points.
 initialization (Union[np.ndarray, str]) – The initial point positions to be used in the embedding space. Can be a precomputed numpy array, "median", "weighted" or "random". In all cases, "median" or "weighted" should be preferred.
 k (int) – The number of nearest neighbors to consider when initially placing the point onto the embedding. This is different from perplexity because perplexity affects optimization while this only affects the initial point positions.
 learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto", the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019. Note that this should not be used when adding samples into existing embeddings, where the learning rate often needs to be much lower to obtain convergence.
 early_exaggeration_iter (int) – The number of iterations to run in the early exaggeration phase.
 early_exaggeration (float) – The exaggeration factor to use during the early exaggeration phase. Typical values range from 12 to 32.
 n_iter (int) – The number of iterations to run in the normal optimization regime.
 exaggeration (float) – The exaggeration factor to use during the normal optimization phase. This can be used to form more densely packed clusters and is useful for large data sets.
 initial_momentum (float) – The momentum to use during the early exaggeration phase.
 final_momentum (float) – The momentum to use during the normal optimization phase.
 max_grad_norm (float) – Maximum gradient norm. If the norm exceeds this value, it will be clipped. This is most beneficial when adding points into an existing embedding and the new points overlap with the reference points, leading to large gradients. This can make points “shoot off” from the embedding, causing the interpolation method to compute a very large grid, and leads to worse results.
 max_step_norm (float) – Maximum update norm. If the norm exceeds this value, it will be clipped. This prevents points from “shooting off” from the embedding.
Returns: The positions of the new points in the embedding space.
Return type:
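
The clipping described for max_grad_norm and max_step_norm can be illustrated with a short sketch. It assumes that clipping is applied per point, i.e. to each row of the gradient or update separately, which the text above does not state explicitly. The helper name clip_rows_to_norm is hypothetical and the code is not taken from openTSNE.

```python
import math


def clip_rows_to_norm(rows, max_norm):
    """Rescale every row whose Euclidean norm exceeds max_norm.

    Hypothetical sketch of per-point norm clipping.
    """
    clipped = []
    for row in rows:
        norm = math.hypot(*row)
        # Rows within the limit are left untouched
        scale = max_norm / norm if norm > max_norm else 1.0
        clipped.append([value * scale for value in row])
    return clipped


# One very large gradient (norm 5) and one small gradient (norm 0.1)
gradient = [[3.0, 4.0], [0.1, 0.0]]
clipped = clip_rows_to_norm(gradient, max_norm=0.25)
print(clipped)
```

Clipping keeps the direction of each row and only shortens it, so a new point that overlaps a reference point still moves the right way without shooting off from the embedding.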

class openTSNE.PartialTSNEEmbedding
A partial t-SNE embedding.
A partial embedding is created when we take an existing TSNEEmbedding and embed new samples into the embedding space. It differs from the typical embedding in that new samples cannot be added to a partial embedding; doing so would generally be a bad idea.
Please see the Parameter guide for more information.
Parameters:
 embedding (np.ndarray) – Initial positions for each data point.
 reference_embedding (TSNEEmbedding) – The embedding into which the new samples are to be added.
 P (array_like) – An \(N \times M\) affinity matrix containing the affinities from each new data point \(n\) to each data point in the existing embedding \(m\).
 learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto", the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. (2019), Nature Communications. Note that this should not be used when adding samples into existing embeddings, where the learning rate often needs to be much lower to obtain convergence.
 exaggeration (float) – The exaggeration factor is used to increase the attractive forces of nearby points, producing more compact clusters.
 momentum (float) – Momentum accounts for gradient directions from previous iterations, resulting in faster convergence.
 negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: "bh", "BH" or "barnes-hut". For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: "fft", "FFT" or "interpolation". Alternatively, you can use "auto" to approximately select the faster method.
 theta (float) – This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. The value 0 indicates that no approximation is to be made and produces exact results, at the cost of longer runtime.
 n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. It is highly recommended to leave this value at the default of 3.
 min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
 random_state (Union[int, RandomState]) – The random state parameter follows the convention used in scikit-learn. If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
 n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
 callbacks (Callable[[int, float, np.ndarray] -> bool]) – Callbacks, which will be run every callbacks_every_iters iterations.
 callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
 optimizer (gradient_descent) – Optionally, an existing optimizer can be used for optimization. This is useful for keeping momentum gains between different calls to optimize().
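
The n_jobs convention can be spelled out with a small helper that turns an n_jobs value into a thread count. The function resolve_n_jobs is hypothetical and is not part of openTSNE or scikit-learn. Rejecting 0 and never dropping below one thread are choices made for this sketch.

```python
import os


def resolve_n_jobs(n_jobs, n_cpus=None):
    """Translate an n_jobs value into a concrete thread count.

    Hypothetical helper for illustration only.
    """
    if n_cpus is None:
        n_cpus = os.cpu_count() or 1
    if n_jobs == 0:
        raise ValueError("n_jobs must not be 0")
    if n_jobs < 0:
        # -1 -> all processors, -2 -> all but one, and so on
        return max(1, n_cpus + 1 + n_jobs)
    return n_jobs


# On a machine with 8 processors
for value in (1, 4, -1, -2):
    print(value, resolve_n_jobs(value, n_cpus=8))
```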

kl_divergence
The KL divergence or error of the embedding.
Type: float

optimize(n_iter, inplace=False, propagate_exception=False, **gradient_descent_params)
Run optimization on the embedding for a given number of steps.
Parameters:
 n_iter (int) – The number of optimization iterations.
 learning_rate (Union[str, float]) – The learning rate for t-SNE optimization. When learning_rate="auto", the appropriate learning rate is selected according to max(200, N / 12), as determined in Belkina et al. “Automated optimized parameters for t-distributed stochastic neighbor embedding improve visualization and analysis of large datasets”, 2019. Note that this should not be used when adding samples into existing embeddings, where the learning rate often needs to be much lower to obtain convergence.
 exaggeration (float) – The exaggeration factor is used to increase the attractive forces of nearby points, producing more compact clusters.
 momentum (float) – Momentum accounts for gradient directions from previous iterations, resulting in faster convergence.
 negative_gradient_method (str) – Specifies the negative gradient approximation method to use. For smaller data sets, the Barnes-Hut approximation is appropriate and can be set using one of the following aliases: "bh", "BH" or "barnes-hut". For larger data sets, the FFT accelerated interpolation method is more appropriate and can be set using one of the following aliases: "fft", "FFT" or "interpolation". Alternatively, you can use "auto" to approximately select the faster method.
 theta (float) – This is the trade-off parameter between speed and accuracy of the tree approximation method. Typical values range from 0.2 to 0.8. The value 0 indicates that no approximation is to be made and produces exact results, at the cost of longer runtime.
 n_interpolation_points (int) – Only used when negative_gradient_method="fft" or its other aliases. The number of interpolation points to use within each grid cell for interpolation-based t-SNE. It is highly recommended to leave this value at the default of 3.
 min_num_intervals (int) – Only used when negative_gradient_method="fft" or its other aliases. The minimum number of grid cells to use, regardless of the ints_in_interval parameter. Higher values provide more accurate gradient estimations.
 inplace (bool) – Whether to perform updates in place or to create a copy of the embedding.
 propagate_exception (bool) – The optimization process can be interrupted using callbacks. This flag indicates whether we should propagate that exception or simply stop optimization and return the resulting embedding.
 random_state (Union[int, RandomState]) – The random state parameter follows the convention used in scikit-learn. If the value is an int, random_state is the seed used by the random number generator. If the value is a RandomState instance, then it will be used as the random number generator. If the value is None, the random number generator is the RandomState instance used by np.random.
 n_jobs (int) – The number of threads to use while running t-SNE. This follows the scikit-learn convention, -1 meaning all processors, -2 meaning all but one, etc.
 callbacks (Callable[[int, float, np.ndarray] -> bool]) – Callbacks, which will be run every callbacks_every_iters iterations.
 callbacks_every_iters (int) – How many iterations should pass between each time the callbacks are invoked.
Returns: An optimized partial t-SNE embedding.
Return type: PartialTSNEEmbedding
Raises: OptimizationInterrupt – If a callback stops the optimization and the propagate_exception flag is set, then an exception is raised.

exception openTSNE.OptimizationInterrupt(error, final_embedding)
Optimization was interrupted by a callback.
Parameters:
 error (float) – The KL divergence of the embedding.
 final_embedding (Union[TSNEEmbedding, PartialTSNEEmbedding]) – Is either a partial or full embedding, depending on where the error was raised.