Text Modeling Visualizers

Yellowbrick provides the yellowbrick.text module for text-specific visualizers. The TextVisualizer class handles datasets that are corpora rather than simple numeric arrays or DataFrames. It provides utilities for analyzing word dispersion and distribution, showing document similarity, or simply wrapping some of the other standard visualizers with text-specific display properties.

We currently have six text-specific visualizations implemented:

Token Frequency Distribution: plot the frequency of tokens in a corpus
t-SNE Corpus Visualization: plot similar documents closer together to discover clusters
UMAP Corpus Visualization: plot similar documents closer together to discover clusters
Dispersion Plot: plot the dispersion of target words throughout a corpus
Word Correlation Plot: plot the correlation between target words across the documents in a corpus
PosTag Visualization: plot the counts of different parts-of-speech throughout a tagged corpus
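
To make the quantities behind the first and fourth items concrete, here is a minimal sketch in plain Python of what "token frequency" and "dispersion" mean. It does not use Yellowbrick and is not how the library computes them internally; the three-document corpus, the tokenize helper, and the dispersion function are all hypothetical names introduced for illustration.

```python
from collections import Counter
import re

# Hypothetical three-document corpus, used only for illustration.
corpus = [
    "the cat sat on the mat",
    "the dog chased the cat",
    "a bird sang",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


# Token frequency: how often each token occurs across the whole corpus.
frequencies = Counter(token for doc in corpus for token in tokenize(doc))


def dispersion(documents, targets):
    """Return the running word offsets at which each target word occurs."""
    offsets = {target: [] for target in targets}
    position = 0
    for doc in documents:
        for token in tokenize(doc):
            if token in offsets:
                offsets[token].append(position)
            position += 1
    return offsets


print(frequencies.most_common(2))            # [('the', 4), ('cat', 2)]
print(dispersion(corpus, ["cat", "bird"]))   # {'cat': [1, 10], 'bird': [12]}
```

A frequency distribution plot draws the counts as bars, while a dispersion plot places one marker per offset along a horizontal axis that spans the corpus.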

Note that the examples in this section require a corpus of text data; see the hobbies corpus for a sample dataset.

from yellowbrick.text import FreqDistVisualizer
from yellowbrick.text import TSNEVisualizer
from yellowbrick.text import UMAPVisualizer
from yellowbrick.text import DispersionPlot
from yellowbrick.text import WordCorrelationPlot
from yellowbrick.text import PosTagVisualizer

from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.feature_extraction.text import CountVectorizer
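
As a rough illustration of how these imports fit together, the sketch below vectorizes raw documents with CountVectorizer and passes the result to FreqDistVisualizer. The function name plot_token_frequencies and its parameters are hypothetical, the imports sit inside the function only so that the sketch loads without the plotting stack installed, and get_feature_names_out assumes scikit-learn 1.0 or later.

```python
def plot_token_frequencies(documents, top_n=10):
    """Fit a FreqDistVisualizer on raw text documents and return it.

    `documents` is assumed to be a list of strings; `top_n` is the
    number of most frequent tokens to display.
    """
    # Imported lazily so the sketch can be loaded without these packages.
    from sklearn.feature_extraction.text import CountVectorizer
    from yellowbrick.text import FreqDistVisualizer

    # Turn the documents into a sparse document-term count matrix.
    vectorizer = CountVectorizer(stop_words="english")
    counts = vectorizer.fit_transform(documents)

    # Assumes scikit-learn >= 1.0; older releases use get_feature_names().
    features = vectorizer.get_feature_names_out()

    # The visualizer needs the token names to label the plotted counts.
    visualizer = FreqDistVisualizer(features=features, n=top_n)
    visualizer.fit(counts)
    return visualizer
```

Calling plot_token_frequencies(docs).show() on a list of strings such as the hobbies corpus text should then render the figure. The other visualizers take different inputs (for example, TSNEVisualizer is typically fit on TF-IDF vectors), so this pattern is a starting point and not a template for all of them.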