GloVe vs. Word2Vec in Practice: Does Global vs. Local Context Really Matter for Your NLP Task?
Exploring Word Embeddings: Comparing Word2Vec and GloVe Models
In the world of word embeddings, Word2Vec and GloVe stand out as two influential approaches for representing words as dense vectors based on their meanings and relationships. Word2Vec relies on predictive modeling, using architectures like Skip-gram and CBOW to learn from local context windows by predicting a word from its neighbors or predicting neighbors from a word. This local, context-driven training enables Word2Vec to capture fine-grained semantic and syntactic patterns. GloVe, in contrast, takes a count-based approach by constructing a global word-word co-occurrence matrix that reflects how frequently words appear together across the entire corpus. It then factorizes this matrix to learn embeddings that encode the global statistical structure of language. While Word2Vec focuses on local prediction tasks, GloVe explicitly captures broader corpus-wide patterns, offering complementary perspectives on how word meanings can be modeled in vector space.
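To make the contrast concrete, GloVe's training objective (from Pennington et al., 2014) shows the count-based view explicitly: it fits word vectors so that their dot products match the logarithm of global co-occurrence counts:

$$
J = \sum_{i,j=1}^{V} f(X_{ij}) \left( w_i^\top \tilde{w}_j + b_i + \tilde{b}_j - \log X_{ij} \right)^2
$$

Here X_{ij} counts how often word j appears in the context of word i across the whole corpus, and f is a weighting function that caps the influence of very frequent pairs. Word2Vec, by contrast, never builds this matrix; it optimizes a prediction loss one context window at a time.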
In this post, I compare these two popular word embedding methods across several aspects using pre-trained models. For Word2Vec, I use word2vec-google-news-300, pre-trained vectors from a portion of the Google News dataset (about 100 billion words). For GloVe, I use glove-wiki-gigaword-100, pre-trained vectors from Wikipedia 2014 and Gigaword 5 (about 6 billion tokens, with a 400,000-word vocabulary).
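Both models are available through gensim's downloader API. Below is a minimal sketch of how they can be loaded and how the size figures that follow can be derived; note the memory estimate counts only the float32 embedding matrix, not gensim's auxiliary structures:

import gensim.downloader as api

# Download (or load from local cache) the two pre-trained models
word2vec_model = api.load("word2vec-google-news-300")
glove_model = api.load("glove-wiki-gigaword-100")

def model_stats(model, name):
    vocab_size, dim = model.vectors.shape  # (vocabulary size, vector dimension)
    mem_mb = model.vectors.nbytes / (1024 ** 2)  # raw float32 matrix only
    print(f"{name}:")
    print(f"- Vocabulary Size: {vocab_size}")
    print(f"- Vector Dimension: {dim}")
    print(f"- Approx Memory: {mem_mb:.2f} MB")

model_stats(word2vec_model, "Word2Vec")
model_stats(glove_model, "GloVe")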
Note on model size:
Word2Vec:
- Vocabulary Size: 3000000
- Vector Dimension: 300
- Approx Memory: 3433.23 MB
GloVe:
- Vocabulary Size: 400000
- Vector Dimension: 100
- Approx Memory: 152.59 MB
The Word2Vec model has a vocabulary of 3 million words with 300-dimensional vectors, requiring approximately 3,433 MB of memory. In contrast, the GloVe model has a smaller vocabulary of 400,000 words with 100-dimensional vectors, using about 153 MB. Word2Vec therefore covers more words and offers higher-dimensional representations per word, but it is significantly more memory-intensive than GloVe, which is lightweight and efficient where resource usage is a concern.
Visualization using t-SNE in 2D space
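A 2D projection like the one discussed below can be produced with scikit-learn's TSNE and matplotlib; here is a minimal sketch, assuming word2vec_model and glove_model are the loaded KeyedVectors (note that perplexity must stay below the number of words being plotted):

import numpy as np
import matplotlib.pyplot as plt
from sklearn.manifold import TSNE

def plot_tsne(model, words, title):
    vectors = np.array([model[w] for w in words])  # stack embedding vectors
    # Project to 2D; perplexity must be < number of samples (8 words here)
    coords = TSNE(n_components=2, perplexity=3, random_state=42).fit_transform(vectors)
    plt.figure()
    plt.scatter(coords[:, 0], coords[:, 1])
    for (x, y), word in zip(coords, words):
        plt.annotate(word, (x, y))
    plt.title(title)
    plt.show()

words = ["king", "queen", "man", "woman", "cat", "dog", "apple", "banana"]
plot_tsne(word2vec_model, words, "Word2Vec embeddings (t-SNE)")
plot_tsne(glove_model, words, "GloVe embeddings (t-SNE)")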
Observation: The t-SNE visualizations suggest that GloVe captures global semantic relationships well: related words like "woman," "king," and "queen" cluster tightly, reflecting its co-occurrence-based approach. Word2Vec, with its local context focus, spreads these gender and royalty terms farther apart while still maintaining strong linear analogy structure, such as "king - man + woman ≈ queen." Both models group similar concepts like "dog" and "cat" or "apple" and "banana," but GloVe's handling of broader semantic categories, particularly gender and royalty, is more cohesive, whereas Word2Vec's local-context training leads to looser clusters in the plot.
Word Similarity
To measure similarity, both models use cosine similarity between their embedding vectors.
Similarity Score Meaning
Each number shows how similar two words are, based on their vector representations. Cosine similarity ranges from -1 to 1:
Higher scores (closer to 1) mean stronger semantic similarity.
Scores near 0 mean weak or no semantic connection.
The following is the list of arbitrary words defined for measuring similarity:
words = ["king", "queen", "man", "woman", "cat", "dog", "apple", "banana"]
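These scores come from gensim's cosine similarity; here is a minimal sketch of the pairwise computation I assume was used (the raw nested-dict output is condensed into the tables below):

def pairwise_similarities(model, words):
    # Cosine similarity for every pair of distinct words
    return {
        w1: {w2: float(model.similarity(w1, w2)) for w2 in words if w2 != w1}
        for w1 in words
    }

word2vec_sims = pairwise_similarities(word2vec_model, words)
glove_sims = pairwise_similarities(glove_model, words)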
And the similarity scores for each method:
Word2Vec Similarities (cosine, rounded to 3 decimals):

        king   queen  man    woman  cat    dog    apple  banana
king    -      0.651  0.229  0.128  0.122  0.128  0.108  0.136
queen   0.651  -      0.167  0.316  0.147  0.161  0.117  0.145
man     0.229  0.167  -      0.766  0.299  0.309  0.117  0.101
woman   0.128  0.316  0.766  -      0.324  0.351  0.123  0.101
cat     0.122  0.147  0.299  0.324  -      0.761  0.207  0.231
dog     0.128  0.161  0.309  0.351  0.761  -      0.220  0.186
apple   0.108  0.117  0.117  0.123  0.207  0.220  -      0.532
banana  0.136  0.145  0.101  0.101  0.231  0.186  0.532  -
GloVe Similarities (cosine, rounded to 3 decimals):

        king   queen  man    woman  cat    dog    apple  banana
king    -      0.751  0.512  0.366  0.328  0.295  0.267  0.161
queen   0.751  -      0.474  0.510  0.381  0.340  0.206  0.195
man     0.512  0.474  -      0.832  0.526  0.564  0.226  0.204
woman   0.366  0.510  0.832  -      0.478  0.498  0.175  0.203
cat     0.328  0.381  0.526  0.478  -      0.880  0.279  0.274
dog     0.295  0.340  0.564  0.498  0.880  -      0.272  0.291
apple   0.267  0.206  0.226  0.175  0.279  0.272  -      0.505
banana  0.161  0.195  0.204  0.203  0.274  0.291  0.505  -
Observation: The similarity scores show that GloVe generally assigns stronger scores to related pairs, such as "king" and "queen" (0.751 vs. 0.651) or "dog" and "cat" (0.880 vs. 0.761), as well as gender pairs like "man" and "woman" or "queen" and "woman," consistent with its global co-occurrence approach and its tighter clustering in the t-SNE plot. Word2Vec, focusing on local context, produces lower absolute similarities for these pairs but, as the analogy tests below show, preserves linear relationships like "king - man + woman ≈ queen" well. Both models score closely related concepts like "apple" and "banana" comparably.
Top Similar Words
Looking at the top 10 most similar words from each embedding model, the rankings differ noticeably.
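The top_similar_words helper used below isn't defined in the original snippet; it is presumably a thin wrapper around gensim's most_similar, which ranks the vocabulary by cosine similarity (my assumed definition):

def top_similar_words(model, word, topn=10):
    # Rank the entire vocabulary by cosine similarity to the query word
    return model.most_similar(word, topn=topn)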
print(top_similar_words(word2vec_model, 'king'))
[('kings', 0.7138046622276306), ('queen', 0.6510956287384033), ('monarch', 0.6413194537162781), ('crown_prince', 0.6204219460487366), ('prince', 0.6159993410110474), ('sultan', 0.5864824056625366), ('ruler', 0.5797566771507263), ('princes', 0.5646552443504333), ('Prince_Paras', 0.5432944297790527), ('throne', 0.5422105193138123)]
print(top_similar_words(glove_model, 'king'))
[('prince', 0.7682329416275024), ('queen', 0.7507689595222473), ('son', 0.7020888328552246), ('brother', 0.6985775828361511), ('monarch', 0.6977890729904175), ('throne', 0.691999077796936), ('kingdom', 0.6811410188674927), ('father', 0.6802029013633728), ('emperor', 0.6712858080863953), ('ii', 0.6676074266433716)]
Observation: Word2Vec focuses on local context, producing similar words like "queen," "prince," and "monarch," emphasizing direct semantic relationships within the context of "king." In contrast, GloVe captures broader global co-occurrence, linking "king" to a wider range of words, such as "son," "father," and "kingdom," reflecting its ability to associate terms from different contexts.
As a side note, it's interesting that Prince_Paras, a Nepali prince, ranked among the top similarities in Word2Vec. Given that few monarchies remain in the world and much news coverage focuses on the remaining kingdoms, including Nepal, this connection makes sense for a model trained on Google News.
Embeddings for Rare or OOV Words
To assess whether GloVe's global context generalizes better to rare and unseen words than Word2Vec's local context, here is the list of arbitrary words I used for testing (the membership check follows the list).
test_words = [
    "econometrics", "bioluminescence", "qwertyuiop", "doggo", "favouritee",
    "quantum", "neural", "machinelearning", "bioinformatics", "metamorphosis",
    "autonomous", "selfie", "fomo", "yolo", "cryptocurrency", "sustainability",
    "nanotechnology", "franglais", "deepfake", "cryptid", "glutenfree",
    "bokeh", "flexitarian", "eudca", "berkshire", "machine", "tensor",
    "rpa", "coronavirus", "metaverse"
]
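The check itself is straightforward, since both models expose an explicit vocabulary. A minimal sketch, using key_to_index (gensim 4.x's word-to-index mapping):

for word in test_words:
    # Membership test against each model's fixed vocabulary
    in_w2v = word in word2vec_model.key_to_index
    in_glove = word in glove_model.key_to_index
    if in_w2v and in_glove:
        print(f"'{word}' is in both Word2Vec and GloVe.")
    elif in_w2v:
        print(f"'{word}' is in Word2Vec but not in GloVe.")
    elif in_glove:
        print(f"'{word}' is in GloVe but not in Word2Vec.")
    else:
        print(f"'{word}' is in neither Word2Vec nor GloVe.")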
Guess how the models performed?
'econometrics' is in both Word2Vec and GloVe.
'bioluminescence' is in both Word2Vec and GloVe.
'qwertyuiop' is in neither Word2Vec nor GloVe.
'doggo' is in neither Word2Vec nor GloVe.
'favouritee' is in neither Word2Vec nor GloVe.
'quantum' is in both Word2Vec and GloVe.
'neural' is in both Word2Vec and GloVe.
'machinelearning' is in neither Word2Vec nor GloVe.
'bioinformatics' is in both Word2Vec and GloVe.
'metamorphosis' is in both Word2Vec and GloVe.
'autonomous' is in both Word2Vec and GloVe.
'selfie' is in neither Word2Vec nor GloVe.
'fomo' is in neither Word2Vec nor GloVe.
'yolo' is in GloVe but not in Word2Vec.
'cryptocurrency' is in neither Word2Vec nor GloVe.
'sustainability' is in both Word2Vec and GloVe.
'nanotechnology' is in both Word2Vec and GloVe.
'franglais' is in GloVe but not in Word2Vec.
'deepfake' is in neither Word2Vec nor GloVe.
'cryptid' is in both Word2Vec and GloVe.
'glutenfree' is in neither Word2Vec nor GloVe.
'bokeh' is in both Word2Vec and GloVe.
'flexitarian' is in Word2Vec but not in GloVe.
'eudca' is in neither Word2Vec nor GloVe.
'berkshire' is in both Word2Vec and GloVe.
'machine' is in both Word2Vec and GloVe.
'tensor' is in both Word2Vec and GloVe.
'rpa' is in GloVe but not in Word2Vec.
'coronavirus' is in both Word2Vec and GloVe.
'metaverse' is in both Word2Vec and GloVe.
Observation: Both Word2Vec and GloVe cover established terms like 'econometrics,' 'bioluminescence,' and 'quantum,' indicating solid coverage of standard vocabulary. GloVe additionally covers words like 'yolo,' 'franglais,' and 'rpa,' which may reflect broader coverage from its Wikipedia and Gigaword sources, while Word2Vec misses those but covers 'flexitarian.' Both models miss newer or niche terms like 'cryptocurrency,' 'selfie,' and 'deepfake,' which postdate or were rare in their training corpora. It's worth noting that both models have fixed vocabularies and no subword mechanism, so this test really measures vocabulary coverage: neither can produce an embedding for a truly unseen string. Overall, GloVe's vocabulary skews broader here, while Word2Vec's reflects its news-focused corpus.
Capturing Semantic vs. Syntactic Relationships
The point of this test is to evaluate how well Word2Vec and GloVe capture different types of word relationships, including semantic and syntactic, based on their underlying models.
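The analogy helper below is again assumed to wrap gensim's most_similar, which adds the positive vectors, subtracts the negative ones, and ranks the remaining vocabulary by cosine similarity (my assumed definition):

def analogy(model, positive, negative, topn=5):
    # e.g. positive=['woman', 'king'], negative=['man'] computes
    # king - man + woman and returns the nearest vocabulary words
    return model.most_similar(positive=positive, negative=negative, topn=topn)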
# Semantic test
print("Semantic Analogy (king - man + woman):")
print("Word2Vec:", analogy(word2vec_model, positive=['woman', 'king'], negative=['man']))
print("GloVe:", analogy(glove_model, positive=['woman', 'king'], negative=['man']))
Word2Vec: [('queen', 0.7118192911148071), ('monarch', 0.6189674735069275), ('princess', 0.5902431011199951), ('crown_prince', 0.5499460697174072), ('prince', 0.5377321243286133)]
GloVe: [('queen', 0.7698541283607483), ('monarch', 0.6843380331993103), ('throne', 0.6755736470222473), ('daughter', 0.6594556570053101), ('princess', 0.6520534157752991)]
# Syntactic test
print("Syntactic Analogy (walking - walk + swim):")
print("Word2Vec:", analogy(word2vec_model, positive=['walking', 'swim'], negative=['walk']))
print("GloVe:", analogy(glove_model, positive=['walking', 'swim'], negative=['walk']))
Word2Vec: [('swimming', 0.8245975375175476), ('swam', 0.6806817650794983), ('swims', 0.6538287997245789), ('swimmers', 0.649475634098053), ('paddling', 0.6423786282539368)]
GloVe: [('swimming', 0.8008804321289062), ('surfing', 0.6602576971054077), ('swam', 0.6447523832321167), ('rowing', 0.6412529945373535), ('jogging', 0.637580931186676)]
Observation: Word2Vec is marginally better than GloVe at capturing syntactic relationships such as verb tense (e.g., "walking - walk + swim ≈ swimming"), likely due to its local context window. Both models perform similarly well on semantic relationships (e.g., "king - man + woman ≈ queen"). While Word2Vec has a slight edge in syntax, the difference is not substantial, suggesting that GloVe handles both semantic and syntactic relationships to a similar degree, which is notable given its much smaller model size.
In conclusion, both Word2Vec and GloVe excel at capturing semantic relationships, as seen in their strong performance on analogies like "king - man + woman ≈ queen," with Word2Vec holding a slight edge on syntactic tasks such as "walking - walk + swim ≈ swimming" thanks to its local-context predictive training. GloVe's performance is nonetheless impressive: it achieves tighter t-SNE clustering, stronger similarity scores for related word pairs, and better coverage of terms like "yolo" and "franglais," all through a non-neural method that factorizes global co-occurrence statistics. This underscores the effectiveness of GloVe's count-based approach, which competes closely with Word2Vec's predictive model on both semantic and syntactic tasks while being far more efficient, making it better suited to resource-constrained applications, with larger GloVe models likely to close the remaining gap.