Word usage can draw the emotional state of Spaceship Earth.
Lesson / Koan: Language models are the real maps of the world.
An n-gram is a subsequence of n items from a given sequence. The items in question can be phonemes, syllables, letters, words or base pairs according to the application.
An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram” (or, less commonly, a “digram”); size 3 is a “trigram”; and size 4 or more is simply called an “n-gram”.
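To make the definition concrete, here is a minimal sketch in Python of pulling n-grams out of a sequence. The function name `ngrams` and the sample phrase are my own choices for illustration, not part of any particular library, and words are split on whitespace only.

```python
def ngrams(items, n):
    """Return every contiguous run of n items from the sequence, as tuples."""
    if n < 1:
        raise ValueError("n must be at least 1")
    # A sequence of length L has L - n + 1 n-grams (none if L < n).
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


# Items can be words...
words = "child care and nursery school".split()
unigrams = ngrams(words, 1)   # one-word tuples
bigrams = ngrams(words, 2)    # two-word tuples, starting with ("child", "care")

# ...or letters, since a string is also a sequence.
letter_trigrams = ngrams("care", 3)
```

The same function works for phonemes, syllables or base pairs, as long as they arrive as a list or string.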
With Google’s new tool, the Ngram Viewer, you can visualise the rise and fall of concepts across 5 million books and 500 years!
What’s all this do?
When you enter phrases into the Google Books Ngram Viewer, it displays a graph showing how those phrases have occurred in a corpus of books (e.g., “British English”, “English Fiction”, “French”) over the selected years. Let’s look at a sample graph:
This shows trends in three ngrams from 1950 to 2000: “nursery school” (a 2-gram or bigram), “kindergarten” (a 1-gram or unigram), and “child care” (another bigram). What the y-axis shows is this: of all the bigrams contained in our sample of books written in English and published in the United States, what percentage of them are “nursery school” or “child care”? Of all the unigrams, what percentage of them are “kindergarten”? Here, you can see that use of the phrase “child care” started to rise in the late 1960s, overtaking “nursery school” around 1970 and then “kindergarten” around 1973. It peaked shortly after 1990 and has been falling steadily since.
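The y-axis idea can be sketched in a few lines of Python: for each year, count every n-gram of the same length as the phrase, then report what percentage of them are the phrase itself. This is only an illustration under loud assumptions: the function `ngram_share` is hypothetical, the tiny `toy_corpus` is made up, and the whitespace tokenisation is far cruder than whatever Google actually does.

```python
from collections import Counter


def ngram_share(texts_by_year, phrase):
    """For each year, return the percentage of all n-grams (with n equal to
    the number of words in the phrase) that are exactly the phrase."""
    target = tuple(phrase.lower().split())
    n = len(target)
    shares = {}
    for year, texts in texts_by_year.items():
        counts = Counter()
        for text in texts:
            words = text.lower().split()
            counts.update(
                tuple(words[i:i + n]) for i in range(len(words) - n + 1)
            )
        total = sum(counts.values())
        # Avoid dividing by zero when a year has no n-grams of this size.
        shares[year] = 100.0 * counts[target] / total if total else 0.0
    return shares


# Made-up sentences standing in for "books" published in two years.
toy_corpus = {
    1960: ["the nursery school opened", "kindergarten starts today"],
    1990: ["child care costs rose", "child care is scarce"],
}

nursery = ngram_share(toy_corpus, "nursery school")
child_care = ngram_share(toy_corpus, "child care")
kinder = ngram_share(toy_corpus, "kindergarten")
```

Note that a bigram is compared only against other bigrams and a unigram only against other unigrams, which is why “kindergarten” and “child care” can share one chart even though they are different sizes.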
Let’s introduce a third party to the Ngram comparison. (What does our new word reveal, per the graph?)
One more try… But are Utopia and Happiness really this far apart?
Compare English to French.
Per the Ngram graph, 1997 & 1998 must have been good years in France.
Also, could it be that Americans dream of Happiness, but don’t know how to find it or how to make themselves happy?