The smartest non-human primates, like Kanzi the bonobo and Koko the gorilla, understand about 2,000 to 4,000 words. Koko can make about 1,000 signs in sign language and Kanzi can use about 450 lexigrams (pictures that stand for words). Koko can also make some onomatopoetic words–that is, she can make and use imitative sounds in conversation.
A four-year-old human knows about 4,000 words, similar to an exceptional gorilla. An adult knows about 20,000-35,000 words. (Another study puts the upper bound at 42,000.)
Somewhere along our journey from ape-like hominins to Homo sapiens sapiens, our ancestors began talking, but exactly when remains a mystery. The origins of writing have been amusingly easy to discover, because early writers were fond of very durable surfaces, like clay, stone, and bone. Speech, by contrast, evaporates as soon as it is heard–leaving no trace for archaeologists to uncover.
But we can find the things necessary for speech and the things for which speech, in turn, is necessary.
The main reason why chimps and gorillas, even those taught human language, must rely on lexigrams or gestures to communicate is that their voiceboxes, lungs, and throats work differently than ours. Their semi-arboreal lifestyle requires using the ribs as a rigid base for the arm and shoulder muscles while climbing, which in turn requires closing the lungs while climbing to provide support for the ribs.
Full bipedalism released our early ancestors from the constraints on airway design imposed by climbing, freeing us to make a wider variety of vocalizations.
Now is the perfect time to break out my file of relevant human evolution illustrations:

We humans split from our nearest living ape relatives about 7-8 million years ago, but true bipedalism may not have evolved for a few more million years. Since there are many different named hominins, here is a quick guide:

Australopithecines (light blue in the graph), such as the famous Lucy, are believed to have been the first fully bipedal hominins, although, based on the shape of their toes, they may have still occasionally retreated into the trees. They lived between 4 and 2 million years ago.
Without delving into the myriad classification debates along the lines of “should we count this set of skulls as a separate species or are they all part of the natural variation within one species,” by the time the Homo genus arose with H. habilis or H. rudolfensis around 2.8 million years ago, humans were much worse at climbing trees.
Interestingly, one direction humans have continued evolving in is up.

The reliable production of stone tools represents an enormous leap forward in human cognition. The first known stone tools–Oldowan–are about 2.5-2.6 million years old and were probably made by Homo habilis. These simple tools are typically shaped on only one side.
By the Acheulean–1.75 million-100,000 years ago–tool making had become much more sophisticated. Not only did knappers shape both sides of both the tops and bottoms of stones, but they also made tools by first shaping a core stone and then flaking derivative pieces from it.
The first Acheulean tools were fashioned by H. erectus; by 100,000 years ago, H. sapiens had presumably taken over the technology.
Flint knapping is surprisingly difficult, as many an archaeology student has discovered.
These technological advances were accompanied by steadily increasing brain sizes.
I propose that the complexities of the Acheulean tool complex required some form of language to facilitate learning and teaching; this gives us a potential lower bound on language around 1.75 million years ago. Bipedalism gives us an upper bound around 4 million years ago, before which our voice boxes were likely more restricted in the sounds they could make.
A Different View
Even though “Homo sapiens” has been around for about 300,000 years (or so we have defined the point where we chose to differentiate between our species and the previous one), “behavioral modernity” only emerged around 50,000 years ago (very awkward timing if you know anything about human dispersal).
Everything about behavioral modernity is heavily contested (including when it began), but no matter how and when you date it, compared to the million years or so it took humans to figure out how to knap the back side of a rock, human technological advance has accelerated significantly over the past 100,000 years, and even more so over the past 50,000 and the past 10,000.
Fire was another of humanity’s early technologies:
Claims for the earliest definitive evidence of control of fire by a member of Homo range from 1.7 to 0.2 million years ago (Mya).[1] Evidence for the controlled use of fire by Homo erectus, beginning some 600,000 years ago, has wide scholarly support.[2][3] Flint blades burned in fires roughly 300,000 years ago were found near fossils of early but not entirely modern Homo sapiens in Morocco.[4] Evidence of widespread control of fire by anatomically modern humans dates to approximately 125,000 years ago.[5]
What prompted this sudden acceleration? Noam Chomsky suggests that it was triggered by the evolution of our ability to use and understand language:
Noam Chomsky, a prominent proponent of discontinuity theory, argues that a single chance mutation occurred in one individual in the order of 100,000 years ago, installing the language faculty (a component of the mind–brain) in “perfect” or “near-perfect” form.[6]
(Pumpkin Person has more on Chomsky.)
More specifically, we might say that this single chance mutation created the capacity for figurative or symbolic language, as clearly apes already have the capacity for very simple language. It was this ability to convey abstract ideas, then, that allowed humans to begin expressing themselves in other abstract ways, like cave painting.
I disagree with this view on the grounds that human groups were already pretty widely dispersed by 100,000 years ago. For example, Pygmies and Bushmen are descended from groups of humans who had already split off from the rest of us by then, but they still have symbolic language, art, and everything else contained in the behavioral modernity toolkit. Of course, if a trait is particularly useful or otherwise successful, it can spread extremely quickly (think lactose tolerance,) and neither Bushmen nor Pygmies were 100% genetically isolated for the past 250,000 years, but I simply think the math here doesn’t work out.
However, that doesn’t mean Chomsky isn’t on to something. For example, Johanna Nichols (another linguist) used statistical models of language differentiation to argue that modern languages split around 100,000 years ago.[31] This coincides neatly with the upper bound on the Out of Africa theory, suggesting that Nichols may actually have found the point when language began differentiating because humans left Africa, or perhaps she found the origin of the linguistic skills necessary to accomplish humanity’s cross-continental trek.
Philip Lieberman and Robert McCarthy looked at the shape of the vocal tracts of Neanderthals, Homo erectus, early H. sapiens, and modern H. sapiens:
In normal adults these two portions of the SVT form a right angle to one another and are approximately equal in length—in a 1:1 proportion. Movements of the tongue within this space, at its midpoint, are capable of producing tenfold changes in the diameter of the SVT. These tongue maneuvers produce the abrupt diameter changes needed to produce the formant frequencies of the vowels found most frequently among the world’s languages—the “quantal” vowels [i], [u], and [a] of the words “see,” “do,” and “ma.” In contrast, the vocal tracts of other living primates are physiologically incapable of producing such vowels.
(Since juvenile humans are shaped differently than adults, they pronounce sounds slightly differently until their voiceboxes fully develop.)
Their results:
…Neanderthal necks were too short and their faces too long to have accommodated equally proportioned SVTs. Although we could not reconstruct the shape of the SVT in the Homo erectus fossil because it does not preserve any cervical vertebrae, it is clear that its face (and underlying horizontal SVT) would have been too long for a 1:1 SVT to fit into its head and neck. Likewise, in order to fit a 1:1 SVT into the reconstructed Neanderthal anatomy, the larynx would have had to be positioned in the Neanderthal’s thorax, behind the sternum and clavicles, much too low for effective swallowing. …
Surprisingly, our reconstruction of the 100,000-year-old specimen from Israel, which is anatomically modern in most respects, also would not have been able to accommodate a SVT with a 1:1 ratio, albeit for a different reason. … Again, like its Neanderthal relatives, this early modern human probably had an SVT with a horizontal dimension longer than its vertical one, translating into an inability to reproduce the full range of today’s human speech.
It was only in our reconstruction of the most recent fossil specimens—the modern humans postdating 50,000 years— that we identified an anatomy that could have accommodated a fully modern, equally proportioned vocal tract.
Just as small children who can’t yet pronounce the letter “r” can nevertheless make and understand language, I don’t think early humans needed to have all of the same sounds as we have in order to communicate with each other. They would have just used fewer sounds.
The change in our voiceboxes may not have triggered the evolution of language, but may instead have been triggered by language itself. As humans began transmitting more knowledge via language, those who could make more sounds, and thus utter a greater range of words, perhaps had an edge over their peers–maybe they were seen as particularly clever, or perhaps they had an easier time organizing bands of hunters and warriors.
One of the interesting things about human language is that it is clearly simultaneously cultural–which language you speak is entirely determined by culture–and genetic–only humans can produce language in the way we do. Even the smartest chimps and dolphins cannot match our vocabularies, nor imitate our sounds. Human infants–unless they have some form of brain damage–learn language instinctually, without conscious teaching. (Insert reference to Steven Pinker.)
Some kind of genetic changes were obviously necessary to get from apes to human language use, but exactly what remains unclear.
A variety of genes are associated with language use, e.g., FOXP2. H. sapiens and chimps have different versions of the FOXP2 gene (and Neanderthals have a third, more similar to the H. sapiens version than to the chimp one), but to my knowledge we have yet to discover exactly when the necessary mutations arose.
Despite their impressive skulls and survival in a harsh, novel climate, Neanderthals seem not to have engaged in much symbolic activity (though to be fair, they were wiped out right about the time Sapiens really got going with its symbolic activity). Homo sapiens and Homo neanderthalensis split around 800,000-400,000 years ago–perhaps the difference in our language genes ultimately gave Sapiens the upper hand.
Just as farming appears to have emerged relatively independently in several different locations around the world at about the same time, so behavioral modernity seems to have taken off in several different groups around the same time. Of course we can’t rule out the possibility that these groups had some form of contact with each other–peaceful or otherwise–but it seems more likely to me that similar behaviors emerged in disparate groups around the same time because the cognitive precursors necessary for those behaviors had already begun before they split.
Based on genetics, the shape of their larynges, and their cultural toolkits, Neanderthals probably did not have modern speech, but they may have had something similar to it. This suggests that at the time of the Sapiens-Neanderthal split, our common ancestor possessed some primitive speech capacity.
By the time Sapiens and Neanderthals encountered each other again, nearly half a million years later, Sapiens’ language ability had advanced, possibly due to further modification of FOXP2 and other genes like it, plus our newly modified voiceboxes, while Neanderthals’ had lagged. Sapiens achieved behavioral modernity and took over the planet, while Neanderthals disappeared.