Notes on Malcolm Gladwell's The Tipping Point

Gladwell does not have any mathematics in his book. They say that each equation
halves the sales of a book. But some of us think better mathematically. It makes
things more clear. So I will discuss the book with a few equations and models.

Exponential growth:
Basically the book is about exponential growth processes but complicated ones.
He uses some ideas from simple exponential growth as examples to motivate his
ideas so let's start there.

Let's start with the equation ab, a to the b power,
where a > 1 (otherwise it will get smaller instead of growing)
and b > 1 (if b=1 it won't grow and b < 1 gives negative exponential growth).
This gives rise to the familiar graph of exponential growth.
(See Wikepedia.)

In mathematics we would allow non-integer exponents (values for b) but here we
want to restrict to discrete, generational growth so let's change the equation
to an and require that n = 1, 2, 3, 4, 5, ...
where each value of n is a generation of growth.
This gives rise to the familiar doubling style of exponential growth.
This is often used as an example to impress people about how exponential growth
works with stories of grains of wheat on a chessboard or folding a piece of paper.
(Again see the Wikipedia entry.)
As we know, it starts slowly but gathers speed and after a while grows very quickly.

Networks/Graphs:
This an model is the start of what Gladwell is talking about but we need
a more complicated model to capture the ideas he is presenting.
Almost all of his important examples are networks (that is, graphs) of people
where a link between two people represents a social connection.
So we have a large graph where people are the nodes and a link between two people
means they know each other, come into contact with each other,
or interact with each other in some way.
The number of links to a node can vary widely, some people have many links and
some only a few.
The links are not directional so we can't say that they go into or out of the node.

Gladwell most often talks about "infectious" processes on these graphs, that is,
where some "infection" (maybe an actual infection or maybe an idea or bit of information)
starts in one or more nodes and then, each generation, it moves along the links out
from that node to other nodes.
But the infection doesn't always move along the link.
He talks about various factors (stickiness, mavens, etc.) that affect how likely
the infection is to move along the link.

So the process is one of the infection spreading through the graph in steps,
each step is a generation. This is a complex form of the doubling.
In simple doubling, each node has three links, one the infection comes in on and
two it goes out on, and the infection always goes out on a link.
If you start with one node in the graph infected then each generation the number
of infected nodes will double and you get the equation an.

Networks node parameters:
So for each node there are two parameters: the number of links it has and the
probability that an infection will move along a link.
For example, if that probability was 0.1 then the infection would move along
10% of the links.
Gladwell does not talk about cases where the probability of an infection moving
along a link varies between the links.
This is a level of complexity in the model that he does not need.

So we could rewrite the equation an as
(probInfect*numLinks)n where "probInfect" is the probability an
infection will move along a link and "numLinks" is the number of links out of the node.
Note that 0.0 ≤ probInfect ≤ 1.0.
The (probInfect*numLinks) factor is different for each node and so we can't use
the equation to predict the number of infected nodes after n generations.
We are just using the equation form as an analogy to show how it grows out the
simple exponential growth model.
We could easily write a computer program that would simulate the graph and
provide growth statistics.

We need to add one more factor to the equation:
(probSticks*probInfect*numLinks)n where "probSticks" is a probability
(note that 0.0 ≤ probSticks ≤ 1.0) related to the "stickiness" of the infection.
It means that the infection does not always move along the link.
This stickiness does not vary from node to node.

Gladwell and the network model:
Now we have the notation to describe some of the concepts that Gladwell uses in his book.

  1. A "Connector" is a node with a high "numLinks" value.
  2. A "Maven" is a node with a high (close to 1.0) "probInfect" value.
  3. A "Salesman" is a node with a high (close to 1.0) "probInfect" value.
  4. An "Innovator" is a node where the infection is likely to arise spontaneously.
  5. The "Power of Context" affects the value of "probSticks".

Theoretical work on networks:
Erdos and Renyi did work in "random graphs" some years ago.
There has been a lot of work on non-random networks in the past 10 years.

Due to the Stanley Milgram study and the subsequent play we hear a lot about
"six degrees of separation".
Here are some other estimates:

  1. The average separation of nodes on the web is 19 clicks.
  2. The average separation of chemicals in a cell is 3 chemical reactions.
  3. The average separation of scientists in different fields is 4 to 6 coauthorships.
  4. The average separation of neurons in the brain of C. elegans (a tiny worm) is 14 synapses.

Power laws:
Most networks conform to power laws meaning that the number of nodes with
n links is proportional to n-k.
For example, in the web the number of links, n, incoming to a node is proportional
to n-2.1, that is, roughly a square.
So there are about 1/4 as many nodes with 10 incoming links as there are nodes
with 5 incoming links, and about 1/16 as many nodes with 20 incoming links, etc.
The k for outgoing links is about 2.5.
Think about road maps or airline route maps. These follow power laws in terms of the
number of links each node has.

Scale-free networks:
Networks like this are called scale-free because they look the same at
all levels of details.
(We might note that this is also true of fractals.)

The rich get richer:
As scale-free networks grow new nodes prefer to link to existing nodes that
already have a lot of links.
If you are putting in a new airline route, it is more likely to go from Hobbs
to Albuquerque than Hobbs to Portales.
You are more likely to become friends with a friendly person who already has a
lot of friends than a loner.
If nodes are added using this "the rich get richer" model then you get scale-free
networks where the number of links to a node follows a power law.

Robustness, failures and attacks:
Since well-connected nodes are rare in scale-free networks, they are robust in
the face of random failures.
But they are vulnerable to directed attacks (just attack the big nodes).