The Well-Tempered Computer

Headphones and speakers

The difference between listening to music through speakers and through headphones is striking.

When listening to speakers:

both ears receive the direct sound from both channels, so the two channels mix
for each channel, the far ear receives the sound a fraction of a millisecond later than the near ear
for each channel, each ear also hears indirect sound reflected from the walls, floor (bass!) and ceiling
again, the far ear receives this indirect sound slightly later
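To put a rough number on how much later the indirect sound arrives, the floor reflection can be estimated with the image-source idea: the reflection behaves like sound coming from a mirror image of the speaker below the floor. This is a minimal sketch assuming a flat, fully reflecting floor and a speed of sound of 343 m/s; the function name and the listening-position figures are hypothetical, chosen only for illustration.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s in air at about 20 degrees C (assumed)


def floor_reflection_delay(distance, speaker_height, ear_height, c=SPEED_OF_SOUND):
    """Extra arrival time (s) of the floor reflection relative to the direct sound.

    Image-source idea: the reflected path is the straight line from a mirror
    image of the speaker (placed below the floor) to the ear.
    """
    direct = math.hypot(distance, speaker_height - ear_height)
    reflected = math.hypot(distance, speaker_height + ear_height)
    return (reflected - direct) / c


# Hypothetical listening position: 2.5 m from the speaker,
# tweeter 1.0 m and ears 1.1 m above the floor.
delay = floor_reflection_delay(2.5, 1.0, 1.1)
print(f"floor reflection arrives {delay * 1000:.2f} ms after the direct sound")
```

For a typical living-room setup this comes out at a couple of milliseconds, several times longer than the delay between the two ears.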

When listening through headphones, each ear receives one channel and one channel only. There is no mixing, no delay, and no indirect sound from the surfaces of the room.

The last point means that the room acoustics are eliminated.

With speakers we can localize a sound, that is, hear where it comes from.

HRTF

If a speaker is off axis, the sound reaches the far ear a fraction later than the near ear.

This delay is called the Interaural Time Difference (ITD).
[Figure: graph of interaural time differences]
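A classic way to estimate the ITD is Woodworth's spherical-head formula, ITD = (a / c) * (theta + sin(theta)), where a is the head radius, c the speed of sound and theta the azimuth of a distant source. This sketch assumes a rigid spherical head with a radius of 8.75 cm (a commonly used average) and c = 343 m/s; real heads, and sources close to the head, differ.

```python
import math


def woodworth_itd(azimuth_deg, head_radius=0.0875, c=343.0):
    """Interaural time difference (s) for a distant source, spherical-head model.

    Woodworth's formula: ITD = (a / c) * (theta + sin(theta)),
    valid from 0 degrees (straight ahead) to 90 degrees (fully to one side).
    The default head radius is an assumed average, not a measured value.
    """
    if not 0.0 <= azimuth_deg <= 90.0:
        raise ValueError("azimuth must be between 0 and 90 degrees")
    theta = math.radians(azimuth_deg)
    return (head_radius / c) * (theta + math.sin(theta))


# Loudspeakers in a standard stereo triangle sit at about 30 degrees.
for az in (0, 30, 60, 90):
    print(f"{az:2d} deg -> {woodworth_itd(az) * 1e6:5.0f} microseconds")
```

With these assumptions a speaker at 30 degrees gives an ITD of roughly a quarter of a millisecond, and a source fully to one side about two thirds of a millisecond.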

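The far ear is in the shadow of the head, so it hears the sound at a slightly lower volume.
This also affects the frequency response: the head shadows high frequencies much more than low frequencies, whose long wavelengths bend around it.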
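Why the shadow depends on frequency becomes clearer when the wavelength is compared with the size of the head. The sketch below uses a crude rule of thumb, assumed here for illustration: the shadow becomes pronounced once the wavelength is about as small as the head is wide. The head width of 17.5 cm is an assumption, and in reality the shadow grows gradually with frequency rather than switching on at one point.

```python
SPEED_OF_SOUND = 343.0  # m/s (assumed)
HEAD_DIAMETER = 0.175   # m, an assumed typical adult head width


def wavelength(frequency_hz, c=SPEED_OF_SOUND):
    """Wavelength in metres of a sound at the given frequency."""
    return c / frequency_hz


for f in (100, 500, 1000, 2000, 5000, 10000):
    lam = wavelength(f)
    # Rough rule of thumb only: a clear acoustic shadow appears once the
    # wavelength is comparable to or smaller than the head.
    shadowed = lam <= HEAD_DIAMETER
    print(f"{f:6d} Hz: wavelength {lam:6.3f} m, strongly shadowed: {shadowed}")

crossover = SPEED_OF_SOUND / HEAD_DIAMETER
print(f"wavelength equals head width at about {crossover:.0f} Hz")
```

A 100 Hz bass note has a wavelength of several metres and passes around the head almost unchanged, while at 10 kHz the wavelength is only a few centimetres.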

This level difference is called the Interaural Intensity Difference (IID), also known as the Interaural Level Difference (ILD).

These are the two primary cues we use for localization.
They are not sufficient on their own: the pinna (the outer ear) also shapes the sound, which helps us tell, for example, front from back and above from below. One thing is obvious, though: none of this happens when listening through headphones.

This is why stereo over headphones sounds like STEREO: the two channels don't mix, and none of the effects described above, together known as the Head Related Transfer Function (HRTF), take place.

Crossfeed

Using DSP, one can emulate listening to speakers over headphones by mixing a little of each channel into the other, adding a delay, and even emulating the room. The simpler form of this is called crossfeed; more elaborate versions apply a model of the HRTF (Head Related Transfer Function).
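The basic crossfeed idea can be sketched in a few lines: each output channel is the original channel plus a delayed, attenuated and low-passed copy of the opposite channel. The low-pass mimics the head shadow (the far ear mainly gets the low frequencies) and the delay mimics the ITD. This is a minimal sketch under stated assumptions, not any particular published design: the function names are hypothetical, a first-order filter stands in for whatever filter a real crossfeed uses, and the default delay (0.3 ms), level (-6 dB) and cutoff (700 Hz) are illustrative values only.

```python
import math


def one_pole_lowpass(signal, cutoff_hz, sample_rate):
    """First-order low-pass filter (exponential smoothing)."""
    alpha = 1.0 - math.exp(-2.0 * math.pi * cutoff_hz / sample_rate)
    out, y = [], 0.0
    for x in signal:
        y += alpha * (x - y)
        out.append(y)
    return out


def crossfeed(left, right, sample_rate=44100, delay_ms=0.3, gain_db=-6.0, cutoff_hz=700.0):
    """Mix a delayed, attenuated, low-passed copy of each channel into the other.

    Assumes `left` and `right` have the same length. All default parameter
    values are illustrative assumptions, not taken from a standard.
    """
    delay = round(sample_rate * delay_ms / 1000.0)  # delay rounded to whole samples
    gain = 10.0 ** (gain_db / 20.0)                 # dB -> linear factor
    n = len(left)

    def feed(src):
        filtered = one_pole_lowpass(src, cutoff_hz, sample_rate)
        # Shift by `delay` samples, padding the start with silence.
        return [0.0] * min(delay, n) + filtered[: max(n - delay, 0)]

    from_right = feed(right)
    from_left = feed(left)
    out_left = [dry + gain * wet for dry, wet in zip(left, from_right)]
    out_right = [dry + gain * wet for dry, wet in zip(right, from_left)]
    return out_left, out_right


# Usage: a click in the left channel only now also reaches the right ear,
# quieter, duller and slightly later.
left = [1.0] + [0.0] * 63
right = [0.0] * 64
out_left, out_right = crossfeed(left, right)
```

Because the crossfed signal is added on top of the original, the sum can exceed full scale; a practical implementation would typically lower the overall level a little to leave headroom.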

Benjamin B. Bauer was one of the pioneers.
A well-known article of his is "Stereophonic Earphones and Binaural Loudspeakers", JAES (Journal of the Audio Engineering Society), Volume 9, Number 2, pp. 148-151, April 1961.

Your media player might have crossfeed built in; if not, there may be a plug-in for it.

A VST plug-in and a lot more about headphones can be found here: BlogOhl