Why people think they like video but they really don’t

In my previous post, I briefly introduced the concept of communication through animated loops and text rather than the traditional video format everyone is familiar with.

I believe that video in the traditional sense is not conducive to personal improvement, to encouraging social interactions beyond a person’s safety zone of friendships, or to pushing the boundaries of the inevitable merging of digital communications into our everyday lived bodies.

Short-term video interactions and the lack of conversation

With the popularity of online services for sharing videos (both Snapchat and Vine are good examples), it seems like a great idea to share a small moving clip with audio with your close friends. This works in terms of delivering a direct synchronous moment to someone else - but it lacks conversation.

Here is how this works:

  1. I send you a six second video of me sitting in a park on a Sunday afternoon. I smile at the camera and press send.
  2. You receive this video clip and, depending on whether it is ephemeral or not, you can either replay it indefinitely or be caught off guard and try to piece that visual and audio moment together into some meaningful story.
  3. You decide to reply with another video, except you are in a bar and the music is very loud. You show me the drink you have in your hand and press send.
  4. I receive this video and the audio is jarring and distorted. The video struggles to display a drink in your hand since the lighting in this bar is very dim. Then the video ends.

This back-and-forth interaction can continue or stop. Nothing binds steps 1 to 4 into anything meaningful except clips of where I am and what I am doing, and where you are and what you are doing.

If this process doesn’t work, then what about Skype or Google Hangouts?

Long-term video interactions and being “ON”

When you are on a realtime video feed with a handful of participants and the interactions are not scripted, you inevitably run the risk of interaction burnout and social awkwardness.

Assume that a group of ten participants on Google Hangouts are ready to chat with each other. The group intent is that seeing everyone’s faces and hearing everyone’s voices will fulfill all their needs of social intimacy, conversation and bonding.

What actually happens is that the majority of participants will sit back and listen to the two or three participants who dominate as speakers. This is usually unintentional on the part of those active speakers, but the experience can be overwhelming for both the audience and the host (where the host is the current speaker).

Now video lag happens often and some people experience frozen video feeds and miss out on context when someone is speaking. There is no way to replay or loop that moment - even temporarily - so various participants will be out of sync with the conversation, despite being on a realtime feed.

Nor can all participants stay on the video feed and feel comfortable being silent - somebody always has to speak to keep it going. Imagine all the energy the speaker has to force into participating in order to ensure the situation isn’t awkward.

Unless all participants have a script and the timing is planned, any type of video interaction involving audio will be insufficient and possibly annoying. It is like a movie - if the timing and script are not in the right balance, then the movie will not be good. The same goes for realtime video feeds - if the actors are not ready to act with the correct timing and script, the video will not be good either.

Animated excerpts, text and memory imprints

If an animation with well-judged timing is combined with text in a single message, then the result is a lot more memorable than a video interaction, whether short-term or long-term.

Imagine having a fixed time for recording yourself on camera - let’s say two seconds to capture your actions. But let’s also include the ability to associate it with text. Then you press send.

The audience now sees a looping image along with your text. They see it again. And again. They associate the text with the looping imagery. Now think about how effective this is in communicating a moment across to others.

Audio is unpredictable but text isn’t. Text is always intentional and the sound is controlled because there is no sound. The visual loop may be unpredictable (e.g. a cat walking past the camera) but removing one of the unpredictable elements increases quality control.

Forcing a predictable timing restriction such as two seconds guarantees that you don’t have to plan too long for what you want to do. It encourages improvisation and creativity. If the limit is too long, then you are giving the sender too much time to plan. And if you give the viewer too much time to absorb the incoming message, then they will lose interest in responding because the conversation isn’t succinct.

What does this all mean?

This means that real conversations cannot happen in the traditional video format. They can only happen in situations where everything is brief, memorable and simple.

Movies are not simple and asynchronous. They require the audience to absorb and thus the audience is not encouraged to participate. Videos are a subset of movies and suffer from the same limitations.

If we are to think about how we represent ourselves online and how we want to fulfill our human need to connect with other humans, then we need to think about moving the design of our communication to fit the conversation model, not the other way around.