Turning a single photo into a video

Sometimes photos cannot truly capture a scene.

A new method developed by scientists at the University of Washington gives a single photo a cinematic quality: the technique uses deep learning algorithms to convert a still photo into a video.

For example, given a photo of a tree, the system creates a video showing its leaves moving.

The method creates a seamlessly looping video that appears to play in continuous motion.

Lead author Aleksander Holynski, a doctoral student in the Paul G. Allen School of Computer Science & Engineering, said, “A picture captures a moment frozen in time. But a lot of information is lost in a static image. What led to this moment, and how are things changing? Think about the last time you found yourself fixated on something really interesting — chances are, it wasn’t static.”

“What’s special about our method is that it doesn’t require any user input or extra information. All you need is a picture. And it produces as output a high-resolution, seamlessly looping video that quite often looks like a real video.”

Developing such a method requires predicting the future from a single frozen moment.

The method consists of two parts: first, it predicts how things were moving when the photo was taken, and then it uses that information to create the animation.

For testing, the scientists used a picture of a waterfall. They estimated its motion with a neural network trained on thousands of videos of waterfalls, rivers, oceans, and other material with fluid motion.

The training process involves asking the network to guess the motion of a video when given only its first frame. The network's prediction is then compared to the real video, and over time the network learns to identify clues, such as ripples in a stream, that help it predict what happens next. The team's system then uses that predicted motion to determine if and how each pixel should move.
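The per-pixel motion can be pictured as a static flow field that each pixel follows from frame to frame. The sketch below illustrates that idea in NumPy; the function name, array shapes, and nearest-neighbor sampling are assumptions for illustration, not the team's actual code.

```python
import numpy as np

def animate_pixels(flow, num_frames):
    """Advance every pixel along a constant motion field, one step per frame.

    flow: (H, W, 2) array of per-pixel displacements in pixels/frame
          (a hypothetical stand-in for the network's predicted motion).
    Returns a list of (H, W, 2) position maps, one per output frame.
    """
    h, w, _ = flow.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pos = np.stack([xs, ys], axis=-1).astype(float)  # every pixel starts at rest
    frames = [pos.copy()]
    for _ in range(num_frames - 1):
        # Sample the flow at each pixel's *current* position (nearest neighbor),
        # so pixels keep following the field as they drift away from home.
        xi = np.clip(np.round(pos[..., 0]).astype(int), 0, w - 1)
        yi = np.clip(np.round(pos[..., 1]).astype(int), 0, h - 1)
        pos = pos + flow[yi, xi]
        frames.append(pos.copy())
    return frames
```

A renderer would then splat each pixel's color to its new position in every frame.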

Using a technique the team calls “symmetric splatting,” the method predicts both the future and the past for an image and then combines them into one animation.

Holynski said, “Looking back at the waterfall example, if we move into the past, the pixels will move up the waterfall. So we will start to see a hole near the bottom. We integrate information from both of these animations, so there are never any glaringly large holes in our warped images.”
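The combination step can be sketched as follows: two warped copies of the photo, one carried forward in time and one carried backward, are blended so that a hole left by one warp is covered by the other. The function name, the NaN-as-hole convention, and the linear time weighting are illustrative assumptions, not the paper's implementation.

```python
import numpy as np

def composite_symmetric(forward, backward, t):
    """Blend a future-warped and a past-warped frame (symmetric-splatting idea).

    forward, backward: (H, W, C) float images; NaN marks holes left by warping.
    t: position in the loop, in [0, 1]; early frames lean on the forward warp,
       late frames on the backward warp.
    """
    fw_hole = np.isnan(forward).any(axis=-1)
    bw_hole = np.isnan(backward).any(axis=-1)
    w = np.full(forward.shape[:2], 1.0 - t)  # weight given to the forward warp
    w[fw_hole] = 0.0   # hole in the future warp: rely entirely on the past warp
    w[bw_hole] = 1.0   # hole in the past warp: rely entirely on the future warp
    return (np.nan_to_num(forward) * w[..., None]
            + np.nan_to_num(backward) * (1.0 - w)[..., None])
```

Because holes in the two warps tend to appear in opposite regions (bottom of the waterfall in one, top in the other), each pixel usually has at least one valid source.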

The system also uses a few tricks to keep the animation clean, such as transitioning different parts of the frame at different times and deciding how quickly or slowly to blend each pixel depending on its surroundings.
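Those two tricks, staggered transition times and per-pixel blending speed, can be sketched as a spatially varying crossfade weight. The logistic ramp, the parameter names, and the per-pixel offset map below are illustrative assumptions rather than the authors' formulation.

```python
import numpy as np

def looping_weight(t, offset, sharpness):
    """Per-pixel crossfade weight for a seamless loop.

    t: scalar loop phase in [0, 1].
    offset: (H, W) per-pixel phase shift, so different regions of the frame
            transition at different times (hypothetical parameter).
    sharpness: how quickly each pixel blends (larger = a faster transition).
    Returns (H, W) weights in (0, 1) via a logistic ramp centered at phase 0.5.
    """
    phase = (t + offset) % 1.0  # stagger the transition across the frame
    return 1.0 / (1.0 + np.exp(-sharpness * (phase - 0.5)))
```

Varying `offset` and `sharpness` per pixel would let smooth regions fade slowly while textured regions switch quickly, which is the spirit of the trick described above.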

Holynski said, “When we see a waterfall, we know how the water should behave. The same is true for fire or smoke. These motions obey the same set of physical laws, and there are usually cues in the image that tell us how things should be moving. We’d love to extend our work to operate on a wider range of objects, like animating a person’s hair blowing in the wind. I’m hoping that eventually, the pictures that we share with our friends and family won’t be static images. Instead, they’ll all be dynamic animations like the ones our method produces.”

Scientists will present this approach on June 22 at the Conference on Computer Vision and Pattern Recognition.

