Video Processing with AV Foundation
November 18, 2010

UPDATE: The app talked about in this post has been released. More information about Videoscope here.

I've been working on a project that required playing back video in iOS. It seems that most people wanting to play back video are just interested in the basic operation of opening a video and passing playback off to an MPMoviePlayerController.

MPMoviePlayerController is a really nice way to just point to a URL (either local or networked) and say "go." As with all really simple API layers, there's not much further you can go with it.

What I really wanted to do was to be able to point to a URL (for me, a video saved in the device's Photo Library) and get direct access to the pixel data. New in iOS 4.0 is a pile of classes to do just this (and much, much more) in the AV Foundation framework. There is actually quite a bit of great documentation on it, but there are so many classes that need to work together that it's a pretty daunting task to get started.

This post is going to cover just reading in a video track from a specified URL that points to a local QuickTime movie; however, it should be applicable to other bits of AV Foundation.

The first bit of required data is an NSURL object. In my case, I was using a UIImagePickerController and retrieving the URL of the movie that was picked by the user:

- (void)imagePickerController:(UIImagePickerController *)picker didFinishPickingMediaWithInfo:(NSDictionary *)info
{
    NSString *mediaType = [info objectForKey:UIImagePickerControllerMediaType];
    // kUTTypeMovie is a CFStringRef, so cast it for the NSString comparison
    if ([mediaType isEqualToString:(NSString *)kUTTypeMovie])
        [self readMovie:[info objectForKey:UIImagePickerControllerMediaURL]];
    [self dismissModalViewControllerAnimated:YES];
}

One note: the kUTTypeMovie constant is defined in the MobileCoreServices framework. Now to actually do something with that URL. I floundered for a while, but a combination of blog posts, message board threads, and finally a great page in the iOS reference guide, the AV Foundation Programming Guide: Playback, got me going.

This guide is pretty buried. In fact, just now it took me a little while to find it again. Although much of this guide is geared towards playback into a view, a lot of it is very applicable to general AV Foundation programming. After a lot of trial and error, I came up with this:

- (void)readMovie:(NSURL *)url
{
    AVURLAsset *asset = [AVURLAsset URLAssetWithURL:url options:nil];
    [asset loadValuesAsynchronouslyForKeys:[NSArray arrayWithObject:@"tracks"]
                         completionHandler:^{
        dispatch_async(dispatch_get_main_queue(), ^{
            AVAssetTrack *videoTrack = nil;
            NSArray *tracks = [asset tracksWithMediaType:AVMediaTypeVideo];
            if ([tracks count] == 1) {
                videoTrack = [tracks objectAtIndex:0];

                NSError *error = nil;
                // _movieReader is a member variable
                _movieReader = [[AVAssetReader alloc] initWithAsset:asset error:&error];
                if (error)
                    NSLog(@"%@", error.localizedDescription);

                NSString *key = (NSString *)kCVPixelBufferPixelFormatTypeKey;
                NSNumber *value = [NSNumber numberWithUnsignedInt:kCVPixelFormatType_32BGRA];
                NSDictionary *videoSettings = [NSDictionary dictionaryWithObject:value forKey:key];

                [_movieReader addOutput:[AVAssetReaderTrackOutput
                    assetReaderTrackOutputWithTrack:videoTrack
                                     outputSettings:videoSettings]];
                [_movieReader startReading];
            }
        });
    }];
}

What is this doing? First, we create an AVURLAsset with the given URL. Then we tell that asset to load its tracks asynchronously with a given completionHandler. The completion handler gets called when the track loading completes (presumably in another thread). In this completion handler we dispatch a chunk of instructions to run in the main queue. Honestly, I'm not totally sure that we need to dispatch to the main queue, but the AV Foundation guide said to, and who am I to argue?

When we say to load the "tracks" asynchronously, it's only loading the necessary metadata required to actually start reading the data inside those tracks. In my case the QuickTime files being accessed are very simple and probably could be loaded synchronously, but if someone opens a 2-hour movie into this, that could take some time.

In any case, once the track data is loaded into the AVURLAsset, we can actually pull out all the tracks from the asset, and we can specify which type of track we care about. In this case, I assume we have one video track, and can safely (sort of) pull out the video track. We can then create our AVAssetReader with the asset and an output to use the video track that we just found. This AVAssetReaderTrackOutput is specified with both the track and a dictionary of settings. Conveniently, the settings are pretty close to those used in AVCaptureVideoDataOutput. That makes me very happy.

Finally, we call the startReading method on the AVAssetReader. The iOS documentation says that the startReading method tells the reader to start preparing samples to be retrieved. In theory it will be reading far enough ahead so that you can do real-time playback. I haven't actually tested that bit, but it does seem to be close to real-time.

Probably a more readable way to deal with all this would be to put the whole block into a method on my class, then just call that method inside the dispatch_async block. Oh well. Live and learn. Now that the AVAssetReader is reading, we can start requesting sample buffers from it. It's surprisingly straightforward.
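For what it's worth, that refactor might look something like this (just a sketch; the method name setupReaderForAsset: is mine, and its body would be the reader-creation code from above):

```objc
- (void)readMovie:(NSURL *)url
{
    AVURLAsset *asset = [AVURLAsset URLAssetWithURL:url options:nil];
    [asset loadValuesAsynchronouslyForKeys:[NSArray arrayWithObject:@"tracks"]
                         completionHandler:^{
        dispatch_async(dispatch_get_main_queue(), ^{
            // Everything that was inline in the block above moves here
            [self setupReaderForAsset:asset];
        });
    }];
}
```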

- (void)readNextMovieFrame
{
    if (_movieReader.status == AVAssetReaderStatusReading) {
        AVAssetReaderTrackOutput *output = [_movieReader.outputs objectAtIndex:0];
        CMSampleBufferRef sampleBuffer = [output copyNextSampleBuffer];
        if (sampleBuffer) {
            CVImageBufferRef imageBuffer = CMSampleBufferGetImageBuffer(sampleBuffer);

            // Lock the image buffer
            CVPixelBufferLockBaseAddress(imageBuffer, 0);

            // Get information about the image
            uint8_t *baseAddress = (uint8_t *)CVPixelBufferGetBaseAddress(imageBuffer);
            size_t bytesPerRow = CVPixelBufferGetBytesPerRow(imageBuffer);
            size_t width = CVPixelBufferGetWidth(imageBuffer);
            size_t height = CVPixelBufferGetHeight(imageBuffer);

            //
            // Here's where you can process the buffer!
            // (your code goes here)
            //
            // Finish processing the buffer!
            //

            // Unlock the image buffer and release the sample buffer
            CVPixelBufferUnlockBaseAddress(imageBuffer, 0);
            CFRelease(sampleBuffer);
        }
    }
}

All that is happening here is that we first verify that the AVAssetReader is actually reading (which really means that the reader has a sample buffer we can grab), then we get a copy of it and process it with the Core Media and Core Video libraries. In this case we end up with an image in 32-bit-per-pixel, 8-bit-per-channel BGRA format (as we specified up in the readMovie method). This gives us DIRECT access to the pixel data.

One last note. The AVAssetReader doesn't loop when it hits the end. You need to detect that it finished (via the reader's status property) and start it again.
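A sketch of how that detection might look. As far as I can tell, a finished AVAssetReader can't simply be reused, so "starting again" here means throwing the old reader away and calling readMovie: a second time; the method name and the _movieURL member variable (holding on to the original URL) are my own additions:

```objc
- (void)readNextMovieFrameOrLoop
{
    if (_movieReader.status == AVAssetReaderStatusCompleted) {
        // The reader is done; build a fresh one from the saved URL
        [_movieReader release];
        _movieReader = nil;
        [self readMovie:_movieURL];
        return;
    }
    [self readNextMovieFrame];
}
```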

That's all there is to it...the first blog post at 7twenty7!