The VideoMatting project is the first public objective benchmark for video-matting methods. It contains scatter plots and rating tables for different quality metrics. In addition, results for participating methods are available for viewing on a player equipped with a movable zoom region. We believe our work will help rank existing methods and aid developers of new methods in improving their results.


The data set consists of five moving objects captured in front of a green plate and seven captured using the stop-motion procedure described below. We composed the objects over a set of background videos with various levels of 3D camera motion, color balance, and noise. We published ground-truth data for two stop-motion sequences and hid the rest to ensure fairness of the comparison.
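Composing an object over a background follows the standard compositing equation C = αF + (1 − α)B. The snippet below is a minimal sketch of that step for a single frame; the function name `composite` and the array layout (height × width × channels, values in [0, 1]) are assumptions made for illustration, not part of the benchmark's tooling.

```python
import numpy as np

def composite(alpha, foreground, background):
    """Blend a foreground over a background: C = alpha * F + (1 - alpha) * B."""
    a = alpha[..., np.newaxis]  # broadcast the matte over the color channels
    return a * foreground + (1.0 - a) * background

# Tiny example: a white foreground over a black background.
alpha = np.array([[0.0, 0.5, 1.0]])   # fully transparent, half, fully opaque
fg = np.ones((1, 3, 3))
bg = np.zeros((1, 3, 3))
frame = composite(alpha, fg, bg)
```

With a white foreground and a black background, each composited pixel's intensity equals its alpha value.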

Using thresholding and morphological operations on ground-truth alpha mattes, we generated narrow trimaps. Then we dilated the results using graph-cut-based energy minimization, which gives us more handmade-looking trimaps than common morphological dilation.
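The first stage (thresholding followed by a morphological operation) can be sketched as follows. This is only an illustration: the thresholds `low` and `high`, the `radius` parameter, and the label values 0/128/255 are hypothetical choices, and the graph-cut-based dilation stage is not shown.

```python
import numpy as np

def dilate(mask, iterations=1):
    """Binary dilation of a 2D boolean mask with a 3x3 square element."""
    out = mask.astype(bool)
    h, w = out.shape
    for _ in range(iterations):
        padded = np.pad(out, 1, mode="constant", constant_values=False)
        grown = np.zeros_like(out)
        # OR together the nine shifted copies of the mask.
        for dy in range(3):
            for dx in range(3):
                grown |= padded[dy:dy + h, dx:dx + w]
        out = grown
    return out

def make_trimap(alpha, low=0.05, high=0.95, radius=1):
    """Threshold an alpha matte, then widen the unknown band by `radius` pixels.

    Returns a uint8 image: 0 = background, 128 = unknown, 255 = foreground.
    The thresholds and label values are assumptions for this sketch.
    """
    foreground = alpha >= high
    background = alpha <= low
    unknown = dilate(~(foreground | background), radius)
    trimap = np.full(alpha.shape, 128, dtype=np.uint8)
    trimap[foreground & ~unknown] = 255
    trimap[background & ~unknown] = 0
    return trimap

# A soft vertical edge: background on the left, foreground on the right.
alpha = np.tile(np.array([0.0, 0.0, 0.0, 0.5, 1.0, 1.0, 1.0]), (5, 1))
trimap = make_trimap(alpha)
```

In this example the single semitransparent column becomes a three-pixel-wide unknown band; larger `radius` values produce wider trimaps.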

Chroma Keying

Figure: Alpha mattes from chroma keying (green screen) and stop-motion capture for the same image region. The stop-motion result is significantly better at preserving details.

Chroma keying is a common practice in the cinema industry: the cinematographer captures an actor in front of a green or blue screen, and the VFX expert then replaces the background using special software. Our evaluation uses five green-screen video sequences with a significant amount of semitransparency (e.g., hair or motion blur), provided to us by Hollywood Camera Work. We extract alpha mattes and the corresponding foregrounds using The Foundry Keylight. Chroma keying lets us obtain alpha mattes of natural-looking objects with arbitrary motion. Nevertheless, this technique can't guarantee that the alpha maps are natural, because it assumes the screen color is absent from the foreground object. To get alpha maps that have a very natural appearance, we use the stop-motion method.
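To see why the "screen color is absent from the foreground" assumption matters, consider a deliberately simplified green-difference key. This is not the algorithm used by Keylight; the function `green_difference_key` and its `gain` parameter are hypothetical and serve only to illustrate the failure mode.

```python
import numpy as np

def green_difference_key(rgb, gain=1.0):
    """Toy chroma key: the more green exceeds red and blue, the lower alpha is."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    excess_green = g - np.maximum(r, b)
    return 1.0 - np.clip(gain * excess_green, 0.0, 1.0)

pixels = np.array([
    [0.0, 1.0, 0.0],   # pure screen green
    [0.8, 0.6, 0.5],   # skin-like foreground color
    [0.1, 0.9, 0.1],   # opaque but green foreground object
])
alpha = green_difference_key(pixels)
```

The screen pixel is keyed out and the skin-like pixel stays opaque, but the opaque green object also receives a low alpha, because the key cannot tell it apart from the screen.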

Stop Motion

One-step capture over different backgrounds. We use checkerboard backgrounds instead of solid ones to eliminate screen reflection.

We designed the following procedure to perform stop-motion capture: A fuzzy toy is placed on the platform in front of an LCD monitor. The toy rotates in small, discrete steps along a predefined 3D trajectory, controlled by two servos connected to a computer. After each step, the digital camera in front of the setup captures the motionless toy against a set of background images. At the end of this process, the toy is removed and the camera again captures all of the background images.

We paid special attention to avoiding reflections of the background screen in the foreground object. These reflections can lead to false transparency that is especially noticeable in nontransparent regions. To reduce the amount of reflection, we used checkerboard background images instead of solid colors, thereby adjusting the mean color of the screen to be the same for each background.

Finally, we corrected global lighting changes caused by light-bulb flicker. The resulting alpha mattes have a noise level below 1%. A detailed description of the ground-truth extraction methods is given in [3].
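Capturing the same motionless object against several known backgrounds makes alpha recoverable per pixel. The sketch below uses a least-squares form of the classic multi-background approach (often called triangulation matting). It is an assumption made for illustration, not necessarily the extraction procedure of [3]. It relies on the compositing model C_k = αF + (1 − α)B_k, from which C_k − mean(C) = (1 − α)(B_k − mean(B)). The function name and array shapes are hypothetical.

```python
import numpy as np

def triangulation_alpha(composites, backgrounds, eps=1e-8):
    """Estimate alpha from K captures of one object over K known backgrounds.

    composites, backgrounds: arrays of shape (K, H, W, 3).
    Solves (C_k - mean C) = (1 - alpha) * (B_k - mean B) per pixel
    in the least-squares sense over all captures and color channels.
    """
    dc = composites - composites.mean(axis=0)
    db = backgrounds - backgrounds.mean(axis=0)
    num = (dc * db).sum(axis=(0, 3))
    den = (db * db).sum(axis=(0, 3))
    return np.clip(1.0 - num / np.maximum(den, eps), 0.0, 1.0)

# Synthetic check: build captures from a known matte, then recover it.
rng = np.random.default_rng(0)
true_alpha = rng.uniform(0.0, 1.0, size=(4, 5))
foreground = rng.uniform(0.0, 1.0, size=(4, 5, 3))
backgrounds = rng.uniform(0.0, 1.0, size=(3, 4, 5, 3))
a = true_alpha[np.newaxis, :, :, np.newaxis]
composites = a * foreground + (1.0 - a) * backgrounds
recovered = triangulation_alpha(composites, backgrounds)
```

The estimate is only well conditioned where the backgrounds actually differ at a pixel. Screen reflections in the object violate the compositing model and appear as false transparency, which is why the capture setup takes care to suppress them.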

Evaluation Methodology

Our comparison includes both image- and video-matting methods. We apply each matting method to the videos in our data set and then compare the results using the following metrics of per-pixel accuracy and temporal coherency (see our paper [3] for a comparison of different metrics):

[Equations: definitions of the three metrics; see [3].]

Here $\#$ denotes the total number of pixels, $\alpha_{p,t}$ and $\hat{\alpha}_{p,t}$ denote the transparency values of the matte under consideration and of the ground truth, respectively, at pixel $p$ of frame $t$, and $v_p$ denotes the motion vector at pixel $p$. We compute the motion vectors with the optical-flow algorithm [11] applied to the ground-truth sequences. Note that the motion-aware metrics do not give an unfair advantage to matting methods built on a similar motion-estimation method, since those methods do not have access to the ground-truth sequences. A detailed description of the quality metrics is given in [3].
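To show how the symbols above can enter such metrics, here is a minimal sketch of one per-pixel accuracy measure and two temporal-coherency measures. The exact formulas, normalization, and function names are assumptions for illustration and are not guaranteed to match the benchmark's definitions, which are given in [3]. Motion vectors are assumed to be integer (dy, dx) offsets from frame t to frame t+1.

```python
import numpy as np

def per_pixel_error(alpha, alpha_gt):
    """Root-mean-square difference between a matte and the ground truth."""
    return np.sqrt(np.mean((alpha - alpha_gt) ** 2))

def temporal_error(alpha, alpha_gt):
    """RMS difference between the frame-to-frame changes of the two mattes."""
    d = np.diff(alpha, axis=0)
    d_gt = np.diff(alpha_gt, axis=0)
    return np.sqrt(np.mean((d - d_gt) ** 2))

def motion_compensated_error(alpha, alpha_gt, flow):
    """Compare changes along motion vectors instead of at fixed pixels.

    alpha, alpha_gt: (T, H, W); flow: (T - 1, H, W, 2) integer offsets.
    """
    T, H, W = alpha.shape
    ys, xs = np.mgrid[0:H, 0:W]
    total = 0.0
    for t in range(T - 1):
        y2 = np.clip(ys + flow[t, ..., 0], 0, H - 1)  # clamp at the borders
        x2 = np.clip(xs + flow[t, ..., 1], 0, W - 1)
        d = alpha[t] - alpha[t + 1][y2, x2]
        d_gt = alpha_gt[t] - alpha_gt[t + 1][y2, x2]
        total += np.abs(d ** 2 - d_gt ** 2).sum()
    return total / ((T - 1) * H * W)

rng = np.random.default_rng(1)
alpha_gt = rng.uniform(0.0, 0.8, size=(3, 4, 4))
alpha = alpha_gt + 0.1                      # constant offset in every frame
flow = np.zeros((2, 4, 4, 2), dtype=int)    # static scene
accuracy = per_pixel_error(alpha, alpha_gt)
coherency = temporal_error(alpha, alpha_gt)
motion_aware = motion_compensated_error(alpha, alpha_gt, flow)
```

A matte that is off by a constant in every frame is penalized by the accuracy measure but not by the temporal ones, which is why the two kinds of metric are reported separately.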

Public Sequences

For training purposes, we publish three test sequences here together with their ground-truth transparency maps. Developers and researchers are welcome to use these sequences; we ask that you cite our paper [3].


We invite developers of video-matting methods to use our benchmark. We will evaluate the submitted data and report scores to the developer. In cases where the developer specifically grants permission, we will publish the results on our site. We can also publish anonymous scores for blind-reviewed papers. To participate, simply follow these steps:

  1. Download the data set containing our sequences: City, Flowers, Concert, Rain, Snow, Vitaliy, Artem, Slava, Juneau, Woods.
  2. Apply your method to each of our test cases.
  3. Upload the alpha and foreground sequences to any file-sharing service. We kindly ask you to maintain these naming and directory-structure conventions. If your method doesn't explicitly produce foreground images, you can skip uploading them; in this case, we will generate them using the method proposed in [7].
  4. Fill in this form to provide information about your method.

Contact us by email with any questions or suggestions at questions@videomatting.com.

Cite Us

To refer to our evaluation or test sequences in your work, cite our paper [3].

	title={Perceptually Motivated Benchmark for Video Matting},
	author={Mikhail Erofeev and Yury Gitman and Dmitriy Vatolin and Alexey Fedorov and Jue Wang},
	booktitle={Proceedings of the British Machine Vision Conference (BMVC)},
	publisher={BMVA Press},



Sequences

  1. city
  2. rain
  3. concert
  4. flowers
  5. snow
  6. Slava
  7. Vitaliy
  8. Artem
  9. juneau
  10. woods

Displayed results

  1. Source
  2. Trimap
  3. BM [2]
  4. RM [12]
  5. RE [15]
  6. CF [7]
  7. LB [13]
  8. NlM [5]
  9. ShM [4]
  10. CS [10]
  11. KNN [1]
  12. SpM [6]
  13. SpSM [14]
  14. DM [16]
