Saturday, 24 March 2018

Extracting timestamped video clips from Canon camcorder DVDs

We have a Canon DC50 camcorder that records onto mini DVDs.  We used it to record various family activities over a period of five or six years, and have a stack of recorded DVDs.

The content doesn't get viewed very often because it's not easy to see what there is without watching it all.  Most of the recordings consist of short clips.  It would be more accessible if the clips could be extracted from the DVDs with timestamps, and mixed in with all our photos (which are also timestamped).

So I set about working out how to extract the clips with timestamps.  The camcorder supports two different disk formats on rewritable disks, VIDEO mode and VR mode, whereas recordable (write-once) disks support only VIDEO mode.  It turns out that extracting timestamped video clips is easier when using VR mode, so I'll explain that one first.

VR mode

I stumbled across a tool on the web that extracts clips from DVD-VR disks: http://www.pixelbeat.org/programs/dvd-vr/

It's a command-line program that must first be compiled from the supplied source code.  Run the executable, passing the paths to the .IFO and .VRO files on the DVD-VR disk, for example:


# ./dvd-vr/dvd-vr-0.9.7/dvd-vr M-leavers/VR_MANGR.IFO M-leavers/VR_MOVIE.VRO 
format: DVD-VR V1.0

tv_system   : PAL
resolution  : 720x576
aspect_ratio: 16:9
video_format: MPEG2
audio_channs: 2
audio_coding: Dolby AC-3

Number of programs: 27

num  : 1
date : 2012-08-08 12:11:05
size : 23511040       

num  : 2
date : 2012-08-08 12:13:30
size : 204558336      

num  : 3
date : 2012-08-08 12:16:36
size : 53760000       

num  : 4
date : 2012-08-08 12:17:31
size : 32878592       

num  : 5
date : 2012-08-09 10:13:58
size : 103884800      

num  : 6
date : 2012-08-09 10:15:30
size : 2369536        

num  : 7
date : 2012-08-11 10:04:54
size : 9584640        

num  : 8
date : 2012-08-11 10:12:03
size : 13516800       

num  : 9
date : 2012-08-11 10:12:57
size : 19445760       

num  : 10
date : 2012-08-11 10:25:03
size : 15503360       

num  : 11
date : 2012-08-11 10:29:44
size : 13492224       

num  : 12
date : 2012-08-11 10:51:46
size : 18182144       

num  : 13
date : 2012-08-11 10:52:11
size : 68724736       

num  : 14
date : 2012-08-11 11:00:29
size : 30570496       

num  : 15
date : 2012-08-11 11:00:59
size : 6033408        

num  : 16
date : 2012-08-11 11:01:18
size : 25321472       

num  : 17
date : 2012-08-11 14:14:48
size : 96055296       

num  : 18
date : 2012-08-15 09:18:56
size : 100737024      

num  : 19
date : 2012-08-15 09:20:37
size : 25937920       

num  : 20
date : 2012-08-15 09:21:05
size : 60123136       

num  : 21
date : 2012-08-15 09:24:37
size : 72810496       

num  : 22
date : 2012-08-23 09:29:27
size : 109633536      

num  : 23
date : 2012-08-25 11:09:01
size : 27938816       

num  : 24
date : 2012-08-25 11:09:43
size : 12443648       

num  : 25
date : 2012-08-25 11:11:41
size : 54155264       

num  : 26
date : 2012-08-25 11:13:04
size : 16809984       

num  : 27
date : 2012-08-29 14:21:29
size : 40706048

This results in a set of .vob files being created, such as "20120808_121105.vob", each containing MPEG-2 video.  The filename is the datestamp of the start of the recording (i.e. in YYYYMMDD_HHMMSS format), and the file itself is datestamped the same.
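
Since the filename carries the recording time, it can be turned back into a readable timestamp, which is handy if a clip is later converted to another format and loses its file datestamp.  Here is a minimal sketch; the function name vob_timestamp is my own invention and not part of dvd-vr, and it assumes the default YYYYMMDD_HHMMSS naming shown above:

```bash
#!/bin/bash
# Hypothetical helper: turn a dvd-vr output name such as
# "20120808_121105.vob" into "2012-08-08 12:11:05".
vob_timestamp () {
  local base=${1##*/}      # strip any directory part
  base=${base%.vob}        # strip the extension
  # Expect exactly YYYYMMDD_HHMMSS; reject anything else.
  [[ $base =~ ^[0-9]{8}_[0-9]{6}$ ]] || return 1
  printf '%s-%s-%s %s:%s:%s\n' \
    "${base:0:4}" "${base:4:2}" "${base:6:2}" \
    "${base:9:2}" "${base:11:2}" "${base:13:2}"
}
```

With GNU touch, something like touch -d "$(vob_timestamp clip.vob)" converted.mp4 should then restore the datestamp on a converted file (the file names here are only placeholders).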

VIDEO mode

This is trickier.  The camcorder creates a set of DVD menus, spread across as many "pages" as required, with an entry for each clip recorded, and the datestamps are displayed in the menu highlights.  For example, in one menu screen the datestamp of the top-right clip reads 12. APR. 2011, 11:07 AM.


I've found that the tcextract tool from the Ubuntu transcode package can extract the menu highlights from the DVD image as a set of bitmap images, which the subtitle2pgm tool from the subtitleripper package can convert to pgm format:


tcextract -i VIDEO_TS.VOB -x ps1 -t vob -a 0x20 | subtitle2pgm -o menu


The pamcut tool from the netpbm package can cut out specified regions of these images, and after inverting each image with pnminvert (also from netpbm), the Optical Character Recognition (OCR) tool gocr will convert the datestamp text bitmaps into strings.
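
The regions to cut out form a regular grid of three columns by two rows, one per menu entry.  As a rough sketch, the six "left top width height" arguments for pamcut can be generated from an origin and a step.  The numbers are the ones used in the script further down, which I measured on my own PAL menus, so treat them as assumptions for any other disk; the function name menu_regions is made up for this illustration:

```bash
#!/bin/bash
# Hypothetical helper: print "left top width height" for each of the six
# datestamp regions on a menu page, row by row.
menu_regions () {
  local left0=58 top0=172   # top-left corner of the first datestamp
  local dx=184 dy=174       # spacing between columns and between rows
  local width=150 height=34 # size of each datestamp box
  local row col
  for row in 0 1; do
    for col in 0 1 2; do
      echo "$((left0 + col * dx)) $((top0 + row * dy)) $width $height"
    done
  done
}
```

Each output line can be passed to pamcut ahead of the image file name, which makes it easier to adjust the grid in one place if the boxes turn out to be offset on another disk.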

However, for some reason the tcextract command omits the first page of menu entries and starts with the second page, so the first six datestamps are missed.  This is very annoying.  Any ideas how I can resolve that?

Here's a script that extracts and prints the datestamps:


#!/bin/bash

# Requires: apt update && apt install transcode subtitleripper netpbm gocr
tcextract -i VIDEO_TS.VOB -x ps1 -t vob -a 0x20 | subtitle2pgm -o menu

# Usage: getdt LEFT TOP WIDTH HEIGHT FILE
# Crops one menu entry's datestamp, OCRs it, and prints it as YYYYMMDD_HHMM.
getdt () {
  pamcut "$1" "$2" "$3" "$4" "$5" |
  pnminvert |
  gocr - 2> /dev/null |
  {
    read -r dmy    # first OCR line: day.MONTH.year
    read -r hmt    # second OCR line: time of day
    if [ -n "$dmy$hmt" ]; then
      y=$(echo "$dmy" | sed 's,\([0-9]*\)\.\([A-Z]*\)\.\([0-9]*\),\3,')
      m=$(echo "$dmy" | sed -e 's,\([0-9]*\)\.\([A-Z]*\)\.\([0-9]*\),\2,;s,JAN,01,;s,FEB,02,;s,MAR,03,;s,APR,04,;s,MAY,05,;s,JUN,06,;s,JUL,07,;s,AUG,08,;s,SEP,09,;s,OCT,10,;s,NOV,11,;s,DEC,12,')
      d=$(echo "$dmy" | sed 's,\([0-9]*\)\.\([A-Z]*\)\.\([0-9]*\),\1,')
      chmt=$(echo "$hmt" | tr 'O' '0')    # the OCR can read 0 as O
      date '+%Y%m%d_%H%M' -d "$y-$m-$d $chmt"
    else
      echo "Missing"
    fi
  }
}

# Each menu page holds up to six entries in a 3x2 grid.
for m in menu[0-9]*.pgm; do
  echo "$m" >&2
  getdt  58 172 150 34 "$m"
  getdt 242 172 150 34 "$m"
  getdt 426 172 150 34 "$m"
  getdt  58 346 150 34 "$m"
  getdt 242 346 150 34 "$m"
  getdt 426 346 150 34 "$m"
done | awk '
BEGIN {
  n=1
}
{
  if ($1 == last) {
    n=n+1
  } else {
    n=1
  }
  if (n>1) {
    printf "%s-%d\n", $1, n
  } else {
    print $1
  }
  last=$1
}
'
rm menu[0-9]*.pgm menu.srtx
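
The conversion step inside getdt is the part most likely to trip over OCR quirks, so it helps to be able to try it on its own without a DVD image.  The following is only a sketch of an alternative, not what the script above does: it avoids calling date, tolerates spaces after the dots, and assumes a 12-hour clock with AM/PM as in the menu example.  The name parse_menu_stamp is hypothetical.

```bash
#!/bin/bash
# Hypothetical helper (not part of the script above): convert the two OCR
# lines, e.g. "12.APR.2011" and "11:07 AM", into YYYYMMDD_HHMM.
parse_menu_stamp () {
  local dmy=$1 hmt=${2//O/0}   # undo O-for-0 misreads in the time only
  local d mon y hh mi ampm mm
  # Split on dots and colons, tolerating stray spaces from the OCR.
  IFS='. ' read -r d mon y <<< "$dmy"
  IFS=': ' read -r hh mi ampm <<< "$hmt"
  case $mon in
    JAN) mm=01 ;; FEB) mm=02 ;; MAR) mm=03 ;; APR) mm=04 ;;
    MAY) mm=05 ;; JUN) mm=06 ;; JUL) mm=07 ;; AUG) mm=08 ;;
    SEP) mm=09 ;; OCT) mm=10 ;; NOV) mm=11 ;; DEC) mm=12 ;;
    *) return 1 ;;
  esac
  # Reject anything that does not look like numbers plus AM/PM.
  [[ $d =~ ^[0-9]{1,2}$ && $y =~ ^[0-9]{4}$ ]] || return 1
  [[ $hh =~ ^[0-9]{1,2}$ && $mi =~ ^[0-9]{2}$ && $ampm =~ ^[AP]M$ ]] || return 1
  # Convert the 12-hour clock to 24-hour; 10# avoids octal parsing of "08".
  hh=$((10#$hh % 12))
  [[ $ampm == PM ]] && hh=$((hh + 12))
  printf '%s%s%02d_%02d%s\n' "$y" "$mm" "$((10#$d))" "$hh" "$mi"
}
```

For example, parse_menu_stamp "12.APR.2011" "11:07 AM" prints 20110412_1107, and the function returns a non-zero status instead of printing anything when the text cannot be parsed.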

I'll add the steps to extract the video clips and name them with these datestamps later.