Sunday, 10 June 2018

Moved from GitHub to GitLab

I've moved my camcorder-dvd-extractor project from GitHub to GitLab:

Sunday, 3 June 2018

Update on extracting clips from Canon DC50 camcorder DVDs

In a previous post I described the problem I was trying to solve. 

Long story short, I've solved the issue of the missing first menu page of clips by altering the subtitleripper tool, and wrapped the solution up into a Docker container (because containers are the dog's pyjamas).

It will be interesting to see if anyone finds it and finds it useful.

I emailed the author of the subtitleripper package about the issue I had, but have not received any response.

Saturday, 24 March 2018

Extracting timestamped video clips from Canon camcorder DVDs

We have a Canon DC50 camcorder that records onto mini DVDs.  We used it to record various family activities over a period of five or six years, and have a stack of recorded DVDs.

The content doesn't get viewed very often because it's not easy to see what there is without watching it all.  Most of the recordings consist of short clips.  It would be more accessible if the clips could be extracted from the DVDs with timestamps, and mixed in with all our photos (which are also timestamped).

So I set about working out how to extract the clips with timestamps.  The camcorder supports two different disk formats for rewritable disks - VIDEO mode and VR mode - while recordable disks only support VIDEO mode.  It turns out that extracting timestamped video clips is easier when using VR mode, so I'll explain that one first.

VR mode

I stumbled across a tool on the web that extracts clips from DVD-VR disks:

It's a command-line program that must first be compiled from the supplied source code.  Run the executable, passing the path to the .IFO and .VRO files on the DVD-VR disk, for example:

# ./dvd-vr/dvd-vr-0.9.7/dvd-vr M-leavers/VR_MANGR.IFO M-leavers/VR_MOVIE.VRO 
format: DVD-VR V1.0

tv_system   : PAL
resolution  : 720x576
aspect_ratio: 16:9
video_format: MPEG2
audio_channs: 2
audio_coding: Dolby AC-3

Number of programs: 27

num  : 1
date : 2012-08-08 12:11:05
size : 23511040       

num  : 2
date : 2012-08-08 12:13:30
size : 204558336      

num  : 3
date : 2012-08-08 12:16:36
size : 53760000       

num  : 4
date : 2012-08-08 12:17:31
size : 32878592       

num  : 5
date : 2012-08-09 10:13:58
size : 103884800      

num  : 6
date : 2012-08-09 10:15:30
size : 2369536        

num  : 7
date : 2012-08-11 10:04:54
size : 9584640        

num  : 8
date : 2012-08-11 10:12:03
size : 13516800       

num  : 9
date : 2012-08-11 10:12:57
size : 19445760       

num  : 10
date : 2012-08-11 10:25:03
size : 15503360       

num  : 11
date : 2012-08-11 10:29:44
size : 13492224       

num  : 12
date : 2012-08-11 10:51:46
size : 18182144       

num  : 13
date : 2012-08-11 10:52:11
size : 68724736       

num  : 14
date : 2012-08-11 11:00:29
size : 30570496       

num  : 15
date : 2012-08-11 11:00:59
size : 6033408        

num  : 16
date : 2012-08-11 11:01:18
size : 25321472       

num  : 17
date : 2012-08-11 14:14:48
size : 96055296       

num  : 18
date : 2012-08-15 09:18:56
size : 100737024      

num  : 19
date : 2012-08-15 09:20:37
size : 25937920       

num  : 20
date : 2012-08-15 09:21:05
size : 60123136       

num  : 21
date : 2012-08-15 09:24:37
size : 72810496       

num  : 22
date : 2012-08-23 09:29:27
size : 109633536      

num  : 23
date : 2012-08-25 11:09:01
size : 27938816       

num  : 24
date : 2012-08-25 11:09:43
size : 12443648       

num  : 25
date : 2012-08-25 11:11:41
size : 54155264       

num  : 26
date : 2012-08-25 11:13:04
size : 16809984       

num  : 27
date : 2012-08-29 14:21:29
size : 40706048       

This results in a set of .vob files being created, such as "20120808_121105.vob", each of which contains MPEG-2 video.  The filename is the datestamp of the start of the recording (i.e. in YYYYMMDD_HHMMSS format), and the file itself is datestamped the same.
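
As a quick check of the naming scheme, here is a minimal sketch that derives the expected file name for each program from the listing that dvd-vr prints.  The vr_names function is a hypothetical helper of my own, not part of dvd-vr, and it assumes the "date :" lines look exactly like the ones in the listing above.

```shell
# vr_names: hypothetical helper, not part of dvd-vr.
# Reads a dvd-vr listing on stdin and prints one YYYYMMDD_HHMMSS.vob name
# for each "date :" line, assuming the listing format shown above.
vr_names() {
  sed -n 's/^date *: *\([0-9]\{4\}\)-\([0-9][0-9]\)-\([0-9][0-9]\) \([0-9][0-9]\):\([0-9][0-9]\):\([0-9][0-9]\).*$/\1\2\3_\4\5\6.vob/p'
}

# Example using the first two entries from the listing above.
printf 'num  : 1\ndate : 2012-08-08 12:11:05\nsize : 23511040\n\nnum  : 2\ndate : 2012-08-08 12:13:30\nsize : 204558336\n' | vr_names
```

Comparing that list with the files actually written is an easy way to spot a clip that failed to extract.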

VIDEO mode

This is more tricky.  The camcorder creates a set of DVD menus, spread across as many "pages" as required, with an entry for each clip recorded, and the datestamps are displayed in the menu highlights.  For example, here you can see the datestamp of the top-right clip in this menu screen is 12. APR. 2011, 11:07 AM:

I've found that the tcextract tool from the Ubuntu transcode package can extract the menu highlights from the DVD image as a set of bitmap images, which the subtitle2pgm tool from the subtitleripper package can convert to pgm format:

tcextract -i VIDEO_TS.VOB -x ps1 -t vob -a 0x20 | subtitle2pgm -o menu

The pamcut tool from the netpbm package can cut out specified regions of these images, and after inverting each image with pnminvert (also from netpbm), the Optical Character Recognition (OCR) tool gocr will convert the datestamp text bitmaps into strings.

However, for some reason the tcextract command omits the first page of menu entries and starts with the second page, so the first six datestamps are missed.  This is very annoying.  Any ideas how I can resolve that?

Here's a script that extracts and prints the datestamps:


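# apt update && apt install transcode subtitleripper netpbm gocr
tcextract -i VIDEO_TS.VOB -x ps1 -t vob -a 0x20 | subtitle2pgm -o menu

# getdt LEFT TOP WIDTH HEIGHT FILE
# Cut one datestamp region out of a menu image, OCR it, and print it as
# YYYYMMDD_HHMM, or "Missing" if the region contains no text.
getdt () {
  pamcut "$1" "$2" "$3" "$4" "$5" |
  pnminvert |
  gocr - 2> /dev/null |
  {
    read dmy   # first OCR line: the date, e.g. 12.APR.2011
    read hmt   # second OCR line: the time, e.g. 11:07 AM
    if [ "${dmy}$hmt" ]; then
      y=$(echo "$dmy" | sed 's,\([0-9]*\)\.\([A-Z]*\)\.\([0-9]*\),\3,')
      m=$(echo "$dmy" | sed -e 's,\([0-9]*\)\.\([A-Z]*\)\.\([0-9]*\),\2,;s,JAN,01,;s,FEB,02,;s,MAR,03,;s,APR,04,;s,MAY,05,;s,JUN,06,;s,JUL,07,;s,AUG,08,;s,SEP,09,;s,OCT,10,;s,NOV,11,;s,DEC,12,')
      d=$(echo "$dmy" | sed 's,\([0-9]*\)\.\([A-Z]*\)\.\([0-9]*\),\1,')
      # Replace any letter O in the time with the digit 0
      chmt=$(echo "$hmt" | tr 'O' '0')
      date '+%Y%m%d_%H%M' -d "$y-$m-$d $chmt"
    else
      echo "Missing"
    fi
  }
}

# Each menu page has six clips: three columns by two rows.
for m in menu[0-9]*.pgm; do
  echo "$m" >&2
  n=$(echo "$m" | sed 's,menu,,')
  getdt  58 172 150 34 "$m"
  getdt 242 172 150 34 "$m"
  getdt 426 172 150 34 "$m"
  getdt  58 346 150 34 "$m"
  getdt 242 346 150 34 "$m"
  getdt 426 346 150 34 "$m"
done | awk '
# Consecutive clips with the same datestamp get a -2, -3, ... suffix
# so that every printed name is unique.
{
  if ($1 == last) {
    n++
  } else {
    n = 1
    last = $1
  }
  if (n>1) {
    printf "%s-%d\n", $1, n
  } else {
    print $1
  }
}'
rm menu[0-9]*.pgm menu.srtx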

I'll add the steps to extract the video clips and name them with these datestamps later.

Thursday, 23 July 2015

Bananas: 10p each or two for 25p

All in One 15
All in One 25

Given that the standard price for calls is 3p/min, what's the benefit of the "All in One 25" add-on if the "All in One 15" (£15) + 200 minutes of calls (200*3p=£6) would cost £21?

I tried asking Three, but they just replied, "Hi Rob, you've not missed anything, that's the correct price of our Pay As You Go Add-ons."


So, it seems that Three's response to my question was to cancel the "All in One 15" add-on and replace it with an "All in One 20" add-on, which has the exact same minutes, text and data, but costs £5 more.  #facepalm

Saturday, 20 June 2015

TSL2561 light sensor ranges

I've been playing with ESP8266 microcontrollers with built-in wifi.  For want of a better excuse, I've used one to create a wifi-connected temperature, humidity and light sensor for our greenhouse:

The recycled coffee jar contains a 3.3V regulator, the ESP8266 on an ESP-12E board (which contains flash memory and a PCB antenna, among other things), a TSL2561 light sensor, and on top of the coffee jar lid is an AM2302 temperature and humidity sensor.

The ESP8266 has been flashed with NodeMCU, which means it can be programmed with Lua scripts.  Lua, it turns out, is horrible, but that's another story.

Every five minutes, the ESP wakes up, collects temperature, humidity and light level values from the sensors, connects to our guest wifi network, and uploads the values to a ThingSpeak channel.  You can see some charts and dials that are linked to the recorded values here:

The TSL2561 contains two sensors - one that measures visible plus infrared light, and another that measures infrared light only.  The two levels are supposed to be combined using a calculation that approximates brightness seen by the human eye, but I don't care about that for the greenhouse; I just want to see the two values separately.

The TSL2561 measures the light levels and provides output as integer values over a serial interface.  It has two parameters that affect the range - a 16x gain that can be switched on or off, and a variable integration time, with three preset durations of 402ms, 101ms or 13.7ms.  (Integration time can also be controlled manually.)

Initially I used the default integration time (402ms) with the 16x gain turned on, but it quickly became apparent that the values were hitting the maximum end of the range:

So I turned off the 16x gain, but it's still hitting the ceiling sometimes:

The good news is that it's bright in the greenhouse (good for the plants), but I need to use a shorter integration time to extend the range of the sensor further.

Sampling by integration

Sampling by integration is a bit like collecting rain in a glass and measuring the level.  If you put the glass out in the rain for one hour and measure the water level, then empty the glass and put it back in the rain for 10 hours, if it's raining at the same rate throughout that time, the water level will be 10 times higher.

If the glass has straight parallel sides, you can work out the rainfall in mm per hour by dividing the water level measured by the number of hours the glass was outside, and that will tell you the average rate of rainfall during that time.

It's hard to measure a mm level of water precisely, but errors in the measurement will be less significant if the glass is fuller.  In other words, if it's not raining very much, then leave the glass outside for a longer period of time to get a more accurate measurement of the average rainfall rate.

If you leave the glass outside for 10 hours and it rains a lot, the glass will overflow.  In this case, the sampling period needs to be reduced, so that the measured value comes back within the available range.

Enough talk about rain, back to the TSL2561

The integration time of the TSL2561 can be adjusted, much like adjusting the time that the glass is out in the rain, and like the glass, the TSL2561 has a maximum value it can measure, which is 65535.

There is an additional complication - if the integration time is reduced, the maximum value that can be measured also reduces.  The analogy with the glass breaks down a bit here - it's as if the glass is being constructed at the same time as the water is being collected, so if you only measure rain for one hour then you get a shorter glass to measure it in.  If you measure for ten hours, you get a full-size glass.  In fact, the glass is finished to its full height even before the ten hours is up.

With the TSL2561, if you select an integration time of 101ms, the maximum value that can be measured is 37177, and if you select 13.7ms, then it is 5047.

So which integration time option should be chosen, and should the 16x gain be switched on or off?

Switching off the gain and selecting the shortest integration time should provide the largest range of measurement values.  However, because the maximum value is limited by the shorter integration time, the range at 13.7ms is about the same as at 101ms.
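
To put rough numbers on that, here is a small sketch that works out the brightest measurable light level for each combination of settings.  It assumes the sensor responds linearly, i.e. that the count is proportional to light level x integration time x gain; that is my simplification rather than something taken from the datasheet, and the results are in arbitrary units.  The input_ranges name is made up for this example.

```shell
# input_ranges: hypothetical helper that prints, for each setting, the
# full-scale count divided by (integration time x gain).  Under the assumed
# linear model this is proportional to the brightest measurable light level.
# Table columns: integration time (ms), gain, full-scale count.
input_ranges() {
  awk '{ printf "%sms %dx %.1f\n", $1, $2, $3 / ($1 * $2) }' <<EOF
402 16 65535
402 1 65535
101 16 37177
101 1 37177
13.7 16 5047
13.7 1 5047
EOF
}

input_ranges
```

With the gain off, this gives about 368 for both 101ms and 13.7ms, against about 163 for 402ms, and each figure is 16 times smaller with the gain on.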

It's hard to visualise so I made a chart:

(which gets cropped in this layout - click it to see the full image).

Input light level is on the X-axis, with a range from 0 to 1 (where 1 is the brightest that the TSL2561 can measure without overflow).  The output values are on the Y-axis, and you can see that the 101ms and 13.7ms integration time options have a lower range of possible values than the 402ms option.

It's clear from the chart that the 13.7ms options provide about the same input range as the corresponding 101ms options, but with a lower range of output values.  The only benefit of choosing 13.7ms is that you get the measurements quicker.  This isn't useful for my greenhouse, so I won't use the 13.7ms setting.

The 101ms option with the 16x gain turned off provides the greatest input range.  If I had to use just one combination, this would be it.  However, at lower light levels, the granularity of output values is poor.

Therefore, at lower light levels, I can switch to one of the other options to get better granularity of output values.  

The output values will need to be scaled according to the chosen settings, so that the values don't alter significantly when changing the integration time or switching on/off the gain.
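
A minimal sketch of that scaling, assuming a linear sensor response and taking 402ms with the 16x gain on as the reference setting (my choice of reference, so that readings taken at that setting pass through unchanged).  The scale_reading name is hypothetical.

```shell
# scale_reading RAW INTEGRATION_MS GAIN
# Hypothetical helper: converts a raw count taken at the given settings to
# the count the same light would give at 402ms with 16x gain, assuming the
# count is proportional to integration time x gain.
scale_reading() {
  awk -v r="$1" -v t="$2" -v g="$3" \
    'BEGIN { printf "%.0f\n", r * (402 / t) * (16 / g) }'
}

scale_reading 1000 402 16   # reference setting: unchanged
scale_reading 1000 402 1    # gain off: multiplied by 16
scale_reading 1000 101 1    # shorter integration time as well
```

Scaling up like this means values taken with the less sensitive settings can exceed 65535, so whatever stores them needs more than 16 bits.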

So, given that it doesn't matter too much how long the measurement takes, I think what my greenhouse monitor needs to do is try the remaining four options (ignoring the two 13.7ms ones) in order of smallest range, smallest granularity first, and return the first result that doesn't overflow, scaled to normalise the output values across the set of possible settings.

Next steps 

Turn the above into code.
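
As a starting point, here is a rough shell sketch of the selection logic described above, not the Lua that would actually run on the ESP8266.  The sensor is replaced by simulate_raw, a made-up linear model that clips at the full-scale count, so the light level is in arbitrary units; measure and simulate_raw are hypothetical names, and the scaling to the 402ms/16x setting is my own choice of reference.

```shell
# simulate_raw LIGHT MS GAIN FULL_SCALE
# Hypothetical stand-in for reading the sensor: count = light x ms x gain,
# clipped at the full-scale count for that integration time.
simulate_raw() {
  awk -v l="$1" -v t="$2" -v g="$3" -v max="$4" \
    'BEGIN { v = int(l * t * g); if (v > max) v = max; print v }'
}

# measure LIGHT
# Try the four settings from smallest range to largest and print the first
# reading that is below full scale, scaled to the 402ms/16x reference.
measure() {
  light=$1
  while read -r ms gain full; do
    raw=$(simulate_raw "$light" "$ms" "$gain" "$full")
    if [ "$raw" -lt "$full" ]; then
      awk -v r="$raw" -v t="$ms" -v g="$gain" \
        'BEGIN { printf "%dms %dx %.0f\n", t, g, r * (402 / t) * (16 / g) }'
      return 0
    fi
  done <<EOF
402 16 65535
101 16 37177
402 1 65535
101 1 37177
EOF
  echo "overflow at every setting" >&2
  return 1
}

measure 5     # dim: the most sensitive setting is fine
measure 300   # bright: falls through to 101ms with the gain off
```

A reading equal to the full-scale count is treated as an overflow, since there is no way to tell a genuine maximum from a clipped value.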


Hmm.  For reasons I cannot fathom, when I set the integration time to 101ms and the light is bright, I cannot get a "full scale" reading of 37177 on both channels unless I wait about 250ms after powering up.  I can't see any clues in the datasheet as to why that might be - it's supposed to begin integrating as soon as it powers up, then transfer the values to the data registers.

For now I've set it to wait 500ms in all cases before reading the values, to be sure.

Sunday, 23 February 2014

Displayless interfaces

Thanks to the tube strike the other week, I was forced to get some exercise. I walked the few miles from Liverpool St to the office where I work, and because I wasn’t familiar with the route and didn’t want to walk head-first into a lamp-post while looking at the map on my phone screen, I put an earphone in one ear and enabled voice-guidance on Google Navigation, leaving the phone in my pocket, so that I could concentrate on avoiding lamp-posts and not getting run over.

Disappointingly, the voice directions proved to be less than adequate. For some reason, the spoken instructions are simpler when Walk is selected instead of Drive. At all but the simplest of junctions, the terse “turn left” or “continue straight” message in my ear was ambiguous, and if I turned around to consider the options then any sense of direction was quickly lost.

What I needed was more context. “Turn left into Aldgate St,” instead of just, “Turn left”, would have been a good start. Better still, if I could ask questions such as, “Which road should I take?”, “Describe this junction,” or the kids’ favourite - “Are we nearly there yet?”

Google Glass could have put the map into my view, but who wants to look like part of the Borg Collective? And for that matter, I don’t much want to be seen talking to myself either, so give me an earphone for one ear and some buttons I can press without looking at them, and I’ll be happy.

I think Google may have missed a trick - Google Glass is expensive, but Google Ear could be a free downloadable app.

There’s a similar problem when using SatNav in my car. If I want to detour to get fuel, then using the touchscreen to zoom out and look around the local area is tricky, not to mention dangerous and illegal. There’s a trend among car manufacturers now to replace dashboard controls with a large touchscreen, expanding this problem to even more tasks, from turning up the fan to changing radio station. (At least someone is thinking about this, but the communication is not rich enough in that interface for what I want.)

If I were using Google Ear to navigate while driving, with a set of standard buttons mounted on the steering column that I could operate without looking at them, I could reroute without even looking at the screen, let alone touching it.

The key requirements are these:
  1. A set of buttons that can be operated without looking at them. Not too few, and not too many, and I’ll need a set I can use in my car, and a set I can use while walking, and perhaps a set at my desk.
  2. The set of buttons needs to be standardised, so that multiple manufacturers can supply them in various forms - a set on a bluetooth-connected key-fob, a set for the steering column, a set on the side of a bluetooth earpiece (killing two birds with one stone), ...
  3. An intuitive set of conventions for the use of said buttons, which remain the same in all contexts. Think of the buttons on a games console controller - the left and right buttons always mean the same thing, so their use quickly becomes second nature.
  4. Audible communication from the device. It wouldn’t all have to be spoken - short sounds can indicate status, progress, etc.
  5. Some conventions for the structure of the voice output, which fit in with the button input conventions, and aim for efficient interaction between device and user.  For example, if I search for something, the spoken response could start with a simple, "28 results," and then I can decide if I want it to start listing the results or if I will refine the search first.
  6. Voice input commands for situations where button input would be too complex - for example, trigger voice mode with the buttons and say, “Find nearby petrol stations.”
  7. It lives in my phone, so I have it with me at all times.

What do you think? Would you use it?

Monday, 27 August 2012

Fixing a John Lewis clock that goes black in a dark room (and changing its colour too)

Despite their faults, we like the style of these John Lewis Indigo clocks.  We have two, and one of them recently started to flicker at low light levels, and then went completely dark whenever the room was dark.  They're supposed to dim in a dark room, but not go out completely.  I took it apart to look for the source of the problem.

WARNING: if you're going to do similar, be aware that there are mains level voltages inside this clock.  Disconnect from the mains and pull out the plug before you open the clock, and don't plug it in again until it's reassembled.  And don't come running to me if you kill yourself.

Inside the clock there are four blue 3mm LEDs that poke into holes in a plastic light-spreader behind the LCD display.  A photodiode, a couple of transistors and some resistors control the LEDs' brightness; however, one of the LEDs wasn't lighting up at all.  (Ok, I admit it, I turned on the power while I had the thing in pieces in order to discover this.)  I can't say I'd noticed that the backlight was uneven - the light-spreader must do a good job.

I didn't have any spare 3mm blue LEDs in stock, but I did have white ones, so I desoldered and removed all four, and replaced them with white ones, taking care to insert them with the correct polarity.  I also had some red LEDs in stock, but I chose the white because white LEDs have very similar properties to blue ones (in particular, the forward voltage), whereas red ones are quite different (a much lower forward voltage).

For some reason, the faulty LED was also causing the other three to extinguish at low light levels - with all four replaced it is working perfectly again now.

The white backlight looks really good, and matches the room better too:

The quality of the soldering inside the clock is very poor.  I should have taken a photo, but forgot.  You'll have to take my word for it.  Poor, very poor.