TT subripper
by Filiep Geeraert
Frequently Asked Questions
Are all the colours preserved?
Just before updating these pages, I discovered that background colours are sometimes used as well.
On a RAI-subtitled version of "Derrick", they put the word "important" in blue letters on a yellow background.
I do not think it would be very hard to decode the background colours too; however, SRT seems to have no tag for them, so there is little point in doing so.
Apart from that, all colours seem to be decoded correctly.
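To illustrate how Teletext colours can end up in an SRT file, here is a minimal sketch. It assumes the Level 1 Teletext "alpha colour" control codes 0x01 to 0x07 (red, green, yellow, blue, magenta, cyan, white) and HTML-style `<font color="...">` tags in the output; `row_to_srt` is a hypothetical name, not TTsubripper's own code. It also shows why background colours are lost: the New Background code (0x1D) is only rendered as a space.

```python
# Hypothetical sketch: turn one Teletext row into SRT text with colour tags.
# Assumes Level 1 alpha colour codes; background colour codes are not kept.
ALPHA_COLOURS = {
    0x01: "red", 0x02: "green", 0x03: "yellow", 0x04: "blue",
    0x05: "magenta", 0x06: "cyan", 0x07: "white",
}


def row_to_srt(data: bytes) -> str:
    segments: list[list[str]] = []  # runs of [colour, text]
    colour = "white"                # every Teletext row starts in white
    for byte in data:
        code = byte & 0x7F          # drop the odd-parity bit
        if code < 0x20:
            # Control codes (colours, New Background 0x1D, ...) occupy a
            # character cell that is displayed as a space.
            ch = " "
        else:
            ch = chr(code)
        if segments and segments[-1][0] == colour:
            segments[-1][1] += ch
        else:
            segments.append([colour, ch])
        # Alpha colour codes are "set-after": they apply from the next cell.
        if code in ALPHA_COLOURS:
            colour = ALPHA_COLOURS[code]
    parts = [
        # White text and whitespace-only runs need no colour tag.
        text if c == "white" or not text.strip()
        else f'<font color="{c}">{text}</font>'
        for c, text in segments
    ]
    return "".join(parts).strip()
```

For the "Derrick" example (yellow, New Background, blue, then the word), `row_to_srt(bytes([0x03, 0x1D, 0x04]) + b"important")` keeps only the blue foreground colour.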
Why are accented characters translated completely wrongly?
With the latest version, I now support English, French, German, Spanish/Portuguese and Italian codepages.
However, the program does not know which codepage is required, so you will need to provide the correct parameter to get the text decoded correctly.
Use -l=xx, where xx is en, fr, de, es, pt or it (English, French, German, Spanish, Portuguese or Italian).
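The reason a wrong codepage garbles accents is that Teletext reuses a handful of 7-bit character positions for national characters. The sketch below assumes the German and French national option subsets of the Teletext Latin G0 set; `decode_row` and the table layout are hypothetical, not TTsubripper's code, and any other language code simply falls back to plain ASCII here.

```python
# Hypothetical sketch of national-option ("codepage") substitution in the
# Teletext Latin G0 set. Only the German and French subsets are shown; any
# other language code falls back to plain ASCII in this sketch.
NATIONAL_OPTIONS = {
    "de": {0x23: "#", 0x24: "$", 0x40: "§", 0x5B: "Ä", 0x5C: "Ö",
           0x5D: "Ü", 0x5E: "^", 0x5F: "_", 0x60: "°", 0x7B: "ä",
           0x7C: "ö", 0x7D: "ü", 0x7E: "ß"},
    "fr": {0x23: "é", 0x24: "ï", 0x40: "à", 0x5B: "ë", 0x5C: "ê",
           0x5D: "ù", 0x5E: "î", 0x5F: "#", 0x60: "è", 0x7B: "â",
           0x7C: "ô", 0x7D: "û", 0x7E: "ç"},
}


def decode_row(data: bytes, lang: str) -> str:
    """Decode one row of Teletext bytes using the subset chosen with -l=xx."""
    table = NATIONAL_OPTIONS.get(lang, {})
    chars = []
    for byte in data:
        code = byte & 0x7F          # drop the odd-parity bit
        if code < 0x20:
            chars.append(" ")       # control codes are displayed as spaces
        else:
            chars.append(table.get(code, chr(code)))
    return "".join(chars)
```

With the German table, `decode_row(b"M}nchen", "de")` gives "München"; without it, the same bytes come out as "M}nchen", which is exactly the kind of damage a wrong -l value causes.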
For Flemish/Dutch stations I have not found a way to decode accented characters correctly, as they implement this feature in a completely different way.
Luckily, we can understand the subtitles perfectly, even without the accents.
Why are my subtitles messed up?
I can think of various reasons for that.
If your accented characters are translated incorrectly, choose the correct codepage with the -l=xx option.
Another reason might be that you are watching a program with live subtitles, such as a news show.
Screen positions are not supported by SRT subtitles, so you might find that subtitles are displayed for too long or not at all.
The best way to improve this is to experiment with the -min=xxx option.
This option lets you specify a minimum duration for the sum of all subtitles that appear within the same second.
So if 5 subtitles appear within the 8th second and you specified -min=500, those 500 milliseconds would be spread over the 5 subtitles, with each subtitle appearing on screen for 100 milliseconds.
Try very short values (10 or 5 milliseconds) to improve the display.
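The -min arithmetic can be sketched in a few lines. This is a hypothetical model of the calculation described above, not the program's actual code: it assumes start times in milliseconds, groups subtitles by the second in which they start, and gives each one an equal share of the -min value (using integer division).

```python
from collections import Counter


def spread_minimum(start_times_ms: list[int], min_ms: int) -> list[int]:
    """Hypothetical -min=xxx arithmetic: share min_ms equally among all
    subtitles that start within the same second."""
    per_second = Counter(t // 1000 for t in start_times_ms)
    return [min_ms // per_second[t // 1000] for t in start_times_ms]


# Five subtitles in second 8 and one in second 9, with -min=500:
print(spread_minimum([8000, 8150, 8400, 8600, 8900, 9200], 500))
# → [100, 100, 100, 100, 100, 500]
```

This also shows why very small values help with live subtitles: with -min=10 and five subtitles in one second, each gets only 2 milliseconds, so they no longer pile up on screen.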
Even though I have an SRT file that is several kilobytes in size, no subtitles are being shown on screen.
If the TV station disabled the TT clock during the broadcast, the subtitles will be in the SRT file, but all times will be set to 00:00:00,000.
This can only be fixed by changing the whole design of the program, so that it no longer uses the TT clock to synchronise subtitles.
It could also be caused by time codes on other pages: the program may mistake such a time code (for instance, the time of a stock market event) for the real time being broadcast.
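To check whether a file suffers from the missing TT clock, you can look at its timing lines. The sketch below is a hypothetical helper, not part of TTsubripper; it only assumes the standard SRT timing line format "HH:MM:SS,mmm --> HH:MM:SS,mmm".

```python
import re
import sys

# Matches an SRT timing line such as "00:01:02,500 --> 00:01:04,000".
TIMING = re.compile(
    r"(\d{2}:\d{2}:\d{2},\d{3})\s*-->\s*(\d{2}:\d{2}:\d{2},\d{3})")


def all_times_zero(srt_text: str) -> bool:
    """True if the file has timing lines and every one of them is zero."""
    stamps = [s for m in TIMING.finditer(srt_text) for s in m.groups()]
    return bool(stamps) and all(s == "00:00:00,000" for s in stamps)


if __name__ == "__main__":
    # Usage: python check_srt.py SUBTITLES.SRT
    for path in sys.argv[1:]:
        with open(path, encoding="utf-8", errors="replace") as f:
            if all_times_zero(f.read()):
                print(f"{path}: every time is 00:00:00,000 (TT clock missing?)")
```

A file that fails this check has subtitles but no usable timing, which matches the first cause described above.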
I get an empty subtitle file. What's up?
First check that you connected the right pins in GraphEdit.
If you connect the sound or video pin instead of the Teletext pin (normally pin 2), there will simply not be any TT data in the stream for TTsubripper to decode.
Also, make sure that the DVR-MS file actually still contains subtitles: several cutting utilities (VideoReDo, for instance) simply discard the Teletext stream.
If you have verified this and things are still not working, the code that announces subtitles is probably a bit different for this station.
As I said before, I do not fully understand the Teletext specifications, as it is a hugely complex system.
I am doing my best to find a common way of detecting and extracting the subtitles.
In future versions, I may add a parameter to set the code to look for.
You could help me by creating a very short dump file for that station and sending it to me, so that I can find the code the station is using and add it to the program.
Every subtitle is saved twice in the SRT file. Why is this?
Some TV stations actually broadcast each subtitle twice, sometimes on the same page (VTM, page 888, for instance) and in other cases on two separate pages (La Une: 777 and 888).
If the subtitles are broadcast twice on the same page, try the -half option, which only records every other subtitle.
If they are broadcast on different pages, check the normal Teletext pages to find out which ones are involved, and use -i=xxx to ignore one of them.
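The effect of -half and -i=xxx can be modelled as two simple filters. This is a hypothetical sketch, not TTsubripper's code: the `Subtitle` record and `apply_filters` function are invented for illustration, and the order (ignored pages first, then halving) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Subtitle:
    page: int   # Teletext page the subtitle was broadcast on, e.g. 888
    text: str


def apply_filters(subs, half=False, ignore_pages=()):
    """Hypothetical model of -i=xxx and -half: drop ignored pages first,
    then keep only every other remaining subtitle if half is set."""
    kept = [s for s in subs if s.page not in ignore_pages]
    if half:
        kept = kept[::2]  # indices 0, 2, 4, ...: the first copy of each pair
    return kept


# La Une style: the same subtitle on pages 777 and 888, so ignore 888.
la_une = [Subtitle(777, "Bonjour"), Subtitle(888, "Bonjour"),
          Subtitle(777, "Salut"), Subtitle(888, "Salut")]
print([s.text for s in apply_filters(la_une, ignore_pages={888})])
```

Note that halving only works if the copies always arrive in strict pairs: if one copy is missed in reception, every later pair is shifted and the wrong subtitles are dropped, which is why ignoring a page is the more robust choice when two pages are involved.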
What is the purpose of the 4 output files that are created?
The obvious one is SUBTITLES.SRT.
This is meant to be played together with the video in Media Player Classic, or with DirectVobSub in any player.
You will have to convert your video to a format other than DVR-MS, though, as this format does not seem suitable for combining with an SRT file.
The second one is Subtitles without colours.SRT.
This one is meant for people who want to burn a DVD with the subtitles in it.
As DVDs don't really support multi-colour subtitles, you need a version in which the colours have been removed.
This is it.
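Producing a colourless copy amounts to stripping the colour markup. The sketch below assumes the coloured file marks colours with HTML-style `<font color="...">` tags, a widespread SRT convention; whether TTsubripper writes exactly these tags is an assumption, and `strip_colours` is a hypothetical helper.

```python
import re

# Opening or closing <font ...> tags, in any letter case.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_colours(srt_text: str) -> str:
    """Remove colour markup, leaving numbering, timings and text untouched."""
    return FONT_TAG.sub("", srt_text)


coloured = '1\n00:00:01,000 --> 00:00:02,500\n<font color="yellow">Hello</font> there\n'
print(strip_colours(coloured))
```

Only the tags are removed, so the subtitle numbering and timings stay exactly as they were in the coloured file.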
The third one is SUBTITLES.HTM.
When you burn a DVD with the subtitles in it, you lose the colour information, which Teletext subtitlers use extensively to distinguish between speakers and to indicate background sounds, etc.
By opening the file in a web browser, you can see the original colours and adapt the white-only subtitles to make it clearer who is saying what.
The last one is called DUMP.DEC.
This is the decoded dump file; you can have a look at it if you want.
It contains all the Teletext data, not just the subtitles, and can help you understand why subtitles are being decoded incorrectly.
What features are planned for future versions?
Some things I have in mind are:
* getting the manual part (the GraphEdit stuff) into the program itself
* accepting more parameters, so that you can choose your own filenames
* limiting the CPU usage from within the program itself
I might completely rewrite the synchronisation mechanism, but I am not sure I can do it.
I have another question or remark. How can I contact you?
Look at the picture: it contains the email address where you can reach me.
I put it in a picture to keep it away from the spambots that scan the Internet harvesting email addresses.
