How to Create a Podcast from a Website

2005-12-07 16:15:00

This document explains the steps used to create a podcast of Google News. A handful of standard Unix tools get the job done. You can get a copy of the scripts here.

All of the work is done on my local machine and then uploaded, because the host where my website sits does not let me run my own executables. I will skip the upload process since it is specific to my web server.

The first shell script that we call uses links to get a text dump of the website. This keeps us from having to parse HTML. Here is the command that we run:

links -no-numbering -dump > /tmp/news

Next we generate a wave file using Festival's text-to-speech interface:

talk="text2wave -o $news_wave"

end=`grep -ni "international versions of google news" $news_file | awk -F: '{print $1; exit}'`

sed $end,'$d' $news_file | sed '1,12d' | egrep -v ">> *$" | egrep -v "(Jump to:|Sports )" | egrep -v "^.+-.+-.+$" | sed 's/\*//g' | $talk

The most important step is cleaning up the text before sending it to the TTS engine. Without the cleanup, Festival reads >>> as "greater than, greater than, greater than". Sed, awk, and grep clean up the output from links, and the cleaned text is then piped to the TTS. Festival is a huge program and I don't have the time to wrap my head around all of it, which is why I'm just using the default voice. There are probably TTS programs that will give you a more realistic voice, but I like that retro sound.
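The filtering part of that pipeline can be pulled into a small function so it can be tried on sample text without Festival installed. This is only a sketch: the name clean_news is made up for illustration, and the header and footer trimming is left out because it depends on the layout of the page being dumped.

```shell
#!/bin/sh
# clean_news: hypothetical helper, not one of the original scripts.
# Reads a text dump on stdin and writes the filtered text on stdout.
clean_news() {
    # Drop navigation lines that end in ">>".
    grep -E -v '>> *$' |
    # Drop lines containing "Jump to:" or "Sports ".
    grep -E -v '(Jump to:|Sports )' |
    # Drop lines that contain at least two hyphens with text around them.
    grep -E -v '^.+-.+-.+$' |
    # Remove asterisks so they are not read aloud.
    sed 's/\*//g'
}

# Possible usage, assuming $news_file and $news_wave are set as above:
#   clean_news < "$news_file" | text2wave -o "$news_wave"
```

Keeping the filters in one function makes it easier to add or remove a pattern when the page layout changes.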

The next thing that we want to do is turn the wave file into an MP3. We call wav2mp3 to do that.

# $1 = Title one
# $2 = Title two
# $3 = wave file
# $4 = mp3 file

yr=`date +%Y`

/usr/local/bin/lame -h -b 16 --tt "$1" --ta "$2" --ty "$yr" --tc "Provided by" "$3" "$4"

Wav2mp3 is a simple script that calls lame to do the conversion for us.
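A slightly more defensive variant might check its arguments before calling lame. The sketch below is an assumption-laden illustration, not the original script: the function name wav2mp3_checked is hypothetical, it expects lame to be on the PATH rather than in /usr/local/bin, and it leaves out the --tc comment tag.

```shell
#!/bin/sh
# wav2mp3_checked: hypothetical wrapper around the lame call shown above.
# $1 = title tag, $2 = artist tag, $3 = wave file, $4 = mp3 file
wav2mp3_checked() {
    # Refuse to run unless exactly four arguments were given.
    if [ "$#" -ne 4 ]; then
        echo "usage: wav2mp3_checked TITLE ARTIST WAVE_FILE MP3_FILE" >&2
        return 2
    fi
    # Fail early if the wave file is missing or unreadable.
    if [ ! -r "$3" ]; then
        echo "cannot read wave file: $3" >&2
        return 1
    fi
    # Assumption: lame is found via PATH.
    lame -h -b 16 --tt "$1" --ta "$2" --ty "$(date +%Y)" "$3" "$4"
}
```

Failing early keeps a missing wave file from turning into an empty MP3 that then gets uploaded.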

Now we need an RSS file for the MP3 we have just created. Another script does that for us. Here is how it is called:

file_length=`ls -l $news_mp3 | awk '{print $5}'`

$news_folder/ $rss_file  "A speech synthesized version of Google News"  "Audio Google News" $upload_mp3 $file_length
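For readers who want to see what such a feed script could look like, here is a minimal sketch. It is not the original script: the names file_size and make_rss are hypothetical, the meaning given to each argument is a guess based on the call above, and the feed contains only the elements RSS 2.0 needs for a single enclosure. Values are written into the XML as-is, so they must not contain characters such as & or <.

```shell
#!/bin/sh
# file_size: hypothetical helper that prints a file's size in bytes.
# wc -c avoids depending on the column layout of ls -l.
file_size() {
    wc -c < "$1" | tr -d ' '
}

# make_rss: hypothetical stand-in for the feed script.
# $1 = output rss file, $2 = description, $3 = title,
# $4 = URL of the mp3, $5 = size of the mp3 in bytes
make_rss() {
    # RFC 822 style date in UTC, with English day and month names.
    pub_date=$(LC_ALL=C date -u '+%a, %d %b %Y %H:%M:%S +0000')
    # Note: a real feed would put the site's home page in <link>;
    # the mp3 URL is reused here only to keep the sketch short.
    cat > "$1" <<EOF
<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0">
  <channel>
    <title>$3</title>
    <link>$4</link>
    <description>$2</description>
    <item>
      <title>$3</title>
      <description>$2</description>
      <pubDate>$pub_date</pubDate>
      <enclosure url="$4" length="$5" type="audio/mpeg"/>
    </item>
  </channel>
</rss>
EOF
}

# Possible usage, assuming the variables from the call above are set:
#   make_rss "$rss_file" "A speech synthesized version of Google News" \
#       "Audio Google News" "$upload_mp3" "$(file_size "$news_mp3")"
```

The enclosure element is what podcast clients look for: it carries the URL, the size in bytes, and the MIME type of the audio file.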

The last step is to upload the RSS and MP3 files.