Making a News Podcast Generator

Ron Reiter
4 min read · Jul 27, 2019


APIs are ubiquitous. Today you can do just about anything with APIs, which is pretty awesome, and solve problems that used to require technology only a few people had. I decided to solve an annoying problem I had every morning on my drive to work: I wanted someone to summarize the front page of Hacker News for me while I drive.

Computer-generated speech always seemed like a terrible idea to me, because it hurts my ears to listen to. However, as it turns out, Google's newly released WaveNet-based Text-to-Speech technology is good enough to listen to for 15 minutes. And if that's the case, then listening to a summary of the top links can actually be practical and even enjoyable.

To do this, I wrote a Python script that does the following:

  1. Fetches the day's best Hacker News URLs using their open API.
  2. Summarizes each one using an article extraction API (in my case Aylien, which I hadn't heard of until I googled for an article extraction and summarization API).
  3. Runs Google's Text-to-Speech engine on each title and summary.
  4. Stitches all of the results into one MP3 file.
  5. Uploads the result to Google Cloud Storage.
  6. Generates a podcast RSS feed.

So, let’s dig into how it works:

Getting the news

We start out by getting the data we want to listen to — a headline and a summary for each news item.

import datetime
import json
import logging
import os

# Cache each day's news data locally so re-running the script
# doesn't hit the APIs again.
today = datetime.date.today().isoformat()
news_file = 'news_data/news_data_%s.json' % today

logging.info('getting news data...')
if not os.path.exists(news_file):
    news_data = get_news_data(get_best_hn_urls(NUMBER_ARTICLES))
    json.dump(news_data, open(news_file, "w"))
else:
    news_data = json.load(open(news_file))

Getting the URLs we want to scrape is done using the Hacker News API, which does not require any authentication:

import requests

# The official Hacker News API endpoints (no authentication needed).
BEST_STORIES_API = 'https://hacker-news.firebaseio.com/v0/beststories.json'
STORY_API = 'https://hacker-news.firebaseio.com/v0/item/%s.json'

def get_best_hn_urls(num=10):
    # Fetch the IDs of the best stories, then resolve each ID to its
    # story URL; self posts without a 'url' field are skipped.
    top_items = requests.get(BEST_STORIES_API).json()
    links = []
    for item in top_items[:num]:
        item_data = requests.get(STORY_API % item).json()
        if 'url' in item_data:
            links.append(item_data['url'])

    return links

Unfortunately, Hacker News does not provide an official API for the best daily feed, so I had to scrape the HTML myself for that.
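The post doesn't include that scraping code, but a minimal sketch with requests and BeautifulSoup could look like the one below. The selector is an assumption based on the current Hacker News markup (each story title sits inside a span with the titleline class) and may need adjusting if the page changes:

import requests
from bs4 import BeautifulSoup

def scrape_best_page(num=10):
    # Fetch the /best page and collect each story's outbound link.
    # 'span.titleline > a' matches current HN markup (an assumption).
    html = requests.get('https://news.ycombinator.com/best').text
    soup = BeautifulSoup(html, 'html.parser')
    links = [a['href'] for a in soup.select('span.titleline > a')]
    return links[:num]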

Summarizing the contents of each URL does require using a paid API. In this case, we’re using Aylien.

def get_news_data(urls):
    client = textapi.Client(AYLIEN_API_ID, AYLIEN_API_KEY)

    out = []

    for url in urls:
        # Extract the article's metadata and a short summary,
        # skipping any page that yields no title or no sentences.
        general_data = client.Extract({'url': url})
        summary_data = client.Summarize({'url': url, 'sentences_number': TOTAL_SENTENCES})
        if not general_data['title']:
            continue

        if not summary_data['sentences']:
            continue

        out.append({
            "title": general_data['title'],
            "sentences": summary_data['sentences']
        })

    return out

So far, nothing too complicated.

Turning Text into Speech

As of this writing, Google has the best Text-to-Speech engine on the market. They are not the only ones applying deep learning to speech synthesis, but they are by far the best at it.

Here is where the magic happens: for each item, we render an SSML structure (an XML document that lets you specify exactly how you want the speech to be synthesized). We then pass each SSML structure to our ssml_to_audio function and write the result to a file. The LINEAR16 encoding tells the API to return a lossless WAV file.

To avoid unnecessary audio synthesis, we use a hash of the SSML as the cache key for each news item and save the synthesized audio locally in a temp folder.

Google has a free tier of 1 million characters per month, and beyond that charges $16 per million characters for the WaveNet voices.
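The item.ssml template itself isn't shown in the post. A plausible minimal version, which reads the title, pauses, and then reads the summary sentences, might look like this (the structure here is my guess; the tags used are standard SSML supported by Google's API):

<speak>
  <p><s>{{ title }}</s></p>
  <break time="1s"/>
  {% for sentence in sentences %}
  <s>{{ sentence }}</s>
  {% endfor %}
</speak>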

item_template = t.load(jinja2.Environment(), "item.ssml")

items = []

for item in news_data:
    logging.info('processing %s' % item['title'])
    item_ssml = item_template.render(item)
    item_hash = hashlib.md5(item_ssml.encode('utf8')).hexdigest()
    blob_name = 'temp/%s.wav' % item_hash

    if not os.path.exists(blob_name):
        raw_data = ssml_to_audio(item_ssml, format='LINEAR16')
        with open(blob_name, 'wb') as f:
            f.write(raw_data)

    logging.info('saved %s' % blob_name)
    items.append(blob_name)

The ssml_to_audio function calls Google's Text-to-Speech service through its plain REST API. I really should have used Google's official API client, but I was a bit too lazy to set it up. The function simply sends the SSML in a single API call, which returns the audio in the response (a sketch using the official client follows the function below).

def ssml_to_audio(ssml, format='OGG_OPUS', voice_type=VOICE_TYPE, voice_gender=VOICE_GENDER, voice_lang=VOICE_LANG):
    json_output = requests.post(TTS_URL, json.dumps({
        'input': {
            'ssml': ssml
        },
        'voice': {
            'languageCode': voice_lang,
            'name': voice_type,
            'ssmlGender': voice_gender
        },
        'audioConfig': {
            'audioEncoding': format
        }
    }), headers={
        "X-Goog-Api-Key": GOOGLE_API_KEY,
        "Content-Type": "application/json"
    }).json()

    if 'audioContent' not in json_output:
        raise Exception(json_output['error']['message'])

    # The audio comes back base64-encoded inside the JSON response.
    raw_data = codecs.decode(json_output['audioContent'].encode("utf-8"), "base64")

    return raw_data
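For reference, the same request through the official client library (the google-cloud-texttospeech package, which authenticates with application-default credentials instead of an API key) might look roughly like this with a recent version of the library; the voice name is just an example:

from google.cloud import texttospeech

def ssml_to_audio_client(ssml):
    # Same synthesis call as above, via the official client library.
    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(
            language_code='en-US',
            name='en-US-Wavenet-D',  # any WaveNet voice would do
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.LINEAR16
        ),
    )
    return response.audio_content  # raw bytes, no base64 step needed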

Generating the stitched audio

After generating all of the pieces, we need to stitch them together locally. We could have done the stitching within the SSML itself, but that is slower, less flexible, and limited in how much data a single request can handle.

We want a jingle at the beginning and the end, plus a short intermediate jingle before each news item, so the listener knows when a new title is about to start.

final = pydub.AudioSegment.empty()

final += pydub.AudioSegment.from_mp3('resources/main.mp3')
for item in items:
    final += pydub.AudioSegment.from_mp3('resources/interim.mp3')
    final += pydub.AudioSegment.from_wav(item)

final += pydub.AudioSegment.from_mp3('resources/main.mp3')

fn = "bestofhn_%s.mp3" % today
logging.info('saving %s' % fn)
final.export(os.path.join('podcasts', fn), format="mp3")

Another way to generate the final stitched audio would have been to call Google's Text-to-Speech API again with SSML containing <audio> tags that refer to the generated parts in Google Cloud Storage buckets (since Google's API does not allow synthesizing more than 5,000 characters per request), but I decided it would be easier and more efficient to just stitch everything together locally.
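As described, that alternative would mean synthesizing one top-level SSML document along these lines, with each <audio> src pointing at a publicly readable object in the bucket (the URLs here are hypothetical):

<speak>
  <audio src="https://storage.googleapis.com/bestofhn/resources/main.mp3"/>
  <audio src="https://storage.googleapis.com/bestofhn/temp/item1.wav"/>
  <audio src="https://storage.googleapis.com/bestofhn/temp/item2.wav"/>
  <audio src="https://storage.googleapis.com/bestofhn/resources/main.mp3"/>
</speak>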

And last but not least, we save the file to Google Cloud Storage and regenerate the RSS feed.
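The upload step isn't shown in the post; with the google-cloud-storage client it could be as small as this sketch (the bucket name is taken from the feed URLs below, and credentials are assumed to be configured):

import os
from google.cloud import storage

# Upload the finished episode (fn comes from the previous step).
client = storage.Client()
bucket = client.bucket('bestofhn')
blob = bucket.blob('podcasts/%s' % fn)
blob.upload_from_filename(os.path.join('podcasts', fn))

The RSS generator then needs the size and duration of each episode. The size is easy to read from the filesystem, but getting the duration in seconds requires parsing each MP3 file with pydub: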

items = []
for f in reversed(sorted(glob.glob("podcasts/bestofhn_*.mp3"))):
    # The date is embedded in the filename: bestofhn_YYYY-MM-DD.mp3.
    podcast_date = datetime.datetime.strptime(os.path.basename(f)[9:19], "%Y-%m-%d")
    items.append({
        'filename': os.path.basename(f),
        'size': os.stat(f).st_size,
        'duration': int(pydub.AudioSegment.from_mp3(f).duration_seconds),
        'date': podcast_date.strftime("%c"),
        'nice_date': podcast_date.strftime("%B %-d, %Y"),
    })

item_template = t.load(jinja2.Environment(), "podcast.rss")

output = item_template.render(items=items)
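The podcast.rss template isn't included in the post either; a minimal Jinja2 version built from the fields collected above might look like this (the feed title and description are placeholders, while the bucket URL matches the published feed):

<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:itunes="http://www.itunes.com/dtds/podcast-1.0.dtd">
  <channel>
    <title>Best of Hacker News</title>
    <link>https://storage.googleapis.com/bestofhn/bestofhn.rss</link>
    <description>A daily summary of the best Hacker News stories.</description>
    {% for item in items %}
    <item>
      <title>Best of Hacker News, {{ item.nice_date }}</title>
      <enclosure url="https://storage.googleapis.com/bestofhn/podcasts/{{ item.filename }}" type="audio/mpeg" length="{{ item.size }}"/>
      <itunes:duration>{{ item.duration }}</itunes:duration>
      <pubDate>{{ item.date }}</pubDate>
      <guid>{{ item.filename }}</guid>
    </item>
    {% endfor %}
  </channel>
</rss>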

The Result

Here is the output for July 19th, 2019: https://storage.googleapis.com/bestofhn/podcasts/bestofhn_2019-07-19.mp3

And the RSS feed: https://storage.googleapis.com/bestofhn/bestofhn.rss

The Code

Feel free to contribute: https://github.com/ronreiter/news-podcast-generator
