Making a News Podcast Generator

Ron Reiter
4 min read · Jul 27, 2019

APIs are ubiquitous. You can do just about anything with APIs today, which is pretty awesome, and solve problems that used to require technology only a few people had. I decided to solve an annoying problem I have every morning when I drive to work: I want someone to summarize the front page of Hacker News while I drive.

Computer-generated speech always struck me as a terrible idea because it hurt my ears to listen to. However, as it turns out, Google’s newly released WaveNet-based Text-to-Speech technology is good enough to listen to for 15 minutes. And if that’s the case, then listening to a summary of the top links can actually be practical and even enjoyable.

To do this, I wrote a Python script that does the following:

  1. Scrapes all of the daily Hacker News URLs using their open API.
  2. Summarizes them using an article extraction API (I used Aylien, which I did not know about until I googled for an article extraction and summarization API)
  3. Uses Google’s Text-to-Speech engine on the title and summary
  4. Stitches all results into one mp3 file
  5. Uploads it to Google Cloud Storage
  6. Creates a Podcast RSS feed
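
To make the data flow between these steps concrete, here is a rough sketch of how they could be chained together. It is not the script itself: `build_episode` and all six parameter names are hypothetical, and each step is passed in as a plain callable so the sketch does not assume anything about the Aylien or Google client libraries.

```python
def build_episode(fetch_urls, summarize, synthesize, stitch, upload, publish_feed):
    """Run the six steps in order; every argument is a caller-supplied callable."""
    # Step 1: collect the article URLs.
    urls = fetch_urls()
    # Step 2: one dict per article, assumed to hold 'title' and 'summary'.
    articles = [summarize(url) for url in urls]
    # Step 3: one audio clip (bytes) per article, title first, then summary.
    clips = [
        synthesize('%s. %s' % (article['title'], article['summary']))
        for article in articles
    ]
    # Step 4: join the clips into a single episode.
    episode = stitch(clips)
    # Step 5: upload the episode and get back its public URL.
    episode_url = upload(episode)
    # Step 6: publish a feed that points at the uploaded file.
    return publish_feed(episode_url)
```

Passing the steps in as functions also makes it easy to swap a real service for a fake one while testing.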

So, let’s dig into how it works:

Getting the news

We start out by getting the data we want to listen to: a headline and a summary for each news item.

import datetime
import json
import logging
import os

today = datetime.date.today().isoformat()
news_file = 'news_data/news_data_%s.json' % today

logging.info('getting news data...')
if not os.path.exists(news_file):
    # First run today: fetch and summarize, then cache the result on disk.
    news_data = get_news_data(get_best_hn_urls(NUMBER_ARTICLES))
    with open(news_file, 'w') as f:
        json.dump(news_data, f)
else:
    with open(news_file) as f:
        news_data = json.load(f)
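
The snippet above is a load-or-build cache: the APIs are only called once per day, and later runs read the JSON file instead. A minimal sketch of the same pattern as a reusable helper follows. The name `load_or_build` is hypothetical, and unlike the snippet it also creates the `news_data` directory if it does not exist yet.

```python
import json
import os


def load_or_build(path, build):
    """Return cached JSON from `path`, or call `build()` and cache its result."""
    if os.path.exists(path):
        with open(path) as f:
            return json.load(f)
    data = build()
    # Create the parent directory on first use so open() does not fail.
    os.makedirs(os.path.dirname(path) or '.', exist_ok=True)
    with open(path, 'w') as f:
        json.dump(data, f)
    return data
```

With this helper, the block above could shrink to something like `news_data = load_or_build(news_file, lambda: get_news_data(get_best_hn_urls(NUMBER_ARTICLES)))`.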

We get the URLs we want to scrape from the Hacker News API, which does not require any authentication:

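import requests

def get_best_hn_urls(num=10):
    # The best-stories endpoint returns a list of item IDs.
    top_items = requests.get(BEST_STORIES_API).json()
    links = []
    for item in top_items[:num]:
        item_data = requests.get(STORY_API % item).json()
        # Skip items without an external link (e.g. Ask HN posts).
        if 'url' in item_data:
            links.append(item_data['url'])

    return links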
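
The function above relies on two constants that are not shown. Here is a self-contained sketch of what they might look like, assuming they point at the public Hacker News Firebase endpoints. To keep the sketch free of third-party packages it swaps `requests` for `urllib` from the standard library, and the `fetch` parameter and `fetch_json` helper are additions of mine so the function can be exercised without network access.

```python
import json
import urllib.request

# Assumed values: the public Hacker News API endpoints. The original
# script's definitions of these constants are not shown.
BEST_STORIES_API = 'https://hacker-news.firebaseio.com/v0/beststories.json'
STORY_API = 'https://hacker-news.firebaseio.com/v0/item/%s.json'


def fetch_json(url, timeout=10):
    """Fetch a URL and decode its JSON body (urlopen raises on HTTP errors)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


def get_best_hn_urls(num=10, fetch=fetch_json):
    """Return the external links among the first `num` best stories."""
    links = []
    for item_id in fetch(BEST_STORIES_API)[:num]:
        item = fetch(STORY_API % item_id)
        # Guard against an empty (null) response, and skip items that
        # have no external link, such as Ask HN posts.
        if item and 'url' in item:
            links.append(item['url'])
    return links
```

Note that because link-less items are skipped, asking for 10 stories can return fewer than 10 URLs.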
