Automated translation of text and data in Python with deep_translator

Dima Diachkov
Published in Dev Genius · Apr 1, 2024 · 5 min read

Today, we often find ourselves needing to translate text from one language to another. This need can pop up in various situations, like when we’re trying to understand instructions, read a story, or even share information with friends who speak different languages. While there are many ways to get translations, doing it manually every time can be a bit of a hassle, especially if you have to do it a lot.

In the real-world applications of data science, the ability to automatically translate text is incredibly valuable. Data scientists frequently work with datasets from all over the world, which can include user feedback, product reviews, or social media posts in multiple languages. Analyzing this text to gather insights or understand trends requires it to be in a language that the data science tools can process, usually English.

Luckily, for those of us who like to find smarter ways to handle tasks, there’s a neat solution. By using a bit of Python code, we can make our computers do the translating for us. This not only saves time but also lets us focus on more important stuff. Python, a programming language known for being easy to learn and powerful in action, has a library called deep_translator that’s perfect for this job. It can translate text into many different languages quickly and without much fuss.

In this article, I am going to show you how to use the deep_translator library to automatically translate text. This skill is not only handy for everyday needs but also opens up a world of possibilities in data science, allowing us to work with diverse datasets and widen our analysis scope. Whether you’re new to programming, diving into data science, or have been coding for a while, you’ll see how simple it is to get your translations done with just a few lines of code.

Case 1: Simple Translation into Multiple Languages

Imagine you’re sharing some information with an international audience via an email mailing list. To make your message more personal, you decide to translate a welcoming phrase into several languages. Using deep_translator in Python, you can perform these translations seamlessly.

Setting Up in Python

Before we start translating, we need to make sure that the deep_translator library is available in your Jupyter environment. In the first cell of your notebook, type and execute:

!pip install deep_translator
from deep_translator import GoogleTranslator

The first line installs deep_translator into your current environment so you can use it for translations directly within your notebook; the second imports the GoogleTranslator class, which we will use throughout this article.

Writing the Translation Code

With deep_translator ready, it's time to translate our phrase. Under the hood, the library sends the text to the translation service over the internet and returns the translated string, which we then print.

For instance, in a new cell, input the following Python code to translate your text to German:

# The phrase we want to translate
phrase_to_translate = "Please subscribe to my channel :-)"
# Translating to German
translated_to_german = GoogleTranslator(source='auto', target='de').translate(phrase_to_translate)
print("In German: ", translated_to_german)
Output for the code above
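
By the way, source='auto' tells the translator to detect the input language on its own. If you already know the source language, you can pass its code explicitly; the small variation below assumes the input phrase is English ('en').

# The same translation, but with the source language stated explicitly
# (here we assume the input phrase is English rather than relying on auto-detection)
translated_explicit = GoogleTranslator(source='en', target='de').translate(phrase_to_translate)
print("In German (explicit source): ", translated_explicit)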

Now let’s simply adjust the translation target to Italian.

# Translating to Italian
translated_to_italian = GoogleTranslator(source='auto', target='it').translate(phrase_to_translate)
print("In Italian: ", translated_to_italian)
Output for the code above

Or Japanese …

# Translating to Japanese
translated_to_japanese = GoogleTranslator(source='auto', target='ja').translate(phrase_to_translate)
print("In Japanese: ", translated_to_japanese)
Output for the code above
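
If you need more than a couple of languages, you don't have to repeat the call by hand. Here is a minimal sketch that loops over a few example target codes; it also uses deep_translator's get_supported_languages() helper to look up which codes are available (the particular codes below are just an illustration).

# A minimal sketch: loop over several example target language codes
target_languages = ['de', 'it', 'ja', 'es', 'fr']  # adjust to the languages you need
for lang in target_languages:
    translation = GoogleTranslator(source='auto', target=lang).translate(phrase_to_translate)
    print(f"In {lang}: ", translation)

# Looking up the available languages and their codes (name -> code mapping)
print(GoogleTranslator().get_supported_languages(as_dict=True))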

Case 2: Translating Multiple Paragraphs from a News Article

Let’s refine our script to handle multiple paragraphs, so we can translate larger pieces of content from web pages. We'll write a simple web-scraping script and test the translation on its output.

Fetching and Parsing Web Content for Multiple Paragraphs

Continuing in your Jupyter Notebook, add a new cell with the following code. This version gathers several paragraphs from an article. I took a random article from the German magazine “Der Spiegel” about some banking news.

import requests
from bs4 import BeautifulSoup

# URL of the article you want to scrape
article_url = 'https://www.spiegel.de/wirtschaft/unternehmen/effenberg-bank-pleite-der-volksbank-schmalkalden-ist-vorerst-abgewendet-a-a1d87af9-69d9-4f09-97b7-c59d53c04458'

# Fetching the content from the URL
response = requests.get(article_url)
web_content = response.text

# Parsing the fetched content using BeautifulSoup
soup = BeautifulSoup(web_content, 'html.parser')

# Selecting all paragraph elements
paragraphs = soup.find_all('p')

# Combining the text of each paragraph into one string
text_to_translate = ' '.join([para.text for para in paragraphs])

print("Extracted Text: ", text_to_translate[:1000], "...")
Output for the code above

This script selects all paragraphs (<p>) from the webpage, combines their text into a single string, and prints the first 1000 characters to ensure the output is manageable for illustrative purposes.
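
One practical note before translating: find_all('p') grabs every paragraph on the page, including menus, teasers, and cookie notices, not just the article body. If you want cleaner input, you can narrow the selection with a CSS selector, as in the hedged sketch below; the 'article p' selector is an assumption and will likely need to be adapted to the markup of the site you're scraping.

# A hedged refinement: keep only paragraphs inside the <article> element.
# The 'article p' selector is an assumption; inspect the page to find the right one.
article_paragraphs = soup.select('article p')
if article_paragraphs:  # fall back to the original extraction if nothing matches
    text_to_translate = ' '.join(p.get_text(strip=True) for p in article_paragraphs)
print("Article-only text: ", text_to_translate[:1000], "...")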

Translating the Combined Text

Next, translate the combined text using the deep_translator library. You can do this in the same cell or a new one:

# Translating the extracted text to English
translated_text = GoogleTranslator(source='auto', target='en').translate(text_to_translate)

print("Translated Text: ", translated_text)
Output for the code above

As you can see, this approach translates an article’s multiple paragraphs into English, giving a more complete understanding of the content.

Some considerations

  • Length Limit: Be mindful of potential API limits on the length of text you can translate in one go. If you encounter errors related to text length, you may need to split the text and translate it in chunks, as shown in the sketch after this list.
  • Website Structure Variability: The structure of HTML can vary significantly from one website to another. You may need to adjust the BeautifulSoup selector (soup.find_all('p')) based on the specific markup of the news site you're scraping.
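
To address the first point, here is a minimal chunking sketch. It assumes a per-request limit of roughly 5,000 characters (the figure commonly cited for the Google backend; check the current limits yourself), splits the scraped paragraphs into chunks below that size, and translates each chunk separately.

# A minimal chunking sketch, assuming a per-request limit of ~5,000 characters
MAX_CHARS = 4500  # stay safely below the assumed limit

chunks, current = [], ""
for para in (p.text for p in paragraphs):
    if current and len(current) + len(para) + 1 > MAX_CHARS:
        chunks.append(current)
        current = para
    else:
        current = f"{current} {para}".strip()
if current:
    chunks.append(current)  # note: a single paragraph longer than MAX_CHARS would still exceed the limit

translator = GoogleTranslator(source='auto', target='en')
translated_chunks = [translator.translate(chunk) for chunk in chunks]
translated_text = ' '.join(translated_chunks)
print("Translated Text: ", translated_text[:1000], "...")

deep_translator also provides a translate_batch() helper that takes a list of strings, which can be a convenient alternative to the loop above (check the library's documentation for the exact behavior in your version).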

The full code is available on GitHub: https://github.com/mithridata-com/automatic_translation/blob/6c380934b5c197f611caf41e5c565c0d50e3d7b8/deep_translator_showcase.ipynb

Wrap-up

By translating multiple paragraphs, you gain a fuller insight into the article’s content, making this technique particularly useful for research, news following, or content aggregation where understanding the full context is crucial.


Please clap 👏 and subscribe if you want to support me. Thanks!❤️‍🔥
