Let’s build a search engine with Python
In this tutorial, we will build a search engine with Python. Here are a few screenshots of the final product.
What is a search engine?
A search engine is a software program that helps users locate information on the World Wide Web. A search engine normally works in three basic steps: crawling, indexing and ranking. In the crawling stage, the program traverses the web in a predefined way, collecting data such as page text, images and links. In the indexing stage, the collected data is stored in a data structure that can be searched quickly. Finally, in the ranking stage, the results for a query are ordered by relevance, so that the most relevant results appear first.
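To make the indexing and ranking stages concrete, here is a minimal sketch in plain Python. It is not part of the project we build below (we will rely on Lycos for results), and the PAGES dictionary is a made-up stand-in for whatever a crawler would have collected.

```python
from collections import Counter, defaultdict

# Hypothetical stand-in for crawled pages: URL -> page text.
PAGES = {
    "https://example.com/a": "python search engine tutorial",
    "https://example.com/b": "django web framework for python",
    "https://example.com/c": "cooking recipes",
}


def build_index(pages):
    """Indexing: map each word to the pages that contain it, with counts."""
    index = defaultdict(Counter)
    for url, text in pages.items():
        for word in text.lower().split():
            index[word][url] += 1
    return index


def search(index, query):
    """Ranking: score each page by how often the query words occur in it."""
    scores = Counter()
    for word in query.lower().split():
        for url, count in index.get(word, {}).items():
            scores[url] += count
    # Highest score first; ties are broken alphabetically by URL.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [url for url, _ in ranked]


index = build_index(PAGES)
print(search(index, "python search"))
```

Real search engines use far more elaborate relevance signals than word counts, but the shape is the same: build a lookup structure once, then score and sort candidates per query.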
We will be building our search engine on top of the Django web framework.
Disclaimer: our search engine gets its results from the Lycos search engine rather than from its own crawler and index.
Python packages for the project:
- beautifulsoup4
- html5lib
- requests
- django
To install these packages, use pip3 install <package name>.
To begin, type django-admin startproject python_search_engine in the terminal and cd into the python_search_engine directory. Then run python3 manage.py startapp engine, which creates an app called engine.
Now we need to configure our Django project. Add these lines to the settings.py file in the python_search_engine folder.
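The exact lines from the original post are not reproduced here. At a minimum, Django needs to know about the engine app, so a likely version of the change is to register it in INSTALLED_APPS, as in the sketch below. The other entries are the defaults that startproject generates.

```python
# python_search_engine/settings.py (fragment)
INSTALLED_APPS = [
    "django.contrib.admin",
    "django.contrib.auth",
    "django.contrib.contenttypes",
    "django.contrib.sessions",
    "django.contrib.messages",
    "django.contrib.staticfiles",
    "engine",  # the app we created with startapp
]
```

With the app registered, Django will look for templates inside engine/templates as long as APP_DIRS is left at its default of True in the TEMPLATES setting.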
Now let’s modify the project’s urls.py file.
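A common way to do this, and the one assumed in this sketch, is to forward all requests to a urls.py inside the engine app. That file does not exist yet; we create it near the end of the tutorial, so the project will not start until then.

```python
# python_search_engine/urls.py
from django.contrib import admin
from django.urls import include, path

urlpatterns = [
    path("admin/", admin.site.urls),
    # Hand every other URL to the engine app's own urls.py.
    path("", include("engine.urls")),
]
```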
The basic functionality of our search engine relies on web scraping. We will be scraping lycos.com to obtain the search results for our queries.
To begin, we will create a few files and folders. In the engine folder, create a folder called templates, and inside the templates folder create a folder called engine. Inside this inner engine folder, create the following files: base.html, home.html, results.html and about.html.
The base.html contains the basic layout of our project. Here is our base template.
Now let’s create our views. In the engine views.py add the following.
This view will render our home page where we will enter our search query.
Here is the HTML template.
To build our scraper, we will use lycos.com as our base search engine. To begin, we inspect the results page of the website. Here is a screenshot of a title inspection.
Next we inspect the description.
We can see that they use result-description as the description class, result-title as the title class and result-url as the URL class. These will be the building blocks for our web scraper. In case you are wondering why we used Lycos, it’s because their user interface is simple, which makes it much easier to scrape than Google or Bing, which I found almost impossible to scrape. Now let us build our scraper with this information at hand.
Now let’s create our view for the search.
First of all, let’s import a few libraries.
Next, we move to our results view.
Now let’s go step by step. We first read the search query submitted through the form in our template, which we are going to create in a second.
We take the URL for the Lycos search and append the query from the search bar to it. Next we specify the container class for the search results on lycos.com; each container holds the result title, result URL and result description.
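Building the URL can be sketched with the standard library alone. The endpoint below is an assumption about how Lycos structures its search URLs, so check it against the address bar when you run a search there. Encoding the query with urlencode, rather than plain string concatenation, keeps spaces and symbols from breaking the URL.

```python
from urllib.parse import urlencode

# Assumed Lycos search endpoint; confirm it in your browser's address bar.
LYCOS_SEARCH_URL = "https://search.lycos.com/web/"


def build_search_url(query):
    """Return the results URL for a query, with the query safely encoded."""
    return LYCOS_SEARCH_URL + "?" + urlencode({"q": query})


print(build_search_url("python web framework"))
# prints https://search.lycos.com/web/?q=python+web+framework
```

The page itself can then be downloaded with requests.get(url).text and handed to BeautifulSoup.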
Here is our home page template.
Next, we will iterate over the results and then append them to a list.
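Here is one way the loop could look, written as a standalone function so it can be tried on any HTML string. The result-title, result-url and result-description classes come from the inspection above. The container class name result-item is only a placeholder; use whatever class you find wrapping each result on the page.

```python
from bs4 import BeautifulSoup

# Placeholder container class; replace it with the one found when inspecting the page.
CONTAINER_CLASS = "result-item"


def parse_results(html, parser="html5lib"):
    """Collect (title, url, description) tuples from a results page."""
    soup = BeautifulSoup(html, parser)
    results = []
    for item in soup.find_all(class_=CONTAINER_CLASS):
        title = item.find(class_="result-title")
        url = item.find(class_="result-url")
        description = item.find(class_="result-description")
        # Skip containers (ads, widgets) that lack any of the three parts.
        if title is None or url is None or description is None:
            continue
        results.append((
            title.get_text(strip=True),
            url.get_text(strip=True),
            description.get_text(strip=True),
        ))
    return results
```

Each entry is a tuple, so index 0 is the title, index 1 is the URL and index 2 is the description.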
We then pass the results to the results.html template.
Here is the results.html template.
Here we inherit our template from the base.html template. We have our search bar at the top and then we render our results below.
We access the items of each entry in the results list by index, in the order in which we appended them earlier.
Next we create an about page which will give a bit of information about our search engine.
Here is our about view.
Next we create our about template.
Now let’s create our routes.
Create a file called urls.py in the engine folder.
Add the following lines.
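The routes could be laid out as in the sketch below. The paths and route names are assumptions; only the three view functions (home, results and about) come from the tutorial.

```python
# engine/urls.py
from django.urls import path

from . import views

urlpatterns = [
    path("", views.home, name="home"),
    path("results/", views.results, name="results"),
    path("about/", views.about, name="about"),
]
```

Naming the routes lets templates refer to them with the url tag instead of hard-coded paths.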
Our project is now done. To fire it up, type python3 manage.py runserver, open the URL it prints (http://127.0.0.1:8000/ by default) in your browser, and you should see this.
Now enter your query in the search bar and you should get your results like this.
Thanks for reading.