Saturday, February 18, 2017

Setting up the development server

To get started with building the database, I will run the default development server that ships with Django, using SQLite as the database. I created a repository on GitHub:

With the simplest of Django commands, this means creating a project with:

django-admin startproject paperarchive

So paperarchive is the parent folder, which can also be found in the GitHub repository. This folder contains another folder, also called paperarchive, which holds the project's configuration files. These files will be changed repeatedly.

My app will be called papercollection. So, inside the parent folder paperarchive, I create it using the manage.py script:

python manage.py startapp papercollection

This creates another folder called papercollection below the parent paperarchive folder. It contains the app's default files, some of which I don't intend to use and some of which I won't use right now. For now there is no need to worry about them, as I only need to run the server.

But first, let's get going with the Python code to extract a BibTeX file. A sample BibTeX file with around 30 publications has been copied from the IEEE Xplore website into the file input_data_file.txt. A few sample BibTeX entries are as follows (click on view raw to see in a new window):

The sample file shows three publication entries in the BibTeX format. It is quickly noticeable that BibTeX entries, though fairly similar, can differ from one another. Almost all of them have the fields title, author, year, month, volume, number, abstract and keywords. However, for publications in journals, the name of the journal is specified as "journal", while for a conference, the name of the conference is specified as "booktitle". There may be other variations that I have not encountered so far, in which case the code will be modified later. Also, in some cases the values of the fields are enclosed in quotes and in other cases in curly brackets {}. LaTeX compiles both, so our code will have to handle both as well.
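As a rough sketch of how these two differences could be handled, here is a small example. The function names strip_delimiters and venue_of are made up for illustration and are not taken from the repository, and the sketch assumes each raw value is the text to the right of the "=" sign on a single line.

```python
def strip_delimiters(raw_value):
    """Return a BibTeX field value without its enclosing quotes or braces."""
    # Remove surrounding whitespace and the trailing comma that ends a field.
    value = raw_value.strip().rstrip(",").strip()
    if len(value) >= 2:
        in_braces = value[0] == "{" and value[-1] == "}"
        in_quotes = value[0] == '"' and value[-1] == '"'
        if in_braces or in_quotes:
            value = value[1:-1]
    return value.strip()


def venue_of(entry):
    """Return the journal name, or the conference name if there is no journal."""
    # "journal" is used for journal papers, "booktitle" for conference papers.
    return entry.get("journal") or entry.get("booktitle") or ""


# Hypothetical values, only to show that both styles end up the same.
title_in_braces = strip_delimiters("{A Made-Up Title},")
title_in_quotes = strip_delimiters('"A Made-Up Title",')
```

Both calls give the same plain string, so the rest of the code does not need to care which delimiter style the BibTeX entry used.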

The code can be found in (click on view raw to see in a new window)

The file contains a function that reads the text file containing BibTeX references and scrubs them to remove the special characters used by BibTeX/LaTeX. It then splits every line using the "=" sign as a separator, as the BibTeX fields are key = value entries. The first item (the key) is compared with a known list, and the fields that are needed are added to a dictionary object. The dictionary object is finally added to a list. This list now contains all the publication information in plain text form that can be displayed using an HTML file.
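The steps above can be sketched roughly as follows. This is not the code from the repository: the names WANTED_FIELDS, clean_value and parse_bibtex are invented for illustration, the sample entries are made up, and the sketch assumes that every entry starts with a line beginning with "@" and that each field sits on its own line.

```python
# Fields to keep; anything else (doi, issn, ...) is ignored.
WANTED_FIELDS = {"title", "author", "year", "month", "volume", "number",
                 "abstract", "keywords", "journal", "booktitle"}


def clean_value(raw_value):
    """Scrub a raw field value: drop the trailing comma, quotes and braces."""
    value = raw_value.strip().rstrip(",").strip()
    if len(value) >= 2 and value[0] + value[-1] in ("{}", '""'):
        value = value[1:-1]
    # Drop braces left inside the value, e.g. {IEEE} used to protect capitals.
    return value.replace("{", "").replace("}", "").strip()


def parse_bibtex(text):
    """Return a list of dictionaries, one per BibTeX entry in the text."""
    papers = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith("@"):
            # A new entry begins, e.g. "@ARTICLE{0000001,"
            current = {}
            papers.append(current)
        elif current is not None and "=" in line:
            # Split on the first "=" only, since values may contain "=" too.
            key, _, value = line.partition("=")
            key = key.strip().lower()
            if key in WANTED_FIELDS:
                current[key] = clean_value(value)
    return papers


# Made-up sample data in the two delimiter styles.
SAMPLE = """@ARTICLE{0000001,
author={A. Author and B. Author},
journal={Hypothetical Journal},
title={A Made-Up Title},
year={2016},
doi={10.0000/example},
}
@INPROCEEDINGS{0000002,
author="C. Author",
booktitle="Hypothetical Conference",
title="Another {Made-Up} Title",
year="2015",
}"""

papers = parse_bibtex(SAMPLE)
```

Here papers is a list with one dictionary per publication, which is the kind of object that can later be passed to a template as context.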

To display an HTML file, I configured in paperarchive/paperarchive folder (click on view raw to see in a new window):

To get started, I defined two urls - /start-db/ and /display-db/.

The url /start-db/ points to the function db_populate. This function calls the function that reads the BibTeX file. To display these publications, we use the render_to_response shortcut, which loads a template (in this case list_paper.html) with the list of extracted publications as the context. At this stage, the context_instance is probably not needed, as the request object received is not being used. But I pass it as an argument to render_to_response anyway, as I'll be doing more advanced stuff soon.

The list_papers.html file can be found in the templates folder in paperarchive/papercollection/. A very basic HTML file is as follows (click on view raw to see in a new window):

This uses template tags to extract the dictionary items in each publication and list them. The result can be viewed by checking out the link:

So with this, I got a very basic Django server going and a very basic extraction script that displays the papers in the BibTeX file on a webpage. The next step is to insert this data into a database and create links and forms for the user to be able to edit the entries.
