Sunday, June 4, 2017

User interface for Python Power Electronics

Though this post is about my circuit simulator Python Power Electronics, it deals with how Django and a web interface are used, which makes it relevant to this blog. I released another version of my circuit simulator, but this time the user interface is a web interface rather than the usual command line. The objective is to make the simulator more interactive and easier to use.

To begin with, I will describe the basic philosophy of using Django as a user interface. This is a concept which I don't think will be accepted by the mainstream Django community, as Django was never meant to be a user interface toolkit but a web framework for building database-driven websites. However, many of the features of Django make it very suitable for a user interface.

Any software GUI will have a few basic menu options on the start-up window, which I have implemented with Django's urls.py file. Every webpage has header links similar to those of a standard GUI - browsing the simulation library, creating a new simulation, documentation (or help) and a contact page. These are built into a base template file which is extended by every other HTML template file. Clicking on a link sends you to a URL which is matched in urls.py and directed to a view function, which in turn renders another HTML page. This way a user can move around the software just like a GUI.

Typically, a simulation software GUI will allow you to load a simulation case which is a file stored on the user's computer. In the web app, every simulation case is a database entry stored in the database simulation_collection. The uppermost table is called SimulationCase. This table contains the title, description and parameters of every simulation created by a user. When the user clicks on the "Simulation library" link, all the entries in the table SimulationCase are listed out for the user to load any one.

Each SimulationCase entry is linked to several other tables. The first level of relationship is below:

SimulationCase
-----> CircuitSchematics
-----> CircuitComponents
-----> MeterComponents
-----> ControllableComponents
-----> PlotLines
-----> CircuitPlot
-----> ControlFile

These tables are linked to SimulationCase through ForeignKey fields, as a single simulation case can have a large number of entries in each of them. As an example, a simulation could have 10 circuit schematic spreadsheets, 100 circuit components altogether in all schematics, 20 meters, 15 controllable components, 45 elements that are to be written to the output data file and made available for user plotting, 30 circuit plots, and 5 control files. Such a hierarchy makes it convenient to segregate data and relate them in a logical manner, which is particularly useful when creating forms from the models with ModelForm.

Creating a new simulation starts with a single new database entry for a SimulationCase with its parameters. The user then adds CircuitSchematics, which results in a new database entry with the circuit file that is linked to the SimulationCase entry. From all the components in the circuit schematics, database entries are made for CircuitComponents, MeterComponents and ControllableComponents, and these are also linked to the SimulationCase.

The next level of hierarchy is as follows:

CircuitSchematics
--------> Resistor
--------> VariableResistor
--------> Inductor
--------> VariableInductor
--------> Capacitor
--------> Voltage_Source
--------> Controlled_Voltage_Source
--------> Ammeter
--------> Voltmeter
--------> Diode
--------> Switch

The simulator will look for components in the circuit schematic spreadsheets and on finding a component will create a database entry in the table corresponding to the type of object. The objective behind separating the components into their respective types and having a separate table for each type was to use ModelForm to create a form for each type rather than a single generic form for all components. This results in customized forms, error checking and feedback messages.
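The scan-and-sort step can be sketched in plain Python. The type names are the ones listed in the hierarchy above; the cell format (a type name, an underscore, then a tag, as in "Resistor_R1") and the sample schematic are assumptions for illustration, and the dictionary stands in for the per-type database tables.

```python
# Component types named in the hierarchy above.
COMPONENT_TYPES = {
    "Resistor", "VariableResistor", "Inductor", "VariableInductor",
    "Capacitor", "Voltage_Source", "Controlled_Voltage_Source",
    "Ammeter", "Voltmeter", "Diode", "Switch",
}


def group_by_type(schematic_rows):
    """Collect component tags per type from the cells of a schematic.

    Assumes each component cell reads "<Type>_<tag>" and that the tag itself
    contains no underscore. Cells such as "wire" or "" are ignored.
    """
    tables = {}
    for row in schematic_rows:
        for cell in row:
            # Split on the LAST underscore so "Voltage_Source_V1" gives
            # the type "Voltage_Source" and the tag "V1".
            comp_type, _, tag = cell.strip().rpartition("_")
            if comp_type in COMPONENT_TYPES:
                tables.setdefault(comp_type, []).append(tag)
    return tables


# A made-up 3x4 schematic, as it might be read from a spreadsheet.
schematic = [
    ["Voltage_Source_V1", "wire", "Resistor_R1", "wire"],
    ["wire", "", "", "Inductor_L1"],
    ["wire", "wire", "Capacitor_C1", "wire"],
]
entries = group_by_type(schematic)
```

Here each key of entries would correspond to one table, and each tag to one database entry in that table.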

CircuitPlot
--------> CircuitWaveforms

When the user creates a new circuit plot, a new database entry is created in the table CircuitPlot. Each CircuitPlot can have numerous waveforms. When a waveform is added to a CircuitPlot, a new entry is created in the table CircuitWaveforms and linked to the entry in CircuitPlot. There is another layer:

CircuitWaveforms
--------> PlotLines

Every simulation case will have a number of data items that will be written to the output data file. These are called PlotLines. They may be meter outputs or VariableStorage elements in control files. The user can choose which PlotLine will appear in a CircuitWaveform. A CircuitWaveform can have several PlotLines and conversely a PlotLine can appear in several CircuitWaveforms. This results in a ManyToMany relationship.

ControlFile
--------> ControlInputs
--------> ControlOutputs
--------> ControlStaticVariable
--------> ControlTimeEvent

SimulationCase
--------> ControlVariableStorage

These are the input/output ports and the special variables of a control file. When a user adds a control file to a simulation case, a new entry is created in the table ControlFile. This control file entry can be configured. The user can add inputs, outputs, static variables and time events to a control file, and these can be used in the control code. Variable storage elements are global variables and therefore are related to the simulation case rather than a control file.

Many of these database entries are dynamic - the user can create and delete them. In other cases, the simulator itself creates and deletes the entries when the simulation is run and all the circuit files are processed. When a simulation is loaded, all data items related to the SimulationCase are also loaded. This is similar to the GUI based circuit simulators which will present the circuit in the latest state when a file is opened.

Thursday, March 16, 2017

Sequence in ManyToMany fields

Up till now I had designed the database with Paper being a class that had a ManyToMany field connected to the Author class. So essentially a paper will have multiple authors and an author will have several papers. The concept works except for one problem. Consider a ManyToMany relationship defined in the Paper class in the following manner:

paper_authors = models.ManyToManyField(Author)

This allows you to add multiple Author objects to objects of the class Paper. However, adding authors in a particular sequence does not guarantee that the sequence will be maintained. If a query is performed:

paper.paper_authors.all()

The Author objects in the object paper will be returned in no guaranteed order. The only solution I could find came from the answer to a question I posted on Stack Overflow:

http://stackoverflow.com/questions/42741591/order-of-manytomany-field-in-model-changed-when-one-object-is-replaced

What was suggested is that I define a membership class and use the "through" attribute.



Now the ManyToMany field has the following definition:

paper_authors = models.ManyToManyField(Author, through = 'Contributor')

It uses a membership class, named with the "through" attribute, to define additional details about how Author is related to Paper. The Contributor class links one paper to one author and has a "position" field which designates the author's position in the paper. Before this change, authors could be added by using the add function as:

paper_Y.paper_authors.add(X)

However, now the membership object has to be defined as:

xy = Contributor(paper=paper_Y, author=X, position=1)
xy.save()

With this change made, the functions in views.py have been changed and now a paper can be edited to change the authors without losing the sequence.

Now that a basic database has been created, I'll tinker around a little to make sure I haven't missed anything and then I'll create a better set of web pages to make it easier to navigate this application.

Friday, February 24, 2017

Models and interconnected data

With a basic server set up, the next part is getting the data in. Since this is a database of publications, the most important table is going to be that of papers. Since the idea is to build a linked database, it would be best if I can create tables for the most searched-for parts of every paper. For example, in a paper the important fields are - title, authors, journal, volume, number, pages, keywords, abstract. Out of these, authors and journal are separate data that can be linked back to the paper. For example, if I want to get a list of papers published by author X or the list of papers published by authors X and Y, it would be much faster if there was a separate table of authors that could be linked to the table of papers.

With this basic idea of a database in mind, I have created the following hierarchy. The only essential field of a paper is the title, without which it can't be created. The other fields of volume, number, pages, month and year are optional as sometimes they can't be found. A paper can have multiple authors and authors have multiple papers. So authors will be a ManyToManyField, which is a relationship to another model. As for journal, a paper will belong to one journal but a journal will have several papers. Therefore, the journal field in the Paper model will be a many-to-one relationship and that means it will be a ForeignKey.

Therefore, the basic structure of models in models.py will be:



With this database structure, the idea is to populate the database automatically from the data extracted from the BibTex file. For that the following view function is written:



The above view function is fairly self-explanatory in that it checks if a field exists in the BibTex entry for a paper. If the field is found, it is added to the model. So, if a field is missing, for example, there is no abstract, it wouldn't be a problem. The only catch was how to deal with the ForeignKey and ManyToManyField. A ForeignKey field means that there is an item (in this case the journal) that lives in a separate table. A paper can belong to only one Journal but a Journal can have many papers. So, to be able to save even a basic Paper, it was essential to relate it to a Journal.

For example, what is found in the code is:

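        # The journal must be assigned before the first save (required ForeignKey).
        new_paper_entry = Paper()
        new_paper_entry.paper_title = paper_item["title"]
        new_paper_entry.paper_journal = new_journal_entry
        new_paper_entry.save()
        # Authors can be added only after the paper exists in the database;
        # add() writes the link immediately, so no further save() is needed.
        for author_in_paper in list_of_authors_in_paper:
            new_paper_entry.paper_authors.add(author_in_paper)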

To be able to perform:
new_paper_entry.save()

it was essential to first define:
new_paper_entry.paper_journal = new_journal_entry

If I did:
new_paper_entry = Paper()
new_paper_entry.save()

it would have given an error that the paper doesn't have a journal. This is because the journal ForeignKey is a required field, and therefore a journal assignment was necessary before a valid Paper entry could be saved.

At the same time, the save command was essential after the journal assignment:
new_paper_entry.save()

because the paper has the ManyToManyField for authors. Authors cannot be added to a many-to-many field unless the paper already exists in the database, and this happens only after the save() function.

It took me a day to figure this out, but it was a good learning experience. When a ForeignKey field exists, it must be assigned before the model can be saved, unless the field is made optional (null=True, blank=True). Also, to be able to add to a many-to-many field, the model must first be saved in the database.
 

Sunday, February 19, 2017

Templates

So far I have not done much template rendering except listing all the papers in the BibTex file. But before moving on to more complex stuff, I am trying to read as much as possible.

To begin with, Django uses the concept of loose coupling - URLs, views and data. With the URLconf list in urls.py, Django specifies which URL will call which function in views.py. Therefore, it is possible that the functionality of a URL can be changed by changing the view function without affecting any other part of the code. The view function on the other hand can access the database and render a template while passing the necessary data extracted from the database or from the URL. The template which should be rendered can be changed in the view function without changing any other part of the code. Finally, models.py specifies the structure of the database that can be changed independent of the views or the URLs. Of course, functions in views.py have to be designed flexibly enough to be able to adapt to changes in the database and the URLs.

Within the view functions, I have been reading about templates and contexts. The simplest way to generate a display on a webpage is using the HttpResponse() function. As an example:

return HttpResponse("Hello world")

will display Hello World in a webpage corresponding to the URL that points to the function with the above return statement. But to do more complex stuff, you would need a separate HTML file. This again is in alignment with the concept of loose coupling. The contents of the webpage should be separate from the view function that acts as the buffer between the URL and the database.

Suppose a separate HTML file was to exist in the templates folder in the application folder paperarchive/papercollection. This is the default directory that Django will search for templates when 'APP_DIRS': True is set in the TEMPLATES variable in settings.py. The other option is to specify a list of directories in DIRS in the same variable. The conventional way to load this HTML file is with the get_template function in django.template.loader. So, suppose:

from django.template.loader import get_template
t = get_template("my_html.html")

is present in a view function, the template object t will be created with the contents of the HTML file. This HTML file could be a simple "Hello world" display as before or could be more complicated, with template variables and a bit of programming in the form of template tags to deal with these variables.

Since variables are present in the template, they need data. The data is in the form of a dictionary with the keys being the variable names in the HTML file. This dictionary of variables is the context. So,

from django.template import Context
c = Context({"name": "Django"})

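will create a context object with the variable "name" being "Django". (Recent Django versions expect a plain dictionary rather than a Context object when the template comes from get_template.) To pass this data to the HTML file, the template object that was created with the HTML file is rendered with this context by: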

t.render(c)

The render call returns a string, and a view function has to return an HttpResponse object, so the rendered string is wrapped in one:

return HttpResponse(t.render(c))

The webpage is then displayed with the data we specified. This concept is fairly convenient as the HTML file can be a regular HTML file with some amount of programming in the form of template tags. The view function can change the variables that are needed by the HTML file by extracting them from the database or from user entered data in forms using the HTTP request object "request".

To simplify the above process, there are two functions in django.shortcuts - render and render_to_response. They are similar, but render_to_response is discouraged (it was later deprecated in Django 2.0 and removed in Django 3.0). The above process of creating a template object and rendering it with a context can be performed in one line as:

render(request, "my_html.html" , {"name": "django"})

or

render_to_response("my_html.html" , {"name": "django"})

The only difference is that render needs the request object as its first argument while render_to_response doesn't.

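Additionally, these two functions also allow context processors to be used. Instead of just the template and the context, a RequestContext object can be passed as a context_instance (this argument was deprecated in Django 1.8 and removed in Django 1.10, where render always builds a RequestContext from the request). So,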

render(request, "my_html.html" , {"name": "django"} ,
context_instance = RequestContext(request [, context dictionary]
[, processors = <custom_processors>])
)

I took some time reading back and forth about this. RequestContext takes the request object as the first argument and generates a context object that also contains variables that Django provides by default to save you the trouble of writing code, for example, data about the logged-in user. Check out the "context_processors" list in the TEMPLATES variable in settings.py. This list contains the default global context processors. A context processor is a function that takes the request object as its only argument and returns a dictionary which is merged into the context. So the default global processors in settings.py are functions that are called automatically whenever a RequestContext is used to render a template, and they provide context data that is convenient for a number of purposes like user authentication. Additionally, custom processors designed by the user can be passed to RequestContext. The only requirement, as before, is that these custom processors take the request object as an argument and return a dictionary as context data. The only catch in using RequestContext seems to be that a number of context variables will be provided to the template that may not be needed, as it calls all the context processors listed in settings.py.

So, Context() holds only the data you specify, while RequestContext() also collects data from the context processors and allows you to add custom processors which add their own data. Choosing between Context() and RequestContext() therefore seems to depend only on whether the view needs the global data that Django automatically generates or wishes to call other user-defined context processors that return code-specific context data.

Saturday, February 18, 2017

Setting up the development server

To get started with building the database, I will run Django's default development server with SQLite, the database a new Django project is configured to use by default. I created a repository on GitHub:

https://github.com/shivkiyer/publications_db

With the simplest of Django commands, this means creating a project with:

django-admin startproject paperarchive

So paperarchive is the parent folder, which can also be found in the GitHub repository. This folder contains another folder called paperarchive which has settings.py, urls.py and wsgi.py. The file settings.py will be changed a couple of times and urls.py will be changed repeatedly.

My app will be called papercollection. So, inside the parent folder paperarchive, using the manage.py script:

python manage.py startapp papercollection

This creates another folder called papercollection below the parent paperarchive folder that contains models.py, views.py, admin.py (which I don't intend to use), tests.py (which I won't use right now). For now no need to worry about models.py as I only need to run the server.

But first, let's get going with the Python code to extract a BibTex file. A sample BibTex with around 30 publications has been copied from the IEEE Xplore website into the file input_data_file.txt. A few sample BibTex entries are as follows:



The sample file shows three publication entries in the BibTex format. However, it is quickly noticeable that BibTex entries can differ even though they are fairly similar. For example, almost all of them have the fields title, author, year, month, volume, number, abstract and keywords. However, for publications in journals, the name of the journal is specified as "journal" while for a conference, the name of the conference is called "booktitle". There may be other versions that I have not encountered so far and in that case the code will be modified later. Also, in some cases, the values of the fields are enclosed in quotes and in other cases in curly brackets {}. LaTeX accepts both, so our code will have to as well.

The code can be found in backup_data.py.



The file contains a function that reads the text file containing BibTex references and scrubs them to remove special characters used by BibTex/LaTeX. It then splits every line with the "=" sign as a separator, as the BibTex fields are key = value entries. The first item is compared with a known list and those that are needed are added to a dictionary object. The dictionary object is finally added to a list. This list now contains all the publication information in plain text form that can be displayed using an HTML file.
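The steps described above can be sketched in plain Python. This is not the code from backup_data.py; the function names, the list of wanted fields and the two sample entries are made up for illustration, and the sketch assumes that every field sits on a single line.

```python
# Fields worth keeping; anything else (doi, ISSN, ...) is skipped.
WANTED_FIELDS = {"title", "author", "journal", "booktitle", "year", "month",
                 "volume", "number", "pages", "keywords", "abstract"}


def clean_value(raw_value):
    """Strip the trailing comma, one outer pair of {} or "", and inner braces."""
    value = raw_value.strip().rstrip(",").strip()
    if len(value) >= 2 and value[0] in '{"' and value[-1] in '}"':
        value = value[1:-1]
    return value.replace("{", "").replace("}", "").strip()


def parse_bibtex(text):
    """Return a list with one dictionary of wanted fields per BibTex entry."""
    papers = []
    current = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("@"):  # "@ARTICLE{key," starts a new entry
            current = {}
            papers.append(current)
            continue
        if current is None or "=" not in line:
            continue
        # Split only at the first "=" so values containing "=" survive.
        key, _, value = line.partition("=")
        key = key.strip().lower()
        if key in WANTED_FIELDS:
            current[key] = clean_value(value)
    return papers


# Two made-up entries: one with curly brackets, one with quotes.
SAMPLE = """
@ARTICLE{key1,
author={A. Author and B. Writer},
journal={Sample Journal},
title={A {DC}-{DC} converter},
year={2016},
doi={10.1000/sample},
}
@INPROCEEDINGS{key2,
author = "C. Scholar",
booktitle = "Sample Conference",
title = "Another paper",
year = "2015",
}
"""
papers = parse_bibtex(SAMPLE)
```

The resulting list of dictionaries is the kind of object that can be passed to a template as the context.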

To display an HTML file, I configured urls.py in the paperarchive/paperarchive folder:




To get started, I defined two urls - /start-db/ and /display-db/.



The url /start-db/ points to the function db_populate in views.py. This function calls the function that reads the BibTex file. To display these publications, we use the render_to_response shortcut that will load a template which in this case is list_paper.html with the context being the list of publications extracted. At this stage, the context_instance is probably not needed as the request object received is not being used. But we pass it as an argument with render_to_response anyway as I'll be doing more advanced stuff soon.

The list_papers.html file can be found in the templates folder in paperarchive/papercollection/. A very basic HTML file is as follows:



This uses template tags to extract the dictionary items in each publication and list them. The result can be viewed by checking out the link:

http://127.0.0.1:8000/start-db/

So with this, I got a very basic Django server going and very basic extraction script that displays the papers in the Bibtex file on a webpage. The next step is to insert this into a database and create links and forms for the user to be able to edit them.

The beginning

I have been programming with Python for almost five years while building my circuit simulator Python Power Electronics. I tried out Django a couple of times for small web related projects like creating my own blog, but mainly out of curiosity, since Django is one of the most popular frameworks built using Python. And it is hardly surprising that Django preserves the fundamental elegance of Python in allowing a web developer to build web apps efficiently with beautiful code.

So, with an interesting idea for a web app, I am now going to dive into Django the way I did with Python. As someone who wrote a Masters and a PhD thesis, I am well aware of the mess that cross-referencing research articles causes. To describe how research progressed on the topic you are writing on requires citing publications in a number of different ways: chronologically, how they are linked together and how they differ. There are a number of software tools for this, so a researcher is not without any tools. But as with my circuit simulator, I would like to build an application specifically to my tastes.

So this is the plan. Many researchers in engineering and science use LaTeX for documentation. LaTeX uses BibTex for generating the bibliography. BibTex collects references in a separate BibTex file (.bib) in a special format. When I wrote my thesis, this format had to be manually prepared, which was a bit of a pain. But the advantage was that you could add any article as an item to the .bib file, and only those articles that have been cited in the publication or thesis that you are writing will appear. And BibTex takes care of the order in which they appear based on the order in which they are cited. So the chances are pretty slim that you would be referring to an article that doesn't appear on your list of references or that the list of references contains articles that have never been cited.

Now, most journals will provide the citation information for publications in BibTex format. Researchers don't have to manually prepare them. You just export them from the publication link. This makes it fairly convenient to generate a long list of references as all you need to do is click on the "Export citation" button and copy the BibTex entry that appears in a new window. As an example, I could generate a list of 30 BibTex references in less than an hour while this took me days when I was writing my thesis several years back. The only drawback is that the final pdf file that is generated by compiling this list of references will not provide much insight that is useful while cross-referencing or performing a literature survey.

So, my plan is to take these BibTex files and insert them in a database. For this I will use Django. So the database will have a number of fields for title, name of journal/conference, authors, year etc. Eventually, the idea is to link these publications together using several categories - chronologically, according to authors, who cited whom, who collaborated with whom etc. The results of these search strings will produce a networked list of articles that will be much more useful for writing a literature survey or while cross-referencing.

The work has already started though I am continuously learning about Django at the same time. The work will be hosted on GitHub and code with description will be posted here. So stay tuned if linked databases are something that interests you. I hope learning Django and blogging about it will be as much fun as it was with my circuit simulator.