Wednesday, 20 August 2014

GSOC : Final Update

After months of digging through the xlators to learn how they work, my glusterfsiostat project has finally reached its end. You can check out the latest source code from https://forge.gluster.org/glusterfsiostat.

I've worked at a fast pace over the past couple of days and completed all the remaining tasks. This includes finishing the Python web server script, which now displays live I/O graphs for multiple Gluster volumes mounted at the same time. The server and the primary stat scripts now also report an error if profiling is not turned on for a volume. Following are some screenshots of the server live in action.

 
  



The approach here is quite similar to Justin Clift's tool (https://github.com/justinclift/glusterflow), but I've tried to build this as a bare-bones package, since Justin's tool requires you to first set up an extensive stack (Elasticsearch server, Logstash, etc.). My aim is for this tool to be self-sufficient: anyone should be able to download it once and use it without having to install dependencies first. The response from my mentor about the work done has been pretty supportive. I look forward to improving this project and working on some more exciting ones with GlusterFS in the future.

Wednesday, 23 July 2014

GSOC Week 8+9 : To be or not to be

These past two weeks I've been busy: my college re-opened, and I spent last weekend coding away at an overnight hackathon. As instructed by my mentor, I spent this week testing whether my recent patch, which keeps I/O profiling always enabled in io-stats, really degrades I/O performance.

For this, I performed two write tests, one with a 20 MB file and the other with a 730 MB file. Each file was written 20 times to the mounted volume, clearing the buffers before every iteration and measuring the time taken with the time command. Since the times for writing the same file vary quite a bit, I plotted the obtained values on a graph (the Y-axis represents seconds). As you can see in these images, there is no clear pattern in the variation of the write times.
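
A repeated write test of this kind can also be sketched in Python. This is not the procedure used for the numbers plotted here (those came from the time command, with the buffers cleared between runs); it is a minimal in-process approximation. The function name `time_writes` and its parameters are made up for illustration, and dropping the page cache is left out because it needs root.

```python
import os
import tempfile
import time


def time_writes(payload, dest_dir, iterations=20):
    """Write `payload` to a fresh file `iterations` times; return seconds per write."""
    durations = []
    for i in range(iterations):
        path = os.path.join(dest_dir, "write_test_%d.bin" % i)
        start = time.perf_counter()
        with open(path, "wb") as handle:
            handle.write(payload)
            handle.flush()
            os.fsync(handle.fileno())  # make sure the data actually leaves the process
        durations.append(time.perf_counter() - start)
        os.remove(path)  # start every iteration from a clean slate
    return durations


if __name__ == "__main__":
    # Demo against a scratch directory; point dest_dir at a Gluster mount for a real test.
    with tempfile.TemporaryDirectory() as scratch:
        for seconds in time_writes(b"\0" * (1024 * 1024), scratch, iterations=3):
            print("%.4f s" % seconds)
```

Running the same function once with the patch applied and once without gives two lists of timings that can be plotted side by side.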




In my view, the values under both conditions are close to each other, and both are equally likely to land well above or below the mean. Hence, no negative effect is seen from the proposed change. You can follow this discussion on the ML at http://supercolony.gluster.org/pipermail/gluster-devel/2014-July/041756.html

Monday, 7 July 2014

GSOC Week 7 : Back on track

It's time to get back on track. Passing the midterms with flying colors was really great. I apologize for not posting any progress updates during the last two weeks; I was not feeling well during that time.

The progress so far includes rethinking the previous patch and the method io-stats will use to dump its private info. As suggested by my mentor, I'm moving the speed calculation and other major work into the glusterfsiostat script rather than coding it all in the glusterfs codebase. You can look at the new patch here: http://review.gluster.org/#/c/8244/.

Also, my project was accepted for hosting on Gluster Forge at https://forge.gluster.org/glusterfsiostat, where you can track the progress of the Python script and the rest of the code base related to my project.

Recently, my mentor and I have started tracking our progress with the Scrum model, using Trello. This helps us break bigger jobs into smaller tasks and set a deadline on each of them to better estimate their completion dates.

Tuesday, 17 June 2014

GSOC Week 4: "This is not a coding contest"

Using my first patch to Gluster as a stepping stone, I've written a small utility, glusterfsiostat, in Python, which can be found at https://github.com/vipulnayyar/gsoc2014_gluster/blob/master/stat.py. Currently, my patch to io-stats, which is under review, dumps private information from the xlator object to the xlator's private-info file in the meta directory. This includes total bytes read/written along with the read/write speed over the previous 10 seconds. The speed for each 1-second interval is identified by its Unix timestamp and reported in bytes/second. These values at discrete points in time can be used to generate a graph.
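
To make the dump format concrete, here is a minimal sketch of how such a file could be parsed into timestamped series ready for plotting. It assumes the `name(timestamp) = value` and `name = value` line format shown in the Week 3 post; the function name `parse_private` and the regular expression are illustrative and not taken from stat.py.

```python
import re

# Matches "write_speed(1402602084) = 12552555" as well as "data_read_cumulative = 729737216".
LINE_RE = re.compile(r"^([A-Za-z_]+)(?:\((\d+)\))?\s*=\s*(\d+)$")

SAMPLE = """\
write_speed(1402602084) = 12552555
read_speed(1402602093) = 39282929
data_read_cumulative = 729737216
"""


def parse_private(text):
    """Return (speeds, totals) parsed from the text of a private-info dump."""
    speeds = {"read_speed": {}, "write_speed": {}}  # name -> {unix timestamp: bytes/sec}
    totals = {}                                     # name -> byte count
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # skip shell prompts, blank lines and anything unexpected
        name, timestamp, value = match.groups()
        if timestamp is None:
            totals[name] = int(value)
        elif name in speeds:
            speeds[name][int(timestamp)] = int(value)
    return speeds, totals


speeds, totals = parse_private(SAMPLE)
```

Sorting `speeds["write_speed"].items()` then yields (timestamp, bytes/sec) pairs in the order a graphing library would expect.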

The Python tool first identifies all Gluster mounts in the system, finds each mount path and parses the meta xlator output to generate output similar to the iostat tool. Passing the '-j' option gives you extra information in a consumable JSON format. By default, the tool pretty-prints the basic stats in human-readable form. This tool is meant to be a framework on which other applications can be built. I've put it out on the gluster-devel ML for community feedback so as to improve it further.
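
The two output modes can be illustrated with a small sketch. It is not the code in stat.py: the `render` and `parse_args` helpers and the layout of the `stats` dictionary (mount path mapped to counters) are assumptions made for the example; only the '-j' switch comes from the tool itself.

```python
import argparse
import json


def render(stats, as_json=False):
    """Format stats ({mount path: {counter name: value}}) as JSON or as plain text."""
    if as_json:
        return json.dumps(stats, indent=2, sort_keys=True)
    lines = []
    for mount, counters in sorted(stats.items()):
        lines.append(mount)
        for name, value in sorted(counters.items()):
            lines.append("  %-28s %d" % (name, value))
    return "\n".join(lines)


def parse_args(argv=None):
    """Parse the command line; '-j' switches the output to JSON."""
    parser = argparse.ArgumentParser(description="iostat-like report for Gluster mounts")
    parser.add_argument("-j", dest="as_json", action="store_true",
                        help="emit JSON instead of the human-readable report")
    return parser.parse_args(argv)


# Example input in the assumed layout (the counter value is copied from an earlier dump).
example_stats = {"/mnt": {"data_read_cumulative": 729737216}}
```

Calling `render(example_stats, parse_args(["-j"]).as_json)` returns the JSON form, while `render(example_stats)` returns the indented text report.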

Note: In order to test this, you need to apply my patch (http://review.gluster.org/#/c/8030/) to your repo first, then build and mount a volume. Preferably perform a big read/write operation with a file on your Gluster mount before executing the Python script. Then run it as 'python stat.py' or 'python stat.py -j'.

Quoting my mentor Krishnan from our latest weekly hangout, "This is not a coding contest". What he meant was that just writing the code and pushing it is not the essence of open source development. I still need to interact with the community and gather more feedback on the work I've done so far, since our aim is not to complete the project just for the sake of doing it, but to build something that people actually use.

Tuesday, 10 June 2014

GSOC Week 3: Patches Ahoy!!

This week, I focused on modifying the io-stats xlator so that it can store the speeds of recent reads and writes. The other information needed by my tool, i.e. the amount of data read and written, is stored in the private section of the xlator object (this->private). The meta xlator can custom-dump the private info of an xlator, but for that it requires an initialized dumpops structure, just like the fops one.

So for io-stats, I initialized dumpops and added the definition of the custom dump function (.priv), which is called by meta when doing a `cat private` in the .meta folder. To store the read/write speeds, two separate agnostic doubly linked lists are used, each holding a maximum of 10 elements at a time. Each element represents a 1-second interval, stores the number of bytes read/written during that interval, and is uniquely identified by its Unix timestamp (seconds). The following read_speed and write_speed fields in the file represent speed in bytes/sec for a 1-second interval, identified by the Unix timestamp in parentheses.
/mnt/.meta/graphs/active/newvol 
[root@myfedora newvol]# cat private
write_speed(1402602084) = 12552555
write_speed(1402602085) = 18558756
write_speed(1402602086) = 23685425
write_speed(1402602087) = 9786084
write_speed(1402602088) = 9543367
write_speed(1402602089) = 796957833
write_speed(1402602090) = 8530576722
write_speed(1402602091) = 10028056272
write_speed(1402602092) = 10719120525
write_speed(1402602093) = 10528354767
read_speed(1402602084) = 8961522
read_speed(1402602085) = 8082654
read_speed(1402602086) = 7617477
read_speed(1402602087) = 9810846
read_speed(1402602088) = 10258556
read_speed(1402602089) = 193668615
read_speed(1402602090) = 261608047
read_speed(1402602091) = 29639965
read_speed(1402602092) = 47595000
read_speed(1402602093) = 39282929
data_read_cumulative = 729737216
data_read_incremental = 729737216
data_written_cumulative = 729737216
data_written_incremental = 729737216
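
The bookkeeping behind these per-second values can be illustrated in Python. The patch itself is C code inside io-stats and uses doubly linked lists; the sketch below swaps in `collections.deque` with `maxlen` as an analogous bounded structure. The class name `SpeedWindow` and its methods are hypothetical, and timestamps are assumed to arrive in non-decreasing order.

```python
from collections import deque


class SpeedWindow:
    """Keeps byte counts for the most recent `size` one-second intervals that saw I/O."""

    def __init__(self, size=10):
        # Each entry is [unix_second, bytes]; the deque drops the oldest entry when full.
        self.buckets = deque(maxlen=size)

    def record(self, timestamp, nbytes):
        """Account `nbytes` of I/O to the one-second interval containing `timestamp`."""
        second = int(timestamp)
        if self.buckets and self.buckets[-1][0] == second:
            self.buckets[-1][1] += nbytes  # same second: accumulate
        else:
            self.buckets.append([second, nbytes])  # new second: open a new bucket

    def dump(self, label):
        """Render the window in the `label(timestamp) = bytes` form used by the dump."""
        return ["%s(%d) = %d" % (label, second, nbytes) for second, nbytes in self.buckets]


window = SpeedWindow(size=3)
for stamp, nbytes in [(100, 5), (100.5, 5), (101, 7), (102, 1), (103, 2)]:
    window.record(stamp, nbytes)
print("\n".join(window.dump("write_speed")))
```

Because every bucket covers exactly one second, the byte count stored in it is also the speed in bytes/sec for that second.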

This patch is under review as of now. It currently builds successfully with RPMs and passes the smoke tests on Jenkins. The new Rackspace regression test succeeds too.

Tuesday, 3 June 2014

GSOC Week 2 : Bootstrapping

Based on recent discussions with my mentor KP, we've decided that instead of looking at the bigger picture of modifying multiple xlators, it's better to stick to the primary objectives of my proposal and start working on a 0.1 version of the application, so as to present it to the Gluster community soon. Feedback is very important for our application, since its intended users will be developers and Gluster users.

I've decided to go the python way for building glusterfsiostat. The further tasks needed to build this kind of thing are:
  1. Get the location of every Gluster mount in the system. This will be done by applying a bit of regex parsing to the output of the mount command. For now, I'll only be focusing on Unix-based mount paths.
  2. Every glusterfs mount path contains a virtual directory called .meta, which stores the meta information (name, type, structure of graph, latency) for every xlator. Reading the files in .meta will be my primary source of information. As of now, I could only find latency information for each of the 49 fops defined in Gluster, but in order to display total read/write amounts and speeds like the nfsiostat tool does, I may need to modify the meta xlator itself.
  3. Pretty print this output with the application, or in a consumable form like JSON.
This initial version of the application might undergo many changes based on feedback from the community, and I'm ready for that because we're in this for the long haul!!
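
The first task in the list above could be sketched as follows. The sketch assumes the usual Linux `mount` output format (`source on path type fstype (options)`) and that FUSE-mounted Gluster volumes report the filesystem type `fuse.glusterfs`. The function names and the sample line are made up for illustration.

```python
import re
import subprocess

# One line of `mount` output, e.g. "host:/vol on /mnt type fuse.glusterfs (rw,relatime)".
MOUNT_RE = re.compile(
    r"^(?P<source>\S+) on (?P<path>.+) type (?P<fstype>\S+) \((?P<opts>[^)]*)\)$"
)


def gluster_mounts(mount_output):
    """Return the mount paths of all Gluster volumes found in `mount` output text."""
    paths = []
    for line in mount_output.splitlines():
        match = MOUNT_RE.match(line.strip())
        if match and match.group("fstype") == "fuse.glusterfs":
            paths.append(match.group("path"))
    return paths


def current_gluster_mounts():
    """Run the system's mount command and extract the Gluster mount paths from it."""
    output = subprocess.check_output(["mount"]).decode("utf-8", "replace")
    return gluster_mounts(output)


SAMPLE_OUTPUT = (
    "/dev/sda1 on / type ext4 (rw,relatime)\n"
    "myfedora:/newvol on /mnt type fuse.glusterfs (rw,relatime,user_id=0,group_id=0)\n"
)
print(gluster_mounts(SAMPLE_OUTPUT))
```

Keeping the parsing separate from the call to `mount` makes the regex easy to check against captured output from different systems.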

Monday, 26 May 2014

GSOC Week 1 : Getting hands dirty

According to the timeline in the accepted proposal, the primary job in the coming days is to identify the counters present in the different xlators in Gluster's inverted graph and to standardize the stats so that they are available at a single endpoint in the application.

In this regard, Anand Avati, a longtime contributor to Gluster, has advised me to look into the meta xlator, which already stores some stats like latency measurements and fop counts. The meta xlator lies at the top of the graph, hence it's the ideal location to gather the overall latency of a request given to Gluster. Based on my experience in testing meta, I've found that the meta xlator, when realised on a running mounted volume, consists purely of virtual files. In order to enable latency measurement, you have to run "echo -n 1 > $MNT/.meta/measure_latency". Don't forget the '-n' after echo, otherwise the file receives an input of "1\n", which leads to an incorrect write. I realised this after spending quite some time evaluating the variable contents by debugging glusterfs with gdb. A redacted output of this can be found at http://fpaste.org/104770/
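
The same write can be done from Python, where no trailing newline is added unless you write one yourself. This is a minimal sketch: the helper name `enable_latency_measurement` is made up, and it assumes the control file accepts an ordinary truncating write just like the shell redirection does.

```python
import os


def enable_latency_measurement(mount_path):
    """Write "1" (without a trailing newline) to <mount_path>/.meta/measure_latency."""
    control_file = os.path.join(mount_path, ".meta", "measure_latency")
    with open(control_file, "w") as handle:
        handle.write("1")  # exactly one byte, the equivalent of `echo -n 1`
```

For a volume mounted at /mnt, the call would be `enable_latency_measurement("/mnt")`.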

So the basic idea right now is to read up on all the xlators, document which stats each of them stores, and then provide access to this information in a standardized manner.