Programming

Features of Apache Big Data Streaming Frameworks

Posted on

The idea of this article is to explain the general features of Apache big data stream processing frameworks and to provide a crisp comparative analysis of Apache's big data streaming frameworks against those generic features, so that it is easier to select the right framework for application development.

1. Introduction

In the big data world, there are many tools and frameworks available to process large volumes of data in offline or batch mode.  But the need for real time processing, to analyze data arriving at high velocity on the fly and provide analytics or enrichment services, is also high.  Over the last couple of years this has been an ever changing landscape, with many new streaming frameworks entering the field, so choosing a real time processing engine has become a challenge.

2. Design

The real time streaming engines interact with stream or messaging frameworks such as Apache Kafka, RabbitMQ or Apache Flume to receive data in real time.

They process the data inside a cluster computing engine, which typically runs on top of a cluster manager such as Apache YARN, Apache Mesos or Apache Tez.

The processed data is sent back to message queues (Apache Kafka, RabbitMQ, Flume) or written into storage such as HDFS or NFS.

[Figure: Streaming framework design]

 

3. Characteristics of Real Time Stream Process Engines

3.1 Programming Models

There are two types of programming models present in real time streaming frameworks.

3.1.1 Compositional

This approach provides basic building blocks, from which the streaming application can be composed. For example, in Apache Storm, spouts are used to connect to different sources and receive the data, and bolts are used to process the received data.
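To make the compositional style concrete, here is a minimal framework-neutral sketch in Python; the Spout and Bolt classes below are hypothetical stand-ins for Storm's concepts, not Storm's actual API:

# A hypothetical compositional pipeline: a source ("spout") emits events
# and processing components ("bolts") are wired together explicitly.
class WordSpout:
    def next_tuple(self):
        # A real spout would read from Kafka, a socket, etc.
        for line in ["to be or not to be"]:
            yield line

class SplitBolt:
    def process(self, line):
        for word in line.split():
            yield word

class CountBolt:
    def __init__(self):
        self.counts = {}
    def process(self, word):
        self.counts[word] = self.counts.get(word, 0) + 1

# The application composes the components into a pipeline by hand.
spout, splitter, counter = WordSpout(), SplitBolt(), CountBolt()
for line in spout.next_tuple():
    for word in splitter.process(line):
        counter.process(word)
print(counter.counts)   # {'to': 2, 'be': 2, 'or': 1, 'not': 1}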

3.1.2. Declarative

This is more of a functional programming approach, where the framework allows us to define higher order functions. Declarative APIs provide more advanced operations like windowing or state management and are considered more flexible.
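By contrast, a declarative sketch of the same word count using Python's built-in higher order functions (again only an illustration of the style, not a real streaming API):

from functools import reduce

lines = ["to be or not to be"]

# The pipeline is expressed as a chain of higher order functions;
# the "framework" decides how to execute it.
words = (word for line in lines for word in line.split())
counts = reduce(lambda acc, w: {**acc, w: acc.get(w, 0) + 1}, words, {})
print(counts)   # {'to': 2, 'be': 2, 'or': 1, 'not': 1} (key order may vary)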

3.2  Message Delivery Guarantee

There are three message delivery guarantee mechanisms: at most once, at least once and exactly once.

3.2.1 At most once

This is a best effort delivery mechanism.  Each message is delivered zero or one times, so messages may be lost under failure, but no duplicates are ever produced.

3.2.2 At least once

This mechanism ensures that each message is delivered at least once.  But in the process of guaranteeing at least one delivery, the framework might deliver the message more than once.  So, duplicate messages might be received and processed, which causes unnecessary complications where the processing logic is not idempotent.

3.2.3 Exactly once

The framework ensures that each message is delivered and processed exactly once.  Delivery is guaranteed and there won't be any duplicate messages, so the “exactly once” guarantee is considered the best of all.

3.3 State Management

State management defines the way events are accumulated inside the framework before it actually processes the data.  This is a critical factor when deciding on a framework for real time analytics.

3.3.1 Stateless processing

Frameworks which process each incoming event independently, without knowledge of any previous events, are considered stateless.  Data enrichment and simple data processing applications typically need only this kind of processing.

3.3.2 Stateful Processing

Stateful stream processing frameworks can make use of previous events to process incoming events, by storing them in a cache or an external database.  Real time analytics applications need stateful processing, so that they can collect data for a specific interval and process it before recommending any suggestions to the user.

3.4 Processing Modes

Processing mode defines how the incoming data is processed.  There are three processing modes: event, micro batch and batch.

3.4.1 Event Mode

Each and every incoming message is processed independently, and the framework may or may not maintain state information.

3.4.2 Micro Batch

The incoming events are accumulated for a specific time window, and the collected events are processed together as a batch.
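A rough sketch of the micro batch idea, with invented timestamps for illustration: events are buffered until the window closes, then handed over as one batch:

def micro_batches(events, batch_interval):
    # Accumulate (timestamp, payload) events into per-window batches.
    batch, window_end = [], None
    for timestamp, payload in events:
        if window_end is None:
            window_end = timestamp + batch_interval
        if timestamp >= window_end:
            yield batch                      # window closed: emit the batch
            batch, window_end = [], timestamp + batch_interval
        batch.append(payload)
    if batch:
        yield batch                          # emit the final partial batch

events = [(0.1, 'a'), (0.5, 'b'), (2.2, 'c'), (3.9, 'd'), (4.5, 'e')]
for batch in micro_batches(events, batch_interval=2.0):
    print(batch)                             # ['a', 'b'] then ['c', 'd'] then ['e']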

3.4.3 Batch

The incoming events are processed like a bounded stream of inputs.  This allows processing of a large but finite set of incoming events.

3.5 Cluster Manager

Real time processing frameworks running in a cluster computing environment might need a cluster manager.  Support for a cluster manager is critical to meet the scalability and performance requirements of the application.  A framework might run in standalone mode, on its own cluster manager, or on Apache YARN, Apache Mesos or Apache Tez.

3.5.1 Standalone Mode

Support for standalone mode is useful during the development phase, where developers can run the code in their development environment without deploying it to a large cluster computing environment.

3.5.2 Proprietary Cluster Manager

Some real time processing frameworks ship with their own cluster manager; for example, Apache Spark has its own standalone cluster manager bundled with the software.  This reduces the overhead of installing, configuring and maintaining other cluster managers such as Apache YARN or Apache Mesos.

3.5.3 Support for Industry Standard Cluster Managers

If you already have a big data environment and want to leverage the existing cluster for real time processing, then support for your existing cluster manager is critical.  The real time stream processing framework must support Apache YARN, Apache Mesos or Apache Tez.

3.6 Fault Tolerance

Most big data frameworks follow a master-slave architecture.  Basically, the master is responsible for running the job on the cluster and monitoring the clients in the cluster.  So, the framework must handle failures at the master node as well as failures of client nodes.  Some frameworks might need external tools like monit/supervisord to monitor the master node.  For example, Apache Spark Streaming has its own monitoring process for the master (driver) node: if the master node fails it is automatically restarted, and if a client node fails the master takes care of restarting it.  But in Apache Storm, the master has to be monitored using a tool such as monit.

3.7 External Connectors

The framework must support seamless connections to external data sources such as Twitter feeds, Kafka, RabbitMQ, Flume, RSS feeds, Hekad, etc.  The framework must provide standard inbuilt connectors as well as the means to extend them to connect to various streaming data sources.

3.7.1 Social Media Connectors – Twitter / RSS Feeds

3.7.2 Message Queue Connectors – Kafka / RabbitMQ

3.7.3 Network Port Connectors – TCP/UDP Ports

3.7.4 Custom Connectors – Support to develop customized connectors to read from custom applications.

3.8 Programming Language Support

Most of these frameworks support JVM languages, especially Java and Scala.  Some also support Python.  The selection of a framework might depend on the language of choice.

3.9 Reference Data Storage & Access

Real time processing engines might need to consult reference databases to enhance or aggregate the given data, so the framework must provide efficient ways to integrate and access the reference data.  Some frameworks provide ways to internally cache the reference data in memory (e.g. the Apache Spark broadcast variable).  Apache Samza and Apache Flink support storing reference data internally in each cluster node, so that jobs can access it locally without connecting to the database over the network.

Following are the various methods available in big data streaming frameworks:

3.9.1 In-memory cache: Allows reference data to be stored inside the cluster nodes, which improves performance by removing the delay of connecting to external databases (a small sketch follows this list).

3.9.2 Per Client Database Storage: Allows data to be stored in third party database systems like MySQL, SQLite or MongoDB inside the streaming cluster, and provides API support to connect to and retrieve data from those databases efficiently.

3.9.3 Remote DBMS connection: Supports connecting to databases outside the streaming cluster.  This is considered the least efficient option, due to the higher latency and the bottlenecks introduced by network communication.
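The following is a small sketch of the in-memory cache idea (3.9.1), with remote_lookup() standing in for a query to an external database; the network cost is paid only on a cache miss:

reference_cache = {}

def remote_lookup(key):
    # Stub for a remote database query over the network.
    return {"IN": "India", "US": "United States"}.get(key)

def enrich(event):
    code = event["country"]
    if code not in reference_cache:          # cache miss: fetch once, then reuse
        reference_cache[code] = remote_lookup(code)
    event["country_name"] = reference_cache[code]
    return event

print(enrich({"id": 1, "country": "IN"}))
# {'id': 1, 'country': 'IN', 'country_name': 'India'}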

3.10 Latency and throughput

Though hardware configuration plays a major role in latency and throughput, some design factors of the frameworks also affect performance: network I/O, efficient use of memory, reduced disk access, and in-memory caches for reference data.  For example, the Apache Kafka Streams API provides higher throughput and lower latency due to reduced network I/O, since the messaging framework and the computing engine are in the same cluster.  Similarly, Apache Spark uses memory to cache data, thereby reducing disk access, which results in lower latency and higher throughput.

4. Feature Comparison Table

The following table compares the Apache streaming frameworks against the features discussed above.

[Table: Streaming framework comparison]

The above frameworks support both stateful and stateless processing modes.

5. Conclusion

This article summarized the various features of streaming frameworks, which are critical selection criteria for a new streaming application.  Every application is unique and has its own specific functional and non-functional requirements, so the right framework completely depends on those requirements.

6. References

6.1 Apache Spark Streaming – http://spark.apache.org/streaming/

6.2 Apache Storm – http://storm.apache.org/

6.3 Apache Flink – https://flink.apache.org/

6.4 Apache Samza – http://samza.apache.org/

6.5 Apache Kafka Streaming API – http://kafka.apache.org/documentation.html#streams

 

 

GIT Command Reference

Posted on Updated on

Git is software that allows you to keep track of changes made to a project over time.  Git works by recording the changes you make to a project, storing those changes, and then allowing you to reference them as needed.

A GIT project has three parts.

1. Working Directory: The directory where you do all the work: creating, editing, deleting and organizing files.

2. Staging Area: The place where you list the changes you make to the working directory.

3. Repository: The place where GIT permanently stores those changes as different versions of the project.

GIT WorkFlow:

The Git workflow consists of editing files in the working directory, adding files to the staging area, and saving changes to the GIT repository.  Saving changes to the GIT repository is called a commit.

 

I. BASIC GIT Commands

  1. git init – Turns the current working directory into a GIT project.
  2. git status – Prints the current status of the git working directory.
  3. git add <filename> – Adds the file into the staging area.  [ After adding, verify with the git status command ]
  4. git add <list of files> – The add command also takes a list of files.
  5. git diff <filename> – Displays the diff between the file in the staging area and the working directory.
  6. git commit -m “Commit Comment” – Permanently stores changes from the staging area into the GIT repository.
  7. git log – Prints the earlier versions of the project, stored in chronological order.

 

II. Backtracking Changes
In GIT, the commit you are currently on is known as the HEAD commit.  In many cases, the most recently made commit is the HEAD commit.

SHA – The git log command displays the commit log, and each commit is identified by an SHA value.  The first 7 characters of the SHA are enough to reference a commit in commands.

  1. git show HEAD – Displays the HEAD commit
  2. git reset HEAD <filename> – Unstages file changes in the Staging area.
  3. git checkout HEAD <filename> – Discards the changes in the Working Directory.
  4. git reset <SHA> – Resets the project back to the given commit.

 

III. GIT BRANCHING
GIT allows us to create branches to experiment with versions of a project.  Imagine you want to develop a new API: until you are ready to merge that API into the master branch, it should not be available there.  So, in this scenario we create a branch, develop the new API on it, and then merge it into master.

  1. git branch – Lists the branches and shows the branch you are currently on.
  2. git branch <new branch name> – Creates a new branch.
  3. git checkout <branchname> – Switches to the branch.
  4. git merge <branchname> – Issued from master, merges the branch into master.
  5. git branch -d <branchname> – Deletes the branch.

 

IV. GIT COLLOBORATION
Git offers a suite of collaboration tools for working with others' projects.

  1. git clone <remote_location> <clone_name> – Creates a local replica of a remote git repository.
  2. git fetch – Updates the clone by downloading new commits from the remote; it does not merge them into your local branches.
  3. git merge origin/master – Merges the local master with origin/master.
  4. git push origin <branch> – Pushes your work to the origin.

 

V. Example Workflow to add a files

  1. git clone <remote_location>
    1.     Eg. git clone http://www.abc.com/abc/bcd
  2. git add <new_files>
    1.     git add abc.py bcd.py
  3. git status
  4. git commit -m “Committing abc.py”
  5. git push origin master

 


VI. Reference

  1. https://confluence.atlassian.com/bitbucketserver/basic-git-commands-776639767.html

Python Collections : High Performing Containers For Complex Problems

Posted on Updated on

1.Introduction

Python is known for its powerful general purpose built-in data types like list, dict, tuple and set.  But Python also has collection objects, like Java and C++.  These objects are developed on top of the general built-in containers with additional functionality, which can be used in special scenarios.

The objective of this article is to introduce the Python collection objects and explain them with appropriate code snippets.  The collections library contains the collection objects namedtuple (v2.6), deque (v2.4), ChainMap (v3.3), Counter (v2.7), OrderedDict (v2.7) and defaultdict (v2.5).  Python 3.x also has UserDict, UserList and UserString to create custom container types, which deserve a separate article (not in the scope of this one).

NOTE: For Python 2.x users, the version in which each object was introduced is noted above.  All these objects are available in Python 3.x from 3.1 onwards, except ChainMap, which was introduced in v3.3.  All the code snippets in this article were executed in a Python 3.5 environment.

2. Namedtuple

As the name suggests, a namedtuple is a tuple with names.  In a standard tuple, we access the elements using indexes, whereas a namedtuple allows the user to define names for the elements.  This is very handy especially when processing CSV (comma separated value) files and working with complex and large datasets, where code using indexes becomes messy (and not so pythonic).

2.1 Example 1

Namedtuples are available in the collections library in Python. We have to import the collections library before using any container object from this library.

>>>from collections import namedtuple
>>>saleRecord = namedtuple('saleRecord','shopId saleDate salesAmount totalCustomers')
>>>
>>>
>>>#Assign values to a named tuple 
>>>shop11=saleRecord(11,'2015-01-01',2300,150) 
>>>shop12=saleRecord(shopId=12,saleDate="2015-01-01",salesAmount=1512,totalCustomers=125)

In the above code snippet, the first line imports namedtuple from the collections library. The second line creates a namedtuple called “saleRecord”, which has shopId, saleDate, salesAmount and totalCustomers as fields. Note that namedtuple() takes two string arguments: the first argument is the name of the tuple and the second argument is the list of field names separated by space or comma. In the above example, space is used as the delimiter.
We have also created two tuples here, shop11 and shop12.  For shop11, the values are assigned to fields based on the order of the fields; for shop12, the values are assigned using the names.

2.2 Example 2

>>>#Reading a namedtuple
>>>print("Shop Id =",shop12.shopId)
Shop Id = 12
>>>print("Sale Date=",shop12.saleDate)
Sale Date= 2015-01-01
>>>print("Sales Amount =",shop12.salesAmount)
Sales Amount = 1512
>>>print("Total Customers =",shop12.totalCustomers)
Total Customers = 125

The above code makes it pretty clear that the tuple is accessed using names. It is also possible to access the elements using tuple indexes, which is the usual way.

2.3 Interesting Methods and Members

2.3.1 _make

The _make method is used to convert a given iterable (list, tuple, dictionary) into a named tuple.

>>>#Convert a list into a namedtuple
>>>aList = [101,"2015-01-02",1250,199]
>>>shop101 = saleRecord._make(aList)
>>>print(shop101)
saleRecord(shopId=101, saleDate='2015-01-02', salesAmount=1250, totalCustomers=199)

>>>#Convert a tuple into a namedtuple
>>>aTup =(108,"2015-02-28",1990,189)
>>>shop108=saleRecord._make(aTup)
>>>print(shop108)
saleRecord(shopId=108, saleDate='2015-02-28', salesAmount=1990, totalCustomers=189)
>>>

2.3.2 _fields

The _fields is a tuple, which contains the names of the tuple.

>>>print(shop108._fields)
('shopId', 'saleDate', 'salesAmount', 'totalCustomers')

2.4 CSV File Processing

As discussed, namedtuple is very handy when processing a CSV data file, where we can access the data using names instead of indexes, which makes the code more meaningful and readable.

import csv
from collections import namedtuple

saleRecord = namedtuple('saleRecord','shopId saleDate totalSales totalCustomers')
overAllSales = 0                              # accumulator for the chain-wide total

fileHandle = open("salesRecord.csv","r")
csvFieldsList = csv.reader(fileHandle)
for fieldsList in csvFieldsList:
    shopRec = saleRecord._make(fieldsList)
    overAllSales += int(shopRec.totalSales)   # CSV fields are read as strings
fileHandle.close()

print("Total Sales of The Retail Chain =",overAllSales)

In the above code snippet, we have a file salesRecord.csv which contains the sales records of shops of a particular retail chain. It contains values for the fields shopId, saleDate, totalSales and totalCustomers. The fields are delimited by commas and the records are delimited by newlines.
The csv.reader() reads the file and provides an iterator. The iterator, “csvFieldsList”, provides the list of fields for every single row of the CSV file. As we know, _make() converts the list into a namedtuple, and the rest of the code is self explanatory.

 

3.Counter

Counter is used for rapid tallies.  It is a dictionary, where the elements are stored as keys and their counts are stored as values.

3.1 Creating Counters

The Counter() class takes an iterable object as an argument, computes the count for each element in the object, and presents them as key-value pairs.

>>>from collections import Counter
>>>listOfInts=[1,2,3,4,1,2,3,1,2,1]
>>>cnt=Counter(listOfInts)
>>>print(cnt)
Counter({1: 4, 2: 3, 3: 2, 4: 1})

In the above code snippet, listOfInts is a list which contains numbers. It is passed to Counter() and we get cnt, a container object. cnt is a dictionary which contains the unique numbers present in the given list as keys and their respective counts as values.

3.2 Accessing Counters

Counter is a subclass of dictionary, so it can be accessed exactly like a dictionary; “cnt” can be handled as a regular dictionary object.

>>> cnt.items()
dict_items([(1, 4), (2, 3), (3, 2), (4, 1)])
>>> cnt.keys()
dict_keys([1, 2, 3, 4])
>>> cnt.values()
dict_values([4, 3, 2, 1])

3.3 Interesting Methods & Usecases

3.3.1 most_common

The most_common(n) method of the Counter class returns the n most commonly occurring keys; for example, n = 2 returns the top two keys.

>>>name = "Saravanan Subramanian"
>>>letterCnt=Counter(name)
>>>letterCnt.most_common(1)
[('a', 7)]
>>>letterCnt.most_common(2)
[('a', 7), ('n', 4)]
>>>letterCnt.most_common(3)
[('a', 7), ('n', 4), ('r', 2)]

In the above code, we can see that the string is parsed into individual characters as keys, with their respective counts stored as values. So, letterCnt.most_common(1) provides the letter with the highest number of occurrences.

3.3.2 Operations on Counter

A Counter is also called a multiset. It supports addition, subtraction, union and intersection operations on Counter objects.

>>> a = Counter(x=1,y=2,z=3)
>>> b = Counter(x=2,y=3,z=4)
>>> a+b
Counter({'z': 7, 'y': 5, 'x': 3})
>>> a-b       #This will result in negative values & will be omitted
Counter()    
>>> b-a
Counter({'y': 1, 'x': 1, 'z': 1})
>>> a & b    #Chooses the minimum values from their respective pair
Counter({'z': 3, 'y': 2, 'x': 1})
>>> a | b   #Chooses the maximum values from their respective pair
Counter({'z': 4, 'y': 3, 'x': 2})

4. Default Dictionary

The defaultdict() is available as part of the collections library. It allows the user to specify a factory function to be called when a key is not present in the dictionary.

In a standard dictionary, accessing an element whose key is not present raises a KeyError. This is a problem when working with collections (list, set, etc.), especially while creating them.

So, when the dictionary is queried for a key which does not exist, the factory function passed as the first argument to defaultdict() is called to produce a value, which is then set for the given key in the dictionary.

4.1 Creating Default Dictionary

The defaultdict() is available as part of the collections library.  It takes a zero-argument callable whose return value is used as the default for missing keys.

4.1.1 Example 1

>>> from collections import defaultdict
>>> booksIndex = defaultdict(lambda:'Not Available')
>>> booksIndex['a']='Arts'
>>> booksIndex['b']='Biography'
>>> booksIndex['c']='Computer'
>>> print(booksIndex)
defaultdict(<function <lambda> at 0x030EB3D8>, {'c': 'Computer', 'b': 'Biography', 'a': 'Arts'})
>>> booksIndex['z']
'Not Available'
>>> print(booksIndex)
defaultdict(<function <lambda> at 0x030EB3D8>, {'c': 'Computer', 'b': 'Biography', 'z': 'Not Available', 'a': 'Arts'})
>>> 

In the above example, booksIndex is a defaultdict which sets ‘Not Available’ as the value whenever a non-existent key is accessed. We added values for keys a, b and c into the defaultdict. The print(booksIndex) shows that the defaultdict contains values only for these keys. When trying to access the value for key ‘z’, which we have not set, it returned ‘Not Available’ and updated the dictionary.

4.1.2 Example 2

>>> titleIndex = [('a','Arts'),('b','Biography'),('c','Computer'),('a','Army'),('c','Chemistry'),('d','Dogs')]
>>> rackIndices = defaultdict(list)
>>> for id,title in titleIndex:
	rackIndices[id].append(title)	
>>> rackIndices.items()
dict_items([('d', ['Dogs']), ('b', ['Biography']), ('a', ['Arts', 'Army']), ('c', ['Computer', 'Chemistry'])])
>>> 

In the above example, titleIndex contains a list of tuples. We want to aggregate this list of tuples to identify the titles for each alphabet. So, we can have a dictionary where the key is the alphabet and the value is the list of titles. Here we used a defaultdict with “list” as the factory to be called for missing keys. For each new key, list() is called and creates an empty list object; the consecutive append() calls then add elements to that list.

5. Ordered Dictionary

An ordered dictionary maintains the order in which elements are added to the dictionary, where a standard dictionary does not maintain the order of insertion.

5.1 Ordered Dictionary Creation

An ordered dictionary is created using OrderedDict() from the collections library. It is a subclass of the regular dictionary, so it inherits all the other methods and behaviours of a regular dictionary.

>>> from collections import OrderedDict
>>> dOrder=OrderedDict()
>>> dOrder['a']='Alpha'
>>> dOrder['b']='Bravo'
>>> dOrder['c']='Charlie'
>>> dOrder['d']='Delta'
>>> dOrder['e']='Echo'
>>> dOrder
OrderedDict([('a', 'Alpha'), ('b', 'Bravo'), ('c', 'Charlie'), ('d', 'Delta'), ('e', 'Echo')])
>>> dOrder.keys()
odict_keys(['a', 'b', 'c', 'd', 'e'])
>>> dOrder.values()
odict_values(['Alpha', 'Bravo', 'Charlie', 'Delta', 'Echo'])
>>> dOrder.items()
odict_items([('a', 'Alpha'), ('b', 'Bravo'), ('c', 'Charlie'), ('d', 'Delta'), ('e', 'Echo')])
>>> 

5.2 Creating from other iteratable items

An OrderedDict can also be created by passing a dictionary or a list of (key, value) tuples.

>>> from collections import OrderedDict
>>> listKeyVals = [(1,"One"),(2,"Two"),(3,"Three"),(4,"Four"),(5,"Five")]
>>> x = OrderedDict(listKeyVals)
>>> x
OrderedDict([(1, 'One'), (2, 'Two'), (3, 'Three'), (4, 'Four'), (5, 'Five')])
>>> 

5.3 Sort and Store

One of the interesting use cases for OrderedDict is the rank problem. For example, suppose a dictionary contains student names and their marks, and we have to find the best student and rank them according to their marks. OrderedDict is the right choice here: since OrderedDict remembers the order of addition and sorted() can sort the dictionary items, we can combine both to create a rank list based on the student marks. Please check the example below:

>>> studentMarks={}
>>> studentMarks["Saravanan"]=100
>>> studentMarks["Subhash"]=99
>>> studentMarks["Raju"]=78
>>> studentMarks["Arun"]=85
>>> studentMarks["Hasan"]=67
>>> studentMarks
{'Arun': 85, 'Subhash': 99, 'Raju': 78, 'Hasan': 67, 'Saravanan': 100}
>>> sorted(studentMarks.items(),key=lambda t:t[0])
[('Arun', 85), ('Hasan', 67), ('Raju', 78), ('Saravanan', 100), ('Subhash', 99)]
>>> sorted(studentMarks.items(),key=lambda t:t[1])
[('Hasan', 67), ('Raju', 78), ('Arun', 85), ('Subhash', 99), ('Saravanan', 100)]
>>> sorted(studentMarks.items(), key = lambda t:-t[1])
[('Saravanan', 100), ('Subhash', 99), ('Arun', 85), ('Raju', 78), ('Hasan', 67)]
>>> rankOrder = OrderedDict(sorted(studentMarks.items(), key = lambda t:-t[1]))
>>> rankOrder
OrderedDict([('Saravanan', 100), ('Subhash', 99), ('Arun', 85), ('Raju', 78), ('Hasan', 67)])

In the above example, studentMarks is a dictionary containing the student name as the key and the mark as the value. It is sorted by value and passed to OrderedDict, and the result is stored in rankOrder. Now rankOrder contains the highest-marked student as the first entry, the next highest as the second entry and so on. This order is preserved in the dictionary.

6. Deque

Deque means double ended queue, and it is pronounced “deck”. It is an extension of the standard list data structure. The standard list allows the user to append or extend only at the end, but a deque allows operations on both ends, so the user can implement both stacks and queues with it.

6.1 Creation & Performing Operations on Deque

The deque() is available in the collections library. It takes an iterable as an argument and an optional maximum length. If maxlen is set, the deque's length never exceeds maxlen.

>>> from collections import deque
>>> aiao = deque([1,2,3,4,5],maxlen=5)
>>> aiao
deque([1, 2, 3, 4, 5], maxlen=5)
>>> aiao.append(6)
>>> aiao
deque([2, 3, 4, 5, 6], maxlen=5)
>>> aiao.appendleft(1)
>>> aiao
deque([1, 2, 3, 4, 5], maxlen=5)

In the above example, we created a deque with maxlen 5; once we appended a 6th element on the right, it pushed the first element out on the left.  Similarly, it pushes out the last element on the right when we append an element on the left.

6.2 Operations on Right

Operations on the right are the same as performing operations on a list.  The methods append(), extend() and pop() operate on the right side of the deque.

>>> aiao.append(6)
>>> aiao
deque([2, 3, 4, 5, 6], maxlen=5)
>>> aiao.extend([7,8,9])
>>> aiao
deque([5, 6, 7, 8, 9], maxlen=5)
>>> aiao.pop()
9

6.3 Operation on the Left

Operations on the left are supported by a matching set of methods: appendleft(), extendleft() and popleft().

>>> aiao = deque([1,2,3,4,5],maxlen=5)
>>> aiao.appendleft(0)
>>> aiao
deque([0, 1, 2, 3, 4], maxlen=5)
>>> aiao.extendleft([-1,-2,-3])
>>> aiao
deque([-3, -2, -1, 0, 1], maxlen=5)
>>> aiao.popleft()
-3

6.4 Example 2 (without maxlen)

If the maxlen value is not set, the deque does not perform any trimming to maintain its size.

>>> aiao = deque([1,2,3,4,5])
>>> aiao.appendleft(0)
>>> aiao
deque([0, 1, 2, 3, 4, 5])
>>> aiao.extendleft([-1,-2,-3])
>>> aiao
deque([-3, -2, -1, 0, 1, 2, 3, 4, 5])
>>> 

From the above example, the deque aiao continues to grow for each append and extend operation performed on it.

7. ChainMap

ChainMap allows combining multiple dictionaries into a single view, so that operations can be performed on one logical entity.  ChainMap() does not create a new dictionary; instead it maintains references to the original dictionaries, and all operations are performed on the referenced dictionaries.

7.1 Creating ChainMap

>>> from collections import ChainMap
>>> x = {'a':'Alpha','b':'Beta','c':'Cat'}
>>> y = { 'c': "Charlie", 'd':"Delta", 'e':"Echo"}
>>> z = ChainMap(x,y)
>>> z
ChainMap({'c': 'Cat', 'b': 'Beta', 'a': 'Alpha'}, {'d': 'Delta', 'c': 'Charlie', 'e': 'Echo'})
>>> list(z.keys())
['b', 'd', 'c', 'e', 'a']
>>> list(z.values())
['Beta', 'Delta', 'Cat', 'Echo', 'Alpha']
>>> list(z.items())
[('b', 'Beta'), ('d', 'Delta'), ('c', 'Cat'), ('e', 'Echo'), ('a', 'Alpha')]

We have created the ChainMap z from the dictionaries x and y; z holds references to both. A ChainMap does not expose duplicate keys: for key ‘c’ it returns ‘Cat’, the value from the first map, and skips the second occurrence of the same key.

>>> x
{'c': 'Cat', 'b': 'Beta', 'a': 'Alpha'}
>>> y
{'d': 'Delta', 'c': 'Charlie', 'e': 'Echo'}
>>> x.pop('c')
'Cat'
>>> x
{'b': 'Beta', 'a': 'Alpha'}
>>> list(z.keys())
['d', 'c', 'b', 'e', 'a']
>>> list(z.values())
['Delta', 'Charlie', 'Beta', 'Echo', 'Alpha']
>>> list(z.items())
[('d', 'Delta'), ('c', 'Charlie'), ('b', 'Beta'), ('e', 'Echo'), ('a', 'Alpha')]
>>> 

In the above code, we removed the key ‘c’ from dict x. Now the ChainMap resolves the value for key ‘c’ to “Charlie”, which is present in y.

8. Summary

We have seen the various Python collection data types and understood them through examples and use cases. The official Python documentation can be referred to for further reading.

9. References

[1] Python collections documentation – https://docs.python.org/3.5/library/collections.html

Functional Programming in Python

Posted on Updated on

The idea of this blog is to understand functional programming concepts using Python.

1. Programming Paradigms

There are three programming paradigms: imperative programming, functional programming and logic programming.  Most programming languages support only the imperative style.  The imperative style has a direct relationship with machine language: features of the imperative style such as the assignment operator, conditional execution and loops are directly derived from machine languages.  Procedural and object oriented languages such as C, C++ and Java are all imperative programming languages.

Logic programming is a completely different style: the program does not contain the solution to the problem, but is instead written in terms of facts and rules.  Prolog, ASP and Datalog are some logic programming languages.

2. Functional programming  

The Wikipedia definition says: “In computer science, functional programming is a programming paradigm—a style of building the structure and elements of computer programs—that treats computation as the evaluation of mathematical functions and avoids changing-state and mutable data”.

Following are the characteristics of functional programming:

Immutability:

Functional languages support immutable objects by default; mutable objects must be declared explicitly and consciously.  In contrast, imperative style languages support mutable objects by default, and immutable objects must be declared explicitly using a different library.

Functions are first class citizens:

Functions are first class citizens and are handled like any other object.   It means a function can be stored in a variable, passed as an argument to other functions, and returned from a function.

Lexical Scoping:

In functional programming, the scope of a function depends on the location of its definition.  If a function is defined inside another function, its scope is only within the outer function; it cannot be referenced outside the outer function.
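For example, in Python, which uses lexical scoping, an inner function is visible only inside the function that defines it:

def outer():
    def inner():
        return "visible only inside outer()"
    return inner()

print(outer())    # works
# print(inner())  # would raise NameError: name 'inner' is not defined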

3. Type of Functions

3.1 Higher Order Functions: 

A higher order function takes another function as an argument and/or returns a function.

def FindFormula(shape):
    if (shape == "Square"):
        return SquareArea
    if (shape == "Circle"):
        return CircleArea

def SquareArea(a):
    return (a * a)

def CircleArea(r):
    return (22/7*r * r)

def findShape(sides):
    if (sides == 4):
        return "Square"
    if (sides == 0 ):
        return "Circle"

if __name__ == "__main__":
    
    size = 5
    area = FindFormula(findShape(sides=4))(size)
    print("Area = ",area)

3.2 Anonymous Functions

A function without a name is called an anonymous function.  Generally these functions are defined inline, at the point where they are needed.

area = lambda x : 22/7*x*x
print(area(10))

Anonymous functions are created with the “lambda” keyword.  We can use this syntax wherever a function object is needed, as the sketch below shows.
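For instance, an anonymous function is commonly passed as the key argument to sorted():

marks = [("Raju", 78), ("Arun", 85), ("Hasan", 67)]
# Sort by the second element of each tuple, without naming a helper function.
print(sorted(marks, key=lambda t: t[1]))
# [('Hasan', 67), ('Raju', 78), ('Arun', 85)]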

3.3 Nested Functions

Functions can be defined within the scope of another function.  The inner function is only in scope inside the outer function.  This is useful when the inner function is being returned or passed into another function.

def oFunction(x,y):
    def SqArea(x,y):
        return (x*y)
    return SqArea(x,y)

area = oFunction(10,20)

In the above example, the function SqArea is strictly inside the scope of oFunction().

3.4 Currying

A simple definition by Cay S. Horstmann in his book “Scala for the Impatient”: currying is the process of turning a function that takes two arguments into a function that takes one argument. That function returns a function that consumes the second argument.

The generalized definition for currying: A function that takes multiple arguments and turning into a chain of functions each taking one argument and returning the next function, until the last returns the result.

def curried_pow(x):
    def h(y):
        return pow(x,y)
    return h

print(curried_pow(2)(3))   # prints 8

3.5 Closures

A closure is a persistent scope which holds on to local variables even after the code execution has moved out of that block: the inner function remembers the state of the outer function, even after the outer function has completed execution.

Generally this behavior is not possible in imperative programming styles, because a function is executed in a separate stack frame and once the function completes its execution the stack frame is freed up.  But with closures, as long as the inner function refers to the state of the outer function, that state is kept alive.

def sumA(a):
    def sumB(b):
        return(a+b)
    return(sumB)
x = sumA(10)
y = x(20)
print(y)   # prints 30

Benefits of Closures:
It avoids the use of global variables and provides data hiding. In scenarios where the developer does not want to go for an object oriented design, closures can be handy for implementing abstraction.
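A small sketch of data hiding with a closure: the count variable is not global, yet it persists across calls and can only be modified through the returned function:

def make_counter():
    count = 0
    def increment():
        nonlocal count          # bind to the enclosing function's variable
        count += 1
        return count
    return increment

tick = make_counter()
print(tick(), tick(), tick())   # 1 2 3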

3.6 Tail Recursion

A recursive function is called tail recursive when the recursive call is the last statement in the function.  It can be much more efficient than an ordinary recursive function: since the recursive call is the last statement, there is nothing to be saved on the stack for the current call, so tail recursion is very light on memory utilization.  (Note that CPython does not perform tail-call optimization, so in Python this form is mainly a stylistic transformation.)

Regular Recursive Function for Factorial in Python:

def fact(n):
    if (n == 0):
        return 1
    else:
        return n * fact(n-1)

The same can be converted to tail recursion as below, by adding an extra argument:

def fact(n):
    return tailFact(n,1)

def tailFact(n,a):
    if (n == 0):
        return(a)
    else:
        return tailFact(n-1,n*a)

4 Benefits of Functional Programming

Clarity & complexity: Functional programming supports writing modular, structured and hierarchical code.  The use of anonymous and local functions makes it easier to organize code hierarchically, and currying and closures reduce the complexity of the program.

Concurrency: Programming for multi-core architectures is a challenge, where a single piece of code can be executed by many threads in parallel, raising issues of re-entrancy and shared memory.  In imperative style programming we must use synchronization techniques to avoid these problems, which impacts performance.  The philosophy of functional programming fits the needs of multi-core programming by enforcing immutability.

Memory efficient: The anonymous function, nested function and tail-recursion styles are very light on memory.  Because of lexical scoping, once a function is out of scope it can be removed from memory, and tail recursion (where the language optimizes it) avoids holding stack frames.

5 Summary

We have briefly discussed the functional programming paradigm and its benefits.   The functional programming style is most suitable for developing algorithms, analytics, data mining and machine learning code.   Most modern programming languages support functional styles, and Java caught up with this style in Java 8.

Reference

[1] Haskell Wiki – https://wiki.haskell.org/Functional_programming

[2] Currying – https://mtomassoli.wordpress.com/2012/03/18/currying-in-python/

Why Scala ?

Posted on Updated on

SCALA

  • Scala – means Scalable Language (pronounced scah-lah)
  • Scala interoperates closely with Java and internally uses many Java libraries.
  • Scala is recommended for algorithm development, big data processing and multi-threaded applications in a multi-core environment.
  • The Scala compiler, “scalac”, generates Java byte code which can run on the JVM.
  • Scala is a statically typed language, which suits large projects.
  • Scala improves the productivity of developers after the initial learning curve.

Language Features

1. JVM Language

Scala is a JVM language: it generates Java byte code which runs on the JVM.  Scala supports the use of existing Java libraries in Scala code, and Java and Scala libraries are inter-operable.

2. Functional Programming Language along with Object Oriented Programming

Scala supports the functional style along with OOP, unlike Java which is strictly OO, or other functional programming languages which are strictly functional (Haskell, Clojure).

  • Immutable objects by default: Scala encourages immutable objects; if the programmer wants mutable objects, they must be chosen consciously from a different library.
  • Higher order functions: in Scala a function is an object.  It supports anonymous functions, tail-recursion, closures, etc.

3. Concurrency Support in Multi-Core Environment

  • Actor – similar to Java threads, but with advanced thread management and synchronous/asynchronous communication mechanisms across threads.  This communication mechanism avoids the need for thread synchronization.
  • No support for STATIC classes/variables.  If similar behavior is needed, the programmer has to create a companion object for the class.  This feature avoids the need to protect critical regions that act on global/static memory.
  • Threads (actors) can be created to run forever and execute based on the messages they receive.
  • private[this] – a feature to make a private member perfectly private.  A private member of an object can normally be accessed by other objects of the same class; this can be restricted with private[this].

4. Removes boilerplate coding:

  • Object creation – simplified
  • Collection library – minimal set of methods to handle most scenarios

5. More Features & Less typing

  • Scala creates default code for commonly used behaviors
    • Automatically creates constructors for classes
    • Class can have arguments, which can act as argument for default constructor
    • Automatically generates getter & setter functions for class members
  • Simple File Handling & IO
    • No need to worry about [File/Input/Output/Buffer] [Stream / Reader / Writer] (Eg. Input Stream Reader => Buffer Reader => ReadInt )
    • Simple functions for input and output statements ( Because Scala supports functions)
  • More language features
    • Easy to create Singleton – Object without a Class definition
    • No primitive data types – all data types are objects: Byte, Char, Int, Long, Double, Float and String.
  • Less typing – Each key stroke is valuable
    • No semicolon
    • No return statement needed in functions (every Scala statement is an expression which returns a value)

Summary

After the initial learning curve, Scala improves the productivity of engineers.  Since Scala introduces some new syntax and new symbols, the initial learning curve is slightly steeper than for other modern languages.  At the same time, an experienced Java programmer will greatly appreciate the features of the Scala language.

RESTful Web Services with Python Flask

Posted on Updated on

The idea of this post is to describe how to develop RESTful web services in Python.

REST is an architectural style, where the data or the structural components of a system are described in the form of URIs (Uniform Resource Identifiers) and the behaviors are described in terms of methods.  The resources can be manipulated using CRUD (Create, Read, Update and Delete) operations.   The communication protocol for REST is HTTP, since it suits the architectural requirement of stateless communication between client and server.

There are many frameworks available in Python for web development, but Django (pronounced JANG-oh, with a silent D) and Flask stand out from the crowd.  I prefer the Flask framework, a microframework which is very small and easy to learn for beginners, whereas the full-stack Django is too big for beginners.

1. The Plan

In this exercise we will create an in-memory JSON DB to store and manipulate a simple employee database, and develop RESTful APIs to perform CRUD operations using the GET, POST, PUT & DELETE methods.

We will develop the below APIs

i) GET /empdb/employee/ – Retrieve all the employees from the DB

ii) GET /empdb/employee/<empId> – Retrieve the details of the given employee id

iii) POST /empdb/employee/ – Create a record in the employee DB, where the employee details are sent in the request as a JSON object

iv) PUT /empdb/employee/<empId> – Update the employee DB with the given details of the employee, sent in the data part as a JSON object

v) DELETE /empdb/employee/<empId> – Delete the employee from the DB for the given employee id

2. Installation of flask

To install the Flask framework, please refer to the official website [1].  If you have pip installed in your Python environment, just run:

$ pip install Flask

If you don’t have pip, please download Flask from http://pypi.python.org/packages/source/F/Flask/Flask-0.10.1.tar.gz and execute setup.py.

3. Hello World – Web Server

First we create a web server, then create a dictionary to hold JSON objects for a couple of employee records, and then add RESTful APIs for each supported operation.

Please look at the program below, which creates a web server.  Save it into hello.py and execute it.

from flask import Flask
app = Flask(__name__)

@app.route("/")
def hello():
    return "Hello World!"

if __name__ == "__main__":
    app.run()

The below line from the code creates an app object from Flask.

app = Flask(__name__)

app.run() starts the web server, ready to handle requests.  But at this moment it can handle only one route, defined in the below lines of code.

@app.route("/")
def hello():
    return "Hello World !"

Execute the above program and you will see that your web server is ready to serve you.

* Running on http://localhost:5000/

Now you can open your web browser and check your web server; it is available at the URL http://localhost:5000/.  If you are familiar with cURL, execute the below command to check the status.

$ curl -i http://localhost:5000/

4. Develop the RESTful Services

To develop the RESTful services for the planned objective, let's create an in-memory database in Python using the dictionary data type.  Please find the code snippet below.  We can continue to use hello.py and type the below code just after the Flask app creation statement app = Flask(__name__).  You can also refer to section 5 below for the complete code.

empDB=[
 {
 'id':'101',
 'name':'Saravanan S',
 'title':'Technical Leader'
 },
 {
 'id':'201',
 'name':'Rajkumar P',
 'title':'Sr Software Engineer'
 }
 ]

4.1 GET

In the previous section, we created two employees in the dictionary.  Now let's write the code to retrieve them using web services.  As per our plan, we need two implementations: one to retrieve all the employees and another to retrieve the employee with a given id.

4.1.1 GET All
@app.route('/empdb/employee',methods=['GET'])
def getAllEmp():
    return jsonify({'emps':empDB})

In the above code snippet, we created a URI named ‘/empdb/employee’ and defined the method as “GET”.  To service GET calls for the URI, Flask will call the function getAllEmp().  It in turn simply calls the “jsonify” method with the employee DB as the argument.  The “jsonify” is a Flask method which sets the response data to the given JSON object (passed as a Python dictionary) and sets the headers appropriately, in this case “Content-Type: application/json”.

We can check the above web service with cURL as below:

cUrl> curl -i http://localhost:5000/empdb/employee
[Screenshot: response for the cURL request]

4.1.2 Get Specific

Now we develop the REST service to get an employee with a given id.

@app.route('/empdb/employee/<empId>',methods=['GET'])
def getEmp(empId):
    usr = [ emp for emp in empDB if (emp['id'] == empId) ] 
    return jsonify({'emp':usr})

The above code finds the employee object with the given id and sends the JSON object in the response data.  Here I have used the list comprehension technique in Python; if you are not comfortable with it, you can simply process the entire dictionary with a for loop in an imperative way.

CURL > curl -i http://localhost:5000/empdb/employee/101

The response would be :

[Screenshot: get an employee response]

4.2 PUT

The PUT method is used to update an existing resource.  The below code gets the employee id from the URL and finds the respective object.  It checks request.json for the new data and then overwrites the existing record.

NOTE: request.json will contain the JSON object set in the client request.

@app.route('/empdb/employee/<empId>',methods=['PUT'])
def updateEmp(empId): 
    em = [ emp for emp in empDB if (emp['id'] == empId) ] 
    if 'name' in request.json : 
        em[0]['name'] = request.json['name'] 
    if 'title' in request.json:
        em[0]['title'] = request.json['title'] 
    return jsonify({'emp':em[0]})

We can use a Postman client or cURL to update an existing employee.  The data must contain the JSON object with either a name or a title.

The service can be invoked as follows with cURL.  Here we update the “title” for employee id 201 to “Technical Leader”. The request is answered with the employee JSON object containing the updated values, and the employee DB is updated as well.

[Screenshot: PUT request and response]
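Equivalently, here is a rough sketch of the same update using the Python requests library (assuming the server is running locally on port 5000; the json= argument sets the body and the Content-Type header):

import requests

url = 'http://localhost:5000/empdb/employee/201'
ret = requests.put(url, json={'title': 'Technical Leader'})
print(ret.status_code, ret.json())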

4.3 POST

The POST method is used to create a new employee in the database.  The code snippet is below:

@app.route('/empdb/employee',methods=['POST'])
def createEmp(): 
    dat = {
    'id':request.json['id'],
    'name':request.json['name'],
    'title':request.json['title']
    }
    empDB.append(dat)
    return jsonify(dat)

The above code simply reads request.json for the expected values, stores them in a local dictionary object, and appends it to the employee DB list.  It also returns the newly added employee object as the response.
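As a hedged sketch, the same creation can be done from Python with the requests library; the employee record below is made up for illustration:

import requests

url = 'http://localhost:5000/empdb/employee'
newEmp = {'id': '301', 'name': 'Arun K', 'title': 'Software Engineer'}
ret = requests.post(url, json=newEmp)
print(ret.status_code, ret.json())   # echoes the newly created employee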

4.4 DELETE

Let's write the code to delete a given employee id.

@app.route('/empdb/employee/<empId>',methods=['DELETE'])
def deleteEmp(empId): 
    em = [ emp for emp in empDB if (emp['id'] == empId) ] 
    if len(em) == 0:
        abort(404)
    
    empDB.remove(em[0])
    return jsonify({'response':'Success'})

The above service can be used as follows:

[Screenshot: DELETE request and response]
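The corresponding call from Python, again a sketch assuming the local server and the hypothetical employee 301 created above:

import requests

ret = requests.delete('http://localhost:5000/empdb/employee/301')
print(ret.status_code, ret.json())   # {'response': 'Success'} on success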

5 Complete Code

from flask import Flask
from flask import jsonify
from flask import request
from flask import abort

app = Flask(__name__)

empDB=[
 {
 'id':'101',
 'name':'Saravanan S',
 'title':'Technical Leader'
 },
 {
 'id':'201',
 'name':'Rajkumar P',
 'title':'Sr Software Engineer'
 }
 ]

@app.route('/empdb/employee',methods=['GET'])
def getAllEmp():
    return jsonify({'emps':empDB})

@app.route('/empdb/employee/<empId>',methods=['GET'])
def getEmp(empId):
    usr = [ emp for emp in empDB if (emp['id'] == empId) ] 
    return jsonify({'emp':usr})


@app.route('/empdb/employee/<empId>',methods=['PUT'])
def updateEmp(empId):

    em = [ emp for emp in empDB if (emp['id'] == empId) ]

    if 'name' in request.json : 
        em[0]['name'] = request.json['name']
 
    if 'title' in request.json:
        em[0]['title'] = request.json['title']
 
    return jsonify({'emp':em[0]})
 

@app.route('/empdb/employee',methods=['POST'])
def createEmp():

    dat = {
    'id':request.json['id'],
    'name':request.json['name'],
    'title':request.json['title']
    }
    empDB.append(dat)
    return jsonify(dat)

@app.route('/empdb/employee/<empId>',methods=['DELETE'])
def deleteEmp(empId):
    em = [ emp for emp in empDB if (emp['id'] == empId) ]
 
    if len(em) == 0:
       abort(404)
 
    empDB.remove(em[0])
    return jsonify({'response':'Success'})

if __name__ == '__main__':
    app.run()

6 CONCLUSION

This is a very basic web service we have developed.  I hope it helps in understanding the basics of RESTful web services development.  We can make this implementation cleaner with proper error handling and authentication.   I suggest everyone visit the official documentation of Flask for further learning.

Reference

[1] Flask – http://flask.pocoo.org/

HTTP / RESTful API Calls with Python Requests Library

Posted on Updated on

The objective of this post is to give a brief introduction to HTTP and RESTful APIs, and to develop a RESTful client in Python using the “requests” library and the “json” library.  I intentionally did not use urllib2 or any other standard Python library, since I want to show the power of the “requests” library, which is a simple and straightforward library for developing RESTful clients.

HTTP & RESTful APIs

HTTP is a request/response protocol and follows the client-server model.  On the internet, generally the web browser sends the HTTP request and the web server responds with an HTTP response.  But the client is not always a browser; it can be any application which can send an HTTP request.

We have used many application level communication protocols, starting from RPC (Remote Procedure Call), Java RMI (Remote Method Invocation), XML-RPC, and SOAP/HTTP.  In this lineage, the RESTful API is the current application level client-server protocol.

A RESTful API is an application level protocol.  It is heavily used on the internet (WWW) and in distributed systems, and is recommended by Service Oriented Architecture (SOA) for communication between loosely coupled distributed components.  RESTful APIs over HTTP are the de facto standard for cloud communications.

The two properties of REST which make it suitable for modern internet and cloud communication are that it is stateless and cache-less.  The protocol does not enforce any state machine, meaning there is no enforced order of protocol messages, and it does not remember any information across requests or responses.  Each request is unique and has no relation with the previous or the next request.  To understand more about the HTTP protocol, look at the references below.  Henceforth we will move along with the Python requests library to learn and develop RESTful API clients.

Request Library

The requests Python library is a simple and straightforward library for developing RESTful clients.  Python has a built-in library called urllib2, but it is a bit complex and old style when compared to requests.  After writing a couple of programs using urllib2, I am completely convinced by the statement below issued by the developers of requests.   Also refer to reference [4] for a comparison of code segments written using urllib2 and the requests library.

Python’s standard urllib2 module provides most of the HTTP capabilities you need, but the API is thoroughly broken. It was built for a different time — and a different web. It requires an enormous amount of work (even method overrides) to perform the simplest of tasks.

Please refer to the URL http://docs.python-requests.org/en/latest/user/install/#install to install the requests library before proceeding.

The Structure of HTTP / RESTful API

Following are the points to remember while developing RESTful clients:

  1. URL (Uniform Resource Locator)
  2. Message Type
  3. Headers
  4. Parameters
  5. Payload
  6. Authentication

1. URL

The URL is the core of a RESTful API.  Generally a URL refers to a web page, but it can also refer to a service or a resource.

For example : http://graph.facebook.com/v2.3/{photo-id}

The above URL is a resource which holds the photo with id photo-id.  As per the above syntax, {photo-id} must be replaced with the actual photo id.

A Python code snippet to store a URL in a Python object:

>>> url = 'http://graph.facebook.com/v2.3/123435'

2. Message Types

HTTP supports the GET/POST/PUT/DELETE message types.  There are a few more types as well; please take a look at reference [1] to understand them in detail.

GET – retrieves a resource.  E.g. GET http://graph.facebook.com/v2.3/1234345 will retrieve the photograph stored at that location.

>>> import requests
>>> ret = requests.get(url)
>>> ret.status_code
200

POST – updates a resource.  POST http://graph.facebook.com/v2.3/123435 will update the existing photo with the new photograph supplied in the message payload.  POST can also create the resource if it does not exist.

>>> import requests
>>> ret = requests.post(url)
>>> ret.status_code
200

PUT – creates a resource.  PUT http://graph.facebook.com/v2.3/123435 will create a resource by uploading the photograph sent in the message payload.

>>> import requests
>>> ret = requests.put(url)
>>> ret.status_code
201

DELETE – deletes a resource.  DELETE http://graph.facebook.com/v2.3/123435 will delete the photograph present at that location.

>>> import requests
>>> ret = requests.delete(url)
>>> ret.status_code
200

3. Headers

HTTP headers generally contain information used to process the request and response.  Headers are colon-separated key-value pairs, for example “Accept: text/plain”.  An HTTP request & response may have multiple headers.  Since they are key-value pairs, we can use Python's dictionary data type to store them.

Single Header & Multiple headers:

>>> head = {"Content-type": "application/json"}
>>> head= {"Accept":"applicaiton/json",
        "Content-type": "application/json"}

Make the API call with the above header:

>>> ret = requests.get(url,headers=head)
>>> ret.status_code
200

In the above statement, “headers” is the name of the argument; we used the Python feature of passing named arguments to a function.

4 Parameters

Sometimes we may want to pass values in the URL parameters.  For example, the URL http://www.abc.com/abc.php?name=Saravanan&designation=Technical Leader expects the user to send values for the keys “name” and “designation”.    The below code snippet helps you accomplish this; the “params” argument is used to set the values for the parameters.

>>> parameters = {'name':'Saravanan',
          'designation':'Technical Leader'}
>>> head = {'Content-Type':'application/json'}
>>> ret = requests.post(url,params=parameters,headers=head)
>>> ret.status_code
200

5 Payload

The payload contains the data to be sent in the request.  Here we will see how to send a JSON object in the payload.

empObj = {'name':'Saravanan', 'title':'Architect','Org':'Cisco Systems'}

In the above snippet we created empObj, which is a Python dictionary.  We cannot send this dictionary as-is; it must be converted into a JSON string before sending the request.

The json library in Python helps here .

>>> import json
>>> emp = json.dumps(empObj)

json.dumps converts the dictionary object into a JSON string.

The complete code snippet is below:

>>> import json
>>> import requests
>>>
>>> url='http://graph.facebook.com/v2.3/123123'
>>> head = {'Content-type':'application/json',
             'Accept':'application/json'}
>>> payload = {'name':'Saravanan',
               'Designation':'Architect',
               'Orgnization':'Cisco Systems'}

>>> payld = json.dumps(payload)
>>> ret = requests.post(url,headers=head,data=payld)
>>> ret.status_code
200

 

6 Authentication

The “requests” library supports various forms of authentication, including Basic, Digest, OAuth and others.  The value for authentication can be passed using the “auth” parameter of the request methods.

>>> 
>>> from requests.auth import HTTPBasicAuth
>>> url = 'http://www.hostmachine.com/sem/getInstances'
>>> requests.get(url, auth=HTTPBasicAuth('username','password'))
<Response [200]>

The “auth” argument can take any callable, so you can define your own custom authentication and pass it to “auth”.
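For example, requests lets you subclass requests.auth.AuthBase and pass an instance to “auth”; the object is called with the outgoing request and may modify it.  The token header below is a made-up example:

import requests
from requests.auth import AuthBase

class TokenAuth(AuthBase):
    # Attach a hypothetical token header to every request.
    def __init__(self, token):
        self.token = token
    def __call__(self, r):
        r.headers['X-Auth-Token'] = self.token   # modify the prepared request
        return r

ret = requests.get('http://www.hostmachine.com/sem/getInstances',
                   auth=TokenAuth('secret-token'))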

Summary

The above code snippets are samples to demonstrate the simplicity of Python and the requests library.  You can take a look at the official website of requests to learn advanced concepts in RESTful API development.

 

 

References

[1] HTTP Wiki : http://en.wikipedia.org/wiki/Hypertext_Transfer_Protocol

[2] History of HTTP by W3 Org : http://www.w3.org/Protocols/History.html

[3] Requests – http://docs.python-requests.org/en/latest/

[4] Requests and Urllib2 Comparison : https://gist.github.com/kennethreitz/973705

[5] Installation of Requests library : http://docs.python-requests.org/en/latest/user/install/#install

[6] HTTP Headers – http://en.wikipedia.org/wiki/List_of_HTTP_header_fields