Tag Archives: Python


What is a Python Dictionary


Python Dictionary

A dictionary is a data structure that stores data as pairs of keys and values. Such structures are also called associative arrays in other programming languages.

What is a key-value pair?

A key is a unique identifier for a piece of data in a record, and a value is the data stored under that identifier. For example, let us say that Muneer is a student and we want to create a dictionary containing his details. The first key in his record is name, and its value is 'Muneer'. He weighs 75 kg, so the second key is weight, and its value is 75. His height is 6 ft and his age is 35 years. The record therefore contains the following key-value pairs:

  +-----------------+
  |   keys   values |
  |-----------------|
  |   name   Muneer |
  | weight       75 |
  | height        6 |
  |    age       35 |
  +-----------------+

How to create a dictionary

A dictionary is created using curly brackets. In each pair, the key comes first, followed by a colon and then the value. Pairs are separated from one another by commas.

In [2]:
student = {'name': 'Muneer', 'weight': 75, 'height': 6, 'age': 35}
In [3]:
student
Out[3]:
{'name': 'Muneer', 'weight': 75, 'height': 6, 'age': 35}
In [13]:
student.get('name')
Out[13]:
'Muneer'
In [14]:
student.get('age')
Out[14]:
35
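The cells above read values with get(). As a minimal sketch of the other common dictionary operations, the block below uses the same student record; the keys email, city and grade are hypothetical and are added only for illustration.

```python
# Same record as in the cells above
student = {'name': 'Muneer', 'weight': 75, 'height': 6, 'age': 35}

# Square brackets return the value, but raise KeyError if the key is missing
name = student['name']

# get() returns None (or a default you supply) instead of raising an error
email = student.get('email')            # hypothetical key, not in the record
city = student.get('city', 'unknown')   # hypothetical key with a default

# Assignment updates an existing key or adds a new one
student['age'] = 36                     # update an existing key
student['grade'] = 'A'                  # hypothetical new key

# Loop over all key-value pairs
for key, value in student.items():
    print(key, value)
```

Use square brackets when a missing key should count as an error, and get() when a missing key is expected and a fallback value is acceptable.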


Getting Started with Data Visualization in Python Pandas


DOWNLOAD DATASETS

To download the datasets used in this tutorial, please see the following links:
1. gapminder.tsv
2. pew.csv
3. billboard.csv
4. ebola.csv
5. tips.csv

TED Talk Dataset Exercises

In [5]:
# Change directory
In [6]:
cd "D:\Dropbox\CLASSES\Data Science for Finance\Python\Lecture 1 - Assignment"
D:\Dropbox\CLASSES\Data Science for Finance\Python\Lecture 1 - Assignment
In [7]:
import pandas as pd
In [8]:
ted = pd.read_csv('ted.csv')

1. Explore the data attributes

In [11]:
ted.dtypes
Out[11]:
comments               int64
description           object
duration               int64
event                 object
film_date              int64
languages              int64
main_speaker          object
name                  object
num_speaker            int64
published_date         int64
ratings               object
related_talks         object
speaker_occupation    object
tags                  object
title                 object
url                   object
views                  int64
dtype: object
In [12]:
ted.info()
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 2550 entries, 0 to 2549
Data columns (total 17 columns):
comments              2550 non-null int64
description           2550 non-null object
duration              2550 non-null int64
event                 2550 non-null object
film_date             2550 non-null int64
languages             2550 non-null int64
main_speaker          2550 non-null object
name                  2550 non-null object
num_speaker           2550 non-null int64
published_date        2550 non-null int64
ratings               2550 non-null object
related_talks         2550 non-null object
speaker_occupation    2544 non-null object
tags                  2550 non-null object
title                 2550 non-null object
url                   2550 non-null object
views                 2550 non-null int64
dtypes: int64(7), object(10)
memory usage: 338.8+ KB
In [13]:
ted.shape
Out[13]:
(2550, 17)
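The info() output above shows 2544 non-null values for speaker_occupation out of 2550 rows, so six values are missing. A short sketch of how missing values can be counted directly is given below. It uses a small made-up DataFrame named toy (an assumption, so that the block runs without ted.csv); on the real data the same calls would be applied to ted.

```python
import pandas as pd

# Toy data standing in for ted.csv; the values are made up for illustration
toy = pd.DataFrame({
    'main_speaker': ['A', 'B', 'C', 'D'],
    'speaker_occupation': ['Author', None, 'Physicist', None],
    'comments': [10, 250, 31, 4],
})

# Count the missing values in each column
missing = toy.isna().sum()
print(missing)

# Summary statistics (count, mean, min, max, quartiles) for numeric columns
summary = toy.describe()
print(summary)
```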

2. Which talk has the most comments?

In [77]:
ted.sort_values('comments')[['comments', 'duration','main_speaker']].tail()
Out[77]:
comments duration main_speaker
1787 2673 1117 David Chalmers
201 2877 1099 Jill Bolte Taylor
644 3356 1386 Sam Harris
0 4553 1164 Ken Robinson
96 6404 1750 Richard Dawkins
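Sorting and taking tail() puts the answer in the last row. An alternative sketch, using idxmax() and nlargest() on a made-up DataFrame named toy (an assumption for illustration, not the TED data), returns the largest values first:

```python
import pandas as pd

# Toy data with the same column names as the cell above; values are made up
toy = pd.DataFrame({
    'comments': [120, 980, 45, 300],
    'duration': [600, 1100, 750, 900],
    'main_speaker': ['A', 'B', 'C', 'D'],
})

# Row label of the single most-commented talk, then the full row
top_label = toy['comments'].idxmax()
top_talk = toy.loc[top_label]

# The two most-commented talks, largest first
top2 = toy.nlargest(2, 'comments')[['comments', 'duration', 'main_speaker']]
print(top2)
```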

3. Find the top 5 talks with the highest views-to-comments ratio

In [16]:
ted['view_to_comment'] = ted['views'] / ted['comments']
In [17]:
ted['view_to_comment'].tail()
Out[17]:
2545    26495.882353
2546    69578.333333
2547    37564.700000
2548    13103.406250
2549    48965.125000
Name: view_to_comment, dtype: float64
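Note that tail() only displays the last five rows of the new column; it does not rank them. To get the talks with the highest ratio, the column has to be sorted first. A minimal sketch on made-up data follows (the toy DataFrame and its title values are assumptions; on the real data one would sort ted and take head(5)):

```python
import pandas as pd

# Toy data standing in for the TED dataset; values are made up
toy = pd.DataFrame({
    'title': ['t1', 't2', 't3', 't4'],
    'views': [1000, 5000, 1200, 900],
    'comments': [10, 20, 4, 90],
})

# Same ratio as in the cell above
toy['view_to_comment'] = toy['views'] / toy['comments']

# Sort from highest to lowest ratio and keep the first rows
top = toy.sort_values('view_to_comment', ascending=False).head(2)
print(top[['title', 'view_to_comment']])
```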

4. Create a histogram of comments

In [19]:
import matplotlib.pyplot as plt
ted['comments'].plot(kind = 'hist')
Out[19]:
<matplotlib.axes._subplots.AxesSubplot at 0x14a233e3978>

5. Create a histogram of comments for talks with fewer than 1000 comments

In [35]:
# Build a Boolean mask that is True for rows with fewer than 1000 comments
index = ted['comments']<1000
In [38]:
# Get only the comments column from these filtered rows
com1000 = ted[index]['comments']
In [39]:
# Make a plot of these filtered comments
com1000.plot(kind = 'hist')
Out[39]:
<matplotlib.axes._subplots.AxesSubplot at 0x14a236dc7b8>
In [40]:
# With practice, you can do all of the above in one line
ted[ted['comments']<1000]['comments'].plot(kind='hist')
Out[40]:
<matplotlib.axes._subplots.AxesSubplot at 0x14a2375cac8>
In [44]:
# How many rows were excluded from the above graph
ted[ted['comments'] >=1000].shape
Out[44]:
(32, 18)
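The filtering in this exercise relies on a Boolean mask: a Series of True/False values, one per row. As a sketch on made-up data (the toy DataFrame is an assumption), the mask can also be passed to .loc to select rows and a column in one step, and inverted with ~ to count the excluded rows:

```python
import pandas as pd

# Toy data; values are made up for illustration
toy = pd.DataFrame({
    'comments': [12, 1500, 430, 999, 1000],
    'views': [1, 2, 3, 4, 5],
})

# True where the row has fewer than 1000 comments
mask = toy['comments'] < 1000

# Select the matching rows and the comments column in a single step
kept = toy.loc[mask, 'comments']

# ~mask flips True and False, so its sum is the number of excluded rows
n_excluded = (~mask).sum()
print(kept.tolist(), n_excluded)
```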

6. Do the same as in 5, but using the query method

In [68]:
# Filter the whole dataset where comments are less than 1000
ted1000 = ted.query('comments <1000')
In [69]:
# Get only the comments column from the reduced dataset
comment1000 = ted1000['comments']
In [70]:
# Plot the filtered comments
comment1000.plot(kind = 'hist')
Out[70]:
<matplotlib.axes._subplots.AxesSubplot at 0x14a238fb630>
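Inside a query string, a Python variable can be referenced by prefixing it with @, which avoids hard-coding the cutoff. The sketch below, on a made-up DataFrame (toy and threshold are names chosen for illustration), also checks that query and a Boolean mask select the same rows:

```python
import pandas as pd

# Toy data; values are made up for illustration
toy = pd.DataFrame({'comments': [12, 1500, 430, 999, 1000]})

# Cutoff kept in a variable and referenced with @ inside the query string
threshold = 1000
by_query = toy.query('comments < @threshold')

# The same filter written with a Boolean mask
by_mask = toy[toy['comments'] < threshold]

# Both approaches should return identical rows
same = by_query.equals(by_mask)
print(same)
```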

7. How to add more bins to the histogram

In [71]:
comment1000.plot(kind = 'hist', bins = 20)
Out[71]:
<matplotlib.axes._subplots.AxesSubplot at 0x14a23953278>
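The bins argument controls how many equal-width intervals the range of the data is divided into; pandas uses 10 when it is not given. To see the effect without a plot, the sketch below uses NumPy's histogram function on a made-up array (values is an assumption for illustration) and returns the count per bin together with the bin edges:

```python
import numpy as np

# Made-up comment counts for illustration
values = np.array([3, 8, 15, 22, 22, 40, 57, 90, 130, 400])

# 10 bins give 11 edges; 20 bins give 21 edges over the same range
counts10, edges10 = np.histogram(values, bins=10)
counts20, edges20 = np.histogram(values, bins=20)

print(len(edges10), len(edges20))
print(counts10)
print(counts20)
```

More bins show finer detail in the distribution, but each bin holds fewer observations, so the plot can look noisier.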

8. Make a box plot and identify outliers

In [73]:
comment1000.plot(kind = 'box')
Out[73]:
<matplotlib.axes._subplots.AxesSubplot at 0x14a23a4ba20>

The black dots show outliers
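By default, a box plot draws as outliers the points that lie more than 1.5 times the interquartile range (IQR) below the first quartile or above the third quartile. The sketch below applies the same rule by hand to a made-up Series (comments here is toy data, not the TED column) to list the outlying values:

```python
import pandas as pd

# Made-up comment counts for illustration
comments = pd.Series([5, 20, 35, 50, 60, 75, 90, 110, 900])

# First and third quartiles and the interquartile range
q1 = comments.quantile(0.25)
q3 = comments.quantile(0.75)
iqr = q3 - q1

# Limits beyond which a point counts as an outlier
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr

# Keep only the values outside the limits
outliers = comments[(comments < lower) | (comments > upper)]
print(outliers.tolist())
```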



Quick setup of Python with Stata 16



With the announcement of Stata 16, Python commands can be executed directly from the Stata command prompt, do-files, or ado-programs. This greatly expands what can be done without leaving the Stata environment. However, the integration also exposes Stata to all the problems of Python installations and their packages.

First of all, Python does not come as part of the Stata installation; Stata depends on a version of Python that is already installed on the machine. This makes Stata-Python code less portable. One solution might be a portable version of Python. Only time will tell what works best in such situations.

In this short post, I am going to outline a few basic steps to get started with Python from Stata. These steps are mentioned below:


1. What Version of Python to Install

A number of options are available for installing Python. Over the past 12 months, I have found that installing Python through Anaconda is the least problematic route, and with Stata 16 this again proved true. The stand-alone version of Python did not work with Stata: each time I typed python at the Stata command prompt, Stata returned the following error message:

initialized          no
r(7100);

I uninstalled the other version of Python and kept only the Anaconda installation.


2. Set the Installation path

Stata can search for any available Python installation, including one installed through Anaconda. To search for Python and associate it with Stata, I typed the following at the Stata command prompt:

python search 
set python_exec  D:\Anaconda\python.exe, permanently

The first line finds the available Python installations and reports the path of each Python executable. The second line sets which of them Stata should use. The permanently option saves this path for future sessions as well. And that’s all.


3. Using Python

Once the above steps complete without an error, we are ready to use Python. In the Stata command window, we can enter the Python environment by typing python, and the familiar three greater-than signs (>>>) will appear on the screen:

. python
-------- python (type end to exit) --------
>>> 2+2
4
>>> end
-------------------------------------------
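To confirm which Python installation is in use, a few lines of standard-library Python can be entered at the >>> prompt. This is a sketch under the assumption that Stata's embedded interpreter exposes the sys module as a regular Python session does; the variable names version and prefix are chosen for illustration.

```python
import sys

# Version string of the running interpreter
version = sys.version
print(version)

# Installation directory of the running interpreter;
# with an Anaconda setup this should point inside the Anaconda folder
prefix = sys.prefix
print(prefix)
```

If the printed directory is not the one passed to set python_exec, Stata is probably picking up a different installation.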