“Always and never are two words you should always remember never to use. ”
― Wendell Johnson
Contents
1. Introduction
After covering ways of creating a DataFrame and working with it, we now concentrate on extracting data from the DataFrame. You may also be interested in our tutorials on a related data structure – Series; part 1 and part 2.
Getting Started
Import these libraries: pandas, matplotlib for plotting and numpy.
import pandas as pd import matplotlib.pyplot as plt import numpy as np import random
If you are working with a Jupyter (or iPython) notebook and want to show graphs inline, use this definition.
%matplotlib inline
2. Using Python Array Slice Syntax
The standard python array slice syntax x[apos:bpos:incr] can be used to extract a range of rows from a DataFrame. However, the pandas documentation recommends the use of more efficient row access methods presented below.
2.1. First Few Rows
You can also use the head() method for this operation.
x = pd.read_csv('big-data/Salaries.csv')
print x[:4]
# prints
   yearID teamID lgID   playerID  salary
0    1985    ATL   NL  barkele01  870000
1    1985    ATL   NL  bedrost01  550000
2    1985    ATL   NL  benedbr01  545000
3    1985    ATL   NL   campri01  633333
2.2. Last Few Rows
This can also be done using the tail() method.
print x[-4:]
# prints
       yearID teamID lgID   playerID    salary
26424    2016    WSN   NL  taylomi02    524000
26425    2016    WSN   NL  treinbl01    524900
26426    2016    WSN   NL  werthja01  21733615
26427    2016    WSN   NL  zimmery01  14000000
2.3. Range of Rows
print x[2:4] # prints yearID teamID lgID playerID salary 2 1985 ATL NL benedbr01 545000 3 1985 ATL NL campri01 633333
2.4. Reversing Rows
Example for reversing rows using a negative step:
print 'reverse order =>\n', x[:-6:-1]
# prints
reverse order =>
       yearID teamID lgID   playerID    salary
26427    2016    WSN   NL  zimmery01  14000000
26426    2016    WSN   NL  werthja01  21733615
26425    2016    WSN   NL  treinbl01    524900
26424    2016    WSN   NL  taylomi02    524000
26423    2016    WSN   NL  strasst01  10400000
3. Using Accelerated Selectors
Pandas recommends the use of these selectors for extracting rows in production code, rather than the python array slice syntax shown above.
3.1. ix[label] or ix[pos]
Select row by index label. Or by integer position if label search fails.
See examples below under iloc[pos] and loc[label].
3.2. iloc[pos]
Select row by integer position. Raises IndexError if position not valid (position not between 0 and length – 1).
A Single Position
y = pd.DataFrame({'one': pd.Series(random.sample(xrange(100), 5), index=list('abcde')),
                 'two': pd.Series(random.sample(xrange(100), 5), index=list('abcde')),
                 'three': pd.Series(list('ABCDE'), index=list('abcde'))})
print 'data =>\n', y, '\n'
print 'single pos =>\n', y.iloc[1], '\n'
# prints
data =>
   one three  two
a   88     A   81
b   39     B   36
c   43     C   78
d   86     D   21
e   11     E   26 
single pos =>
one      39
three     B
two      36
Name: b, dtype: object
Positions Array
Array of discrete positions. Note that position indices may be repeated and in any order.
print 'pos array =>\n', y.iloc[[3, 2, 3]], '\n' # prints pos array => one three two d 86 D 21 c 43 C 78 d 86 D 21
Positions Slice
print 'pos slice =>\n', y.iloc[1:3], '\n' # prints pos slice => one three two b 39 B 36 c 43 C 78
Boolean Array
print 'boolean array =>\n', y.iloc[[True, False, True]], '\n' # prints boolean array => one three two a 88 A 81 c 43 C 78
3.3. loc[label]
Select row by index label. Raises KeyError when the label is not found. Following arguments are supported.
A Single Label:
y = pd.DataFrame({'one': pd.Series(random.sample(xrange(100), 5), index=list('abcde')),
                 'two': pd.Series(random.sample(xrange(100), 5), index=list('abcde')),
                 'three': pd.Series(list('ABCDE'), index=list('abcde'))})
print 'data =>\n', y, '\n'
print 'single label =>\n', y.loc['a'], '\n'
# prints
data =>
   one three  two
a   46     A   25
b    9     B   80
c   91     C   70
d   95     D   42
e   83     E   16 
single label =>
one      46
three     A
two      25
Name: a, dtype: object
An Array of Labels
print 'label array =>\n', y.loc[['b', 'c']], '\n' # prints label array => one three two b 9 B 80 c 91 C 70
A Label Slice
print 'label slice =>\n', y.loc['b': 'd'], '\n' # prints label slice => one three two b 9 B 80 c 91 C 70 d 95 D 42
A Boolean Array
print 'boolean array =>\n', y.loc[[True, False, True, True]], '\n' # prints boolean array => one three two a 46 A 25 c 91 C 70 d 95 D 42
4. Fast Scalar Access
Pandas recommends that for fast access of scalar values, you can use at() and iat().
4.1. at[rowlabel, colname]
With at(), you need to specify the row label for the first argument and the column name for the second.
y = pd.DataFrame({'one': pd.Series(random.sample(xrange(100), 5), index=list('abcde')),
                 'two': pd.Series(random.sample(xrange(100), 5), index=list('abcde')),
                 'three': pd.Series(list('ABCDE'), index=list('abcde'))})
print y, '\n'
print 'at =>', y.at['b', 'one']
# prints
   one three  two
a   97     A   18
b   17     B   15
c   33     C   60
d   11     D   88
e    0     E   92 
at => 17
4.2. iat[rowpos, colpos]
When using iat(), both arguments need to be integer positions of the row and column respectively.
print 'iat =>', y.iat[2, 1] # prints iat => C
5. Take a Few Rows
Both Series and DataFrame support a method take() which accepts a list of indices and returns rows at those indices.
x = pd.read_csv('big-data/Salaries.csv')
print 'head =>\n', x.head(), '\n'
print x.take([0, 3, 4])
# prints
head =>
   yearID teamID lgID   playerID  salary
0    1985    ATL   NL  barkele01  870000
1    1985    ATL   NL  bedrost01  550000
2    1985    ATL   NL  benedbr01  545000
3    1985    ATL   NL   campri01  633333
4    1985    ATL   NL  ceronri01  625000 
   yearID teamID lgID   playerID  salary
0    1985    ATL   NL  barkele01  870000
3    1985    ATL   NL   campri01  633333
4    1985    ATL   NL  ceronri01  625000
You can also take() some columns by specifying the column indices along with the argument axis=1 to indicate a column-wise operation.
print 'take cols =>\n', x.take([0, 1, 3], axis=1).head() # prints yearID teamID playerID 0 1985 ATL barkele01 1 1985 ATL bedrost01 2 1985 ATL benedbr01 3 1985 ATL campri01 4 1985 ATL ceronri01
The take() method works with integer positions and not index labels as shown here when you change the index as follows:
x = x.set_index('yearID')
print 'index changed =>\n', x.head(), '\n'
print 'took rows =>\n', x.take([0, 3, 4])
# prints
index changed =>
       teamID lgID   playerID  salary
yearID                               
1985      ATL   NL  barkele01  870000
1985      ATL   NL  bedrost01  550000
1985      ATL   NL  benedbr01  545000
1985      ATL   NL   campri01  633333
1985      ATL   NL  ceronri01  625000 
took rows =>
       teamID lgID   playerID  salary
yearID                               
1985      ATL   NL  barkele01  870000
1985      ATL   NL   campri01  633333
1985      ATL   NL  ceronri01  625000
Summary
This article presented some ways of selecting data from a DataFrame. We covered the python array slice syntax, and the attribute extractors ix, iloc and loc. For fast single value extraction, we use at and iat attributes.
	
i really appreciate you .
it is very useful for me