How to split a full name column into first and last name columns with pandas in Python?

Active January 31, 2022


Examples of how to split a full name column into first and last name columns with pandas in Python:

Create a dataframe with pandas

Let's first create a dataframe with pandas:

import pandas as pd

data = {'Full_Name':['April Reiter','Emory Miller','David Ballin','Alice Trotter','Virginia Rios']}

df = pd.DataFrame(data=data)

print(df)

gives

       Full_Name
0   April Reiter
1   Emory Miller
2   David Ballin
3  Alice Trotter
4  Virginia Rios

Split full name into 2 columns

To split the Full_Name column, one solution is to use pandas.Series.str.split, which splits on whitespace by default:

df['Full_Name'].str.split(expand=True)

          0        1
0     April   Reiter
1     Emory   Miller
2     David   Ballin
3     Alice  Trotter
4  Virginia     Rios
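
The split result can also be stored back in the dataframe. The following is a minimal sketch, assuming every full name contains exactly two words; the column names First_Name and Last_Name are chosen here for illustration:

```python
import pandas as pd

df = pd.DataFrame({'Full_Name': ['April Reiter', 'Emory Miller', 'David Ballin',
                                 'Alice Trotter', 'Virginia Rios']})

# Every name has exactly two words, so the split yields exactly two columns,
# which are assigned to two new (illustratively named) columns
df[['First_Name', 'Last_Name']] = df['Full_Name'].str.split(expand=True)

print(df)
```

If some names had more than two words, the split would return more than two columns and this assignment would fail.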

Another solution is to use pandas.Series.str.extract with a regular expression:

df['Full_Name'].str.extract(r'(\w+) (\w+)', expand=True)

gives

          0        1
0     April   Reiter
1     Emory   Miller
2     David   Ballin
3     Alice  Trotter
4  Virginia     Rios

To name the new columns directly, use named groups in the regular expression:

df['Full_Name'].str.extract(r'(?P<First_Name>\w+) (?P<Last_Name>\w+)', expand=True)

gives

  First_Name Last_Name
0      April    Reiter
1      Emory    Miller
2      David    Ballin
3      Alice   Trotter
4   Virginia      Rios
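
Note that extract only fills a row when the pattern matches. The sketch below illustrates this with a hypothetical single-word entry ('Madonna', not part of the data above): since the pattern requires two words separated by a space, that row gets NaN in both columns.

```python
import pandas as pd

# 'Madonna' is a hypothetical single-word entry added for illustration
df = pd.DataFrame({'Full_Name': ['April Reiter', 'Madonna']})

names = df['Full_Name'].str.extract(
    r'(?P<First_Name>\w+) (?P<Last_Name>\w+)', expand=True)

# Rows where the pattern does not match contain NaN in every extracted column
print(names)
print(names['Last_Name'].isna())
```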

Another example

Let's look at a more complicated example. Assume that one of the full names has a middle name: "Virginia Alicia Rios" instead of "Virginia Rios" as before:

import pandas as pd

data = {'Full_Name':['April Reiter','Emory Miller','David Ballin','Alice Trotter','Virginia Alicia Rios']}

df = pd.DataFrame(data=data)

print(df)

gives

              Full_Name
0          April Reiter
1          Emory Miller
2          David Ballin
3         Alice Trotter
4  Virginia Alicia Rios

then

df['Full_Name'].str.split(expand=True)

will return three columns, since one of the names contains three words:

          0        1     2
0     April   Reiter  None
1     Emory   Miller  None
2     David   Ballin  None
3     Alice  Trotter  None
4  Virginia   Alicia  Rios
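
A possible alternative that stays with splitting is pandas.Series.str.rsplit with n=1, which splits only once, starting from the right. This is a sketch under the assumption that the last word is always the last name; any middle name then stays attached to the first name:

```python
import pandas as pd

df = pd.DataFrame({'Full_Name': ['April Reiter', 'Emory Miller', 'David Ballin',
                                 'Alice Trotter', 'Virginia Alicia Rios']})

# n=1 limits the operation to a single split, performed from the right,
# so the result always has at most two columns
parts = df['Full_Name'].str.rsplit(n=1, expand=True)

print(parts)
```

Here the last row becomes "Virginia Alicia" in column 0 and "Rios" in column 1.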

To get only two columns, one solution is to use extract:

df['Full_Name'].str.extract(r'(?P<First_Name>\w+) (?P<Last_Name>\w+)', expand=True)

gives

  First_Name Last_Name
0      April    Reiter
1      Emory    Miller
2      David    Ballin
3      Alice   Trotter
4   Virginia    Alicia

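However, the last name for Virginia is now Alicia instead of Rios, because the pattern matches the first two words of the string. To fix that, add a $ at the end of the pattern so that the match is anchored to the end of the string: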

print( df['Full_Name'].str.extract(r'(?P<First_Name>\w+) (?P<Last_Name>\w+)$', expand=True) )

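gives the correct last name, although the First_Name column now holds the middle name (Alicia) for that row: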

  First_Name Last_Name
0      April    Reiter
1      Emory    Miller
2      David    Ballin
3      Alice   Trotter
4     Alicia      Rios
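
One way to keep Virginia as the first name while still taking Rios as the last name is to anchor the pattern at both ends and skip whatever lies in between. This is a sketch, assuming that the first word is the first name and the last word is the last name:

```python
import pandas as pd

df = pd.DataFrame({'Full_Name': ['April Reiter', 'Emory Miller', 'David Ballin',
                                 'Alice Trotter', 'Virginia Alicia Rios']})

# ^ anchors the first group to the start of the string, $ anchors the last
# group to the end, and .* skips any middle names in between
pattern = r'^(?P<First_Name>\w+).* (?P<Last_Name>\w+)$'
names = df['Full_Name'].str.extract(pattern, expand=True)

print(names)
```

With this pattern the last row yields Virginia and Rios, and the middle name is discarded.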

Daidalos