In this article we will show you six ways to select multiple columns in a pandas dataframe.
Method 1: Pandas Double Square Brackets
To select a single column in a pandas dataframe, we normally use single square brackets “[]” and give the column name inside them.
To select multiple columns we add a second pair of brackets, making double square brackets “[[]]”, and list the column names inside, separated by commas.
The dataframe name comes first, followed by the double square brackets with each column name listed inside, for example dataframe[["column_1", "column_2"]].
To make this concrete, we will import a CSV file from Kaggle and select 3 columns using pandas double square brackets.
You can download the dataset here: https://www.kaggle.com/amoghrrao2/video-games-list
# import pandas under the alias pd
import pandas as pd

# read our CSV file and assign it to the dataframe variable
dataframe = pd.read_csv("Windows_Games_List.csv")

# pick our columns and assign the result to the attribute variable
attribute = dataframe[["titles", "released", "developers"]]

# print our attribute data
print(attribute)
In this code we imported our CSV file, used double brackets to retrieve our first 3 columns (“titles”, “released” and “developers”), and assigned the result to our attribute variable.
The output is a new dataframe that contains only the specified columns.
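If the CSV file is not at hand, the difference between single and double brackets can be tried on a small stand-in dataframe. The sketch below is only an illustration: the rows and the extra "publishers" column are invented, and only the three column names used above come from this article.

```python
import pandas as pd

# Tiny stand-in dataframe; the rows are invented for illustration only.
dataframe = pd.DataFrame({
    "titles": ["Game A", "Game B"],
    "released": [2001, 2005],
    "developers": ["Studio X", "Studio Y"],
    "publishers": ["Pub 1", "Pub 2"],  # hypothetical extra column
})

# Single brackets with one name return a Series.
single = dataframe["titles"]

# Double brackets: the inner pair is a plain Python list of names,
# and the result is a DataFrame holding just those columns.
multiple = dataframe[["titles", "released", "developers"]]

print(type(single).__name__)    # Series
print(type(multiple).__name__)  # DataFrame
```

The inner brackets are just a list, so the names can also be stored in a variable first and passed as dataframe[names].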
Method 2: Pandas Dataframe.Columns
One drawback of the double square bracket method is that every column name has to be typed out. This is tedious when many columns are required, and it does not work at all when you do not know the column names.
Another method we can use is the pandas “dataframe.columns” attribute.
It holds the column labels in order, so we can pick columns by their index position rather than by name.
To output the same columns as in our previous step we can use the following code:
dataframe[dataframe.columns[0:3]] # index the first 3 columns
To index our first 3 columns we slice from a start of 0 to an end of 3. The start is included and the end is excluded, so the slice covers positions 0, 1 and 2. The index starts at zero because Python uses zero-based indexing: the first column is at index 0, the second at index 1, and so on.
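A minimal sketch of the slicing described above, run on an invented stand-in dataframe (the rows and the "publishers" column are assumptions made for illustration). It also shows that a list of positions can be used when the wanted columns are not next to each other.

```python
import pandas as pd

# Stand-in dataframe with invented rows, used only for illustration.
dataframe = pd.DataFrame({
    "titles": ["Game A", "Game B"],
    "released": [2001, 2005],
    "developers": ["Studio X", "Studio Y"],
    "publishers": ["Pub 1", "Pub 2"],  # hypothetical extra column
})

# dataframe.columns holds the column labels; slicing it returns labels.
first_three_labels = dataframe.columns[0:3]
print(list(first_three_labels))  # ['titles', 'released', 'developers']

# Passing those labels back into the brackets selects the columns.
subset = dataframe[first_three_labels]

# Non-adjacent columns: index the labels with a list of positions.
picked = dataframe[dataframe.columns[[0, 2]]]
print(list(picked.columns))  # ['titles', 'developers']
```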
Method 3: Pandas Dataframe.iloc
Like dataframe.columns, Dataframe.iloc is index based. This means that to select multiple columns we use their index positions rather than their names.
To select columns with Dataframe.iloc, a row selection has to be given first, before the comma.
To output our first three columns we can use the following code:
dataframe.iloc[:, 0:3] # all rows, first 3 columns
The colon “:” in the row position means start to end, which selects every row. After the comma, as before, we use slice notation to select our first three columns.
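To make the row and column selectors concrete, here is a small sketch on an invented stand-in dataframe (the rows and the "publishers" column are assumptions for illustration, not the Kaggle data).

```python
import pandas as pd

# Stand-in dataframe with invented rows, used only for illustration.
dataframe = pd.DataFrame({
    "titles": ["Game A", "Game B", "Game C"],
    "released": [2001, 2005, 2010],
    "developers": ["Studio X", "Studio Y", "Studio Z"],
    "publishers": ["Pub 1", "Pub 2", "Pub 3"],  # hypothetical extra column
})

# ":" before the comma keeps every row; 0:3 after it keeps columns 0, 1, 2.
all_rows = dataframe.iloc[:, 0:3]
print(all_rows.shape)  # (3, 3)

# Replacing ":" with a slice limits the rows as well.
first_two_rows = dataframe.iloc[0:2, 0:3]
print(first_two_rows.shape)  # (2, 3)
```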
Method 4: Pandas Dataframe.reindex()
Pandas Dataframe.reindex() is normally used to conform the rows and columns of a dataframe to a new set of labels, for example to reorder them. But it can also be used to filter our columns. Dataframe.reindex() has a parameter called “columns” that takes a list of column names, and passing our names to it returns only those columns.
dataframe.reindex(columns=["titles", "released", "developers"])
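One behaviour worth knowing before relying on reindex() for selection: a requested name that does not exist is not an error, it comes back as a column filled with NaN. The sketch below shows this on an invented stand-in dataframe; the rows and the "rating" name are hypothetical.

```python
import pandas as pd

# Stand-in dataframe with invented rows, used only for illustration.
dataframe = pd.DataFrame({
    "titles": ["Game A", "Game B"],
    "released": [2001, 2005],
    "developers": ["Studio X", "Studio Y"],
    "publishers": ["Pub 1", "Pub 2"],  # hypothetical extra column
})

# Existing names: only those columns are kept, in the order given.
subset = dataframe.reindex(columns=["titles", "released", "developers"])
print(list(subset.columns))  # ['titles', 'released', 'developers']

# "rating" is a hypothetical name that is not in the dataframe:
# reindex() still returns it, as a column of missing values.
with_missing = dataframe.reindex(columns=["titles", "rating"])
print(with_missing["rating"].isna().all())  # True
```

A typo in a column name therefore passes silently here, whereas double square brackets would raise a KeyError.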
Method 5: Pandas Dataframe.filter()
dataframe.filter(["titles", "released", "developers"])
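The list passed above is the “items” parameter of filter(), which keeps the columns whose labels appear in the list. The sketch below, on an invented stand-in dataframe (rows and the "rating" name are hypothetical), also shows the “like” parameter, which matches labels by substring.

```python
import pandas as pd

# Stand-in dataframe with invented rows, used only for illustration.
dataframe = pd.DataFrame({
    "titles": ["Game A", "Game B"],
    "released": [2001, 2005],
    "developers": ["Studio X", "Studio Y"],
    "publishers": ["Pub 1", "Pub 2"],  # hypothetical extra column
})

# items=: keep the columns whose labels are in the list.
subset = dataframe.filter(items=["titles", "released", "developers"])
print(list(subset.columns))  # ['titles', 'released', 'developers']

# Names that do not exist ("rating" is hypothetical) are simply skipped.
skipped = dataframe.filter(items=["titles", "rating"])
print(list(skipped.columns))  # ['titles']

# like=: keep the columns whose label contains the given substring.
by_substring = dataframe.filter(like="dev")
print(list(by_substring.columns))  # ['developers']
```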
Method 6: Pandas Dataframe.get()
dataframe.get(["titles", "released", "developers"])
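get() behaves like the double square brackets when every name exists. The difference shows up when a name is missing: instead of raising an error it returns a default value, which is None unless another default is passed. The sketch below uses an invented stand-in dataframe, and "rating" is a hypothetical missing name. The missing-name behaviour shown is what recent pandas versions do.

```python
import pandas as pd

# Stand-in dataframe with invented rows, used only for illustration.
dataframe = pd.DataFrame({
    "titles": ["Game A", "Game B"],
    "released": [2001, 2005],
    "developers": ["Studio X", "Studio Y"],
    "publishers": ["Pub 1", "Pub 2"],  # hypothetical extra column
})

# All names exist: the result is a DataFrame with those columns.
subset = dataframe.get(["titles", "released", "developers"])
print(list(subset.columns))  # ['titles', 'released', 'developers']

# "rating" does not exist, so the default comes back instead of an error
# (assumes a recent pandas version, where the lookup raises KeyError).
missing = dataframe.get(["titles", "rating"])
if missing is None:
    print("at least one requested column was not found")
```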
How to select multiple rows and multiple columns using pandas dataframe?
From the use of our previous notations you may have noticed there were double brackets