Sharing 30 Really Useful Practical pandas Tips

2025-10-18 12:16:24

# profile_report() becomes available on DataFrames after `import pandas_profiling`
profile = df.profile_report(title="Pandas Profiling Report")

profile.to_file(output_file="output.html")

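Working with data types

pandas can represent many different data types.

Selecting data by data type

To keep only the columns that have (or do not have) the data types we want, use the following code:

# Select columns by dtype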

df.select_dtypes(include="number")

df.select_dtypes(include=["category", "datetime"])

# Exclude columns by dtype

df.select_dtypes(exclude="object")
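As a quick illustration of the calls above, here is a minimal sketch on a made-up frame. The `sample` DataFrame and its column names are assumptions for demonstration only.

```python
import pandas as pd

# Hypothetical sample data, not from the original article
sample = pd.DataFrame(
    {
        "price": [9.5, 12.0, 7.25],
        "units": [3, 1, 4],
        "city": ["Oslo", "Lima", "Pune"],
    }
)

# Keep only numeric columns (ints and floats)
numeric_part = sample.select_dtypes(include="number")

# Keep everything except numeric columns
non_numeric_part = sample.select_dtypes(exclude="number")

print(list(numeric_part.columns))
print(list(non_numeric_part.columns))
```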

Inferring data types

This mainly relies on the infer_objects() method. The code is as follows:

df.infer_objects().dtypes
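To see what infer_objects() changes, the sketch below builds a frame whose columns are deliberately stored as the generic object dtype. The `raw` frame is a hypothetical example.

```python
import pandas as pd

# Hypothetical frame: numeric values stored with the generic object dtype
raw = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]}, dtype="object")

# infer_objects() tries to find a better dtype for each object column
inferred = raw.infer_objects()

print(raw.dtypes)
print(inferred.dtypes)
```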

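Converting data types manually

When we convert data types by hand, some values may not be convertible. With errors='coerce', those values are turned into NaN. The code is as follows:

# Applies to the whole dataset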

df = df.apply(pd.to_numeric, errors="coerce")

# Fill the resulting missing values with zero

pd.to_numeric(df.numeric_column, errors="coerce").fillna(0)
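A small sketch of the coercion behaviour described above, using a made-up Series (`raw_values` is an assumption for illustration):

```python
import pandas as pd

# Hypothetical input: two numeric strings and one value that cannot be parsed
raw_values = pd.Series(["1", "2.5", "abc"])

# Unparseable entries become NaN instead of raising an error
converted = pd.to_numeric(raw_values, errors="coerce")

# Replace the NaN produced by the failed conversion with 0
filled = converted.fillna(0)

print(converted)
print(filled)
```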

Converting several data types at once

This uses the astype method. The code is as follows:

df = df.astype(
    {
        "date": "datetime64[ns]",
        "price": "int",
        "is_weekend": "bool",
        "status": "category",
    }
)
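The following sketch applies the same kind of mapping to a tiny, made-up table so the resulting dtypes can be inspected. The `orders` frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical data; every column starts with a less specific dtype
orders = pd.DataFrame(
    {
        "date": ["2022-03-05", "2022-03-06"],
        "price": [10.0, 12.0],
        "is_weekend": [1, 1],
        "status": ["open", "closed"],
    }
)

# One astype call with a dict converts each listed column to its target dtype
typed = orders.astype(
    {
        "date": "datetime64[ns]",
        "price": "int",
        "is_weekend": "bool",
        "status": "category",
    }
)

print(typed.dtypes)
```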

Column operations

Renaming

Use the rename() method to rename columns. The code is as follows:

df = df.rename({"PRICE": "price", "Date (mm/dd/yyyy)": "date", "STATUS": "status"}, axis=1)
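An equivalent spelling passes the mapping through the `columns` keyword instead of `axis=1`. The sketch below uses a hypothetical `raw` frame; columns not named in the mapping are left unchanged.

```python
import pandas as pd

# Hypothetical frame with upper-case column names
raw = pd.DataFrame({"PRICE": [1.0], "STATUS": ["ok"], "Extra": [0]})

# Only the keys present in the mapping are renamed
renamed = raw.rename(columns={"PRICE": "price", "STATUS": "status"})

print(list(renamed.columns))
```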

Adding a prefix or suffix

Use the add_prefix() and add_suffix() methods. The code is as follows:

df.add_prefix("pre_")

df.add_suffix("_suf")
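A minimal sketch of both calls on a made-up frame (`frame` is an assumption). Both return a new DataFrame and leave the original untouched.

```python
import pandas as pd

# Hypothetical two-column frame
frame = pd.DataFrame({"a": [1], "b": [2]})

with_prefix = frame.add_prefix("pre_")
with_suffix = frame.add_suffix("_suf")

print(list(with_prefix.columns))
print(list(with_suffix.columns))
```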

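Creating a new column

This uses the assign method, although there are other approaches worth trying as well. The code is as follows:

# Convert Celsius to Fahrenheit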

df.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)
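For a concrete check of the conversion, here is a short sketch with two well-known reference temperatures. The `weather` frame is hypothetical.

```python
import pandas as pd

# Hypothetical readings: freezing and boiling point of water in Celsius
weather = pd.DataFrame({"temp_c": [0.0, 100.0]})

# assign() returns a new frame with the extra column; the lambda receives the frame
converted = weather.assign(temp_f=lambda x: x.temp_c * 9 / 5 + 32)

print(converted)
```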

Inserting a new column at a given position

This uses the insert method. The code is as follows:

random_col = np.random.randint(10, size=len(df))

df.insert(3, 'random_col', random_col) # insert at position 3 (0-based, so it becomes the fourth column)
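Because the position argument is zero-based, a small sketch may help. The `frame` below and its values are made up for illustration; note that insert() modifies the frame in place.

```python
import pandas as pd

# Hypothetical three-column frame
frame = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# loc=1 puts the new column in the second position; the frame is changed in place
frame.insert(1, "new_col", [10, 20])

print(list(frame.columns))
```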

if-else logic

df["price_high_low"] = np.where(df["price"] > 5, "high", "low")
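A self-contained sketch of the same pattern, with hypothetical prices chosen so the boundary case (exactly 5) is visible:

```python
import numpy as np
import pandas as pd

# Hypothetical prices; 5 itself is not greater than 5, so it is labelled "low"
items = pd.DataFrame({"price": [3, 5, 8]})

# np.where(condition, value_if_true, value_if_false) works element-wise
items["price_high_low"] = np.where(items["price"] > 5, "high", "low")

print(items)
```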

Dropping columns

This uses the drop() method. The code is as follows:

df.drop('col1', axis=1, inplace=True)

df = df.drop(['col1','col2'], axis=1)

df.drop(df.columns[0], axis=1, inplace=True)  # axis=1 is required; without it drop() looks for a row label
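The `columns` keyword is an alternative to `axis=1` that makes the intent explicit. A minimal sketch on a hypothetical frame:

```python
import pandas as pd

# Hypothetical three-column frame
frame = pd.DataFrame({"col1": [1], "col2": [2], "col3": [3]})

# Drop several columns by name
without_two = frame.drop(columns=["col1", "col2"])

# Drop the first column by position, looked up through frame.columns
without_first = frame.drop(columns=frame.columns[0])

print(list(without_two.columns))
print(list(without_first.columns))
```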

String operations

Column names

If we want to make changes to the column names, the code is as follows:

# String operations on the column names

df.columns = df.columns.str.lower()

df.columns = df.columns.str.replace(' ', '_')
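The two steps can also be chained. The sketch below uses made-up column names to show the result:

```python
import pandas as pd

# Hypothetical frame with mixed-case names containing spaces
frame = pd.DataFrame(columns=["First Name", "Unit Price"])

# Lower-case the names, then replace spaces with underscores
frame.columns = frame.columns.str.lower().str.replace(" ", "_")

print(list(frame.columns))
```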

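The contains() method

# Check whether each value contains a given substring
df['name'].str.contains("John")

# A regular expression can also be used ("." matches any single character)
df['phone_num'].str.contains('...-...-....', regex=True) # regex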
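The pattern above accepts any characters between the dashes. If only digits should match, a stricter pattern is one option. The sketch below is an assumption about what a phone check might look like, with hypothetical data; `na=False` makes missing entries count as non-matches.

```python
import pandas as pd

# Hypothetical phone numbers, including a missing value
phones = pd.Series(["555-123-4567", "5551234567", None])

# \d{3}-\d{3}-\d{4} requires digits; ^ and $ anchor the whole string
has_dashed_format = phones.str.contains(r"^\d{3}-\d{3}-\d{4}$", regex=True, na=False)

print(has_dashed_format.tolist())
```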

The findall() method

# Regular expression (raw string so that "\." is not treated as a string escape)
import re

pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{1,9})'

df['email'].str.findall(pattern, flags=re.IGNORECASE)
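Since the pattern contains three capture groups, each match comes back as a tuple of (user, domain, top-level domain). A small sketch with a hypothetical address:

```python
import re

import pandas as pd

pattern = r"([A-Z0-9._%+-]+)@([A-Z0-9.-]+)\.([A-Z]{1,9})"

# Hypothetical free-text column containing one e-mail address
emails = pd.Series(["Contact: john.doe@example.com"])

# Each element of the result is a list of tuples, one tuple per match
matches = emails.str.findall(pattern, flags=re.IGNORECASE)
first_row = [tuple(m) for m in matches.iloc[0]]

print(first_row)
```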

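Missing values

Checking the proportion of missing values

If we want to see what share of each column in the dataset is missing, the code is as follows: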

def missing_vals(df):
    """Print the percentage of missing values in each column."""
    missing = [
        (df.columns[idx], perc)
        for idx, perc in enumerate(df.isna().mean() * 100)
        if perc > 0
    ]

    if len(missing) == 0:
        return "No missing values found"

    # Sort by percentage, highest first
    missing.sort(key=lambda x: x[1], reverse=True)

    print(f"A total of {len(missing)} variables have missing values")

    for tup in missing:
        print(f"{tup[0]} => {round(tup[1], 3)}%")

Output

A total of 19 variables have missing values

PoolQC => 99.521%

MiscFeature => 96.301%

Alley => 93.767%

Fence => 80.753%

FireplaceQu => 47.26%

LotFrontage => 17.74%

GarageType => 5.548%

GarageYrBlt => 5.548%

GarageFinish => 5.548%

GarageQual => 5.548%

GarageCond => 5.548%

BsmtExposure => 2.603%

BsmtFinType2 => 2.603%

BsmtQual => 2.534%

BsmtCond => 2.534%

BsmtFinType1 => 2.534%

MasVnrType => 0.548%

MasVnrArea => 0.548%

Electrical => 0.068%
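The same report can also be produced without an explicit loop. The sketch below is one possible alternative, not the article's method; `missing_percentages` and the `sample` frame are hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd


def missing_percentages(frame):
    """Return the percentage of missing values per column, highest first."""
    pct = frame.isna().mean() * 100
    # Keep only columns that actually have missing values
    return pct[pct > 0].sort_values(ascending=False).round(3)


# Hypothetical data: "a" is half missing, "c" is a quarter missing, "b" is complete
sample = pd.DataFrame(
    {
        "a": [1.0, np.nan, np.nan, 4.0],
        "b": [1, 2, 3, 4],
        "c": ["x", None, "y", "z"],
    }
)

report = missing_percentages(sample)
print(report)
```

Returning a Series rather than printing keeps the result available for further filtering or plotting.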

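Handling missing values

We can either drop the missing values or fill them with the mean or some other value. The code is as follows:

# Drop missing values (rows with axis=0, columns with axis=1)
df.dropna(axis=0)
df.dropna(axis=1)

# Fill with other values
df.fillna(0)
df.ffill()  # forward fill; fillna(method="ffill") is deprecated in recent pandas
df.bfill()  # backward fill; fillna(method="bfill") is deprecated in recent pandas

# Replace placeholder values with NaN
df.replace(-999, np.nan)
df.replace("?", np.nan)

# Interpolate to estimate what the missing values should be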

ts.interpolate() # time series

df.interpolate() # fill all consecutive values forward

df.interpolate(limit=1) # fill one consecutive value forward

df.interpolate(limit=1, limit_direction="backward")

df.interpolate(limit_direction="both")
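To compare filling and interpolation side by side, here is a minimal sketch on a made-up Series with a two-value gap. The default interpolation method is linear.

```python
import numpy as np
import pandas as pd

# Hypothetical series with two consecutive missing values
gaps = pd.Series([1.0, np.nan, np.nan, 4.0])

# Forward fill repeats the last valid value
forward_filled = gaps.ffill()

# Linear interpolation fills the gap with evenly spaced values
fully_interpolated = gaps.interpolate()

# limit=1 fills at most one consecutive missing value, moving forward
one_step = gaps.interpolate(limit=1)

print(forward_filled.tolist())
print(fully_interpolated.tolist())
print(one_step.tolist())
```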

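Working with date-formatted data

Getting a date relative to today

import datetime
from datetime import date

# Starting from today: N hours, N days, or N weeks later
# (a date has no time part, so only whole days are added)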

date.today() + datetime.timedelta(hours=30)

date.today() + datetime.timedelta(days=30)

date.today() + datetime.timedelta(weeks=30)

# One year ago

date.today() - datetime.timedelta(days=365)
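Because date.today() changes every day, the sketch below uses a fixed, hypothetical start date so the results are reproducible. It also shows that adding hours to a date only moves it by whole days.

```python
from datetime import date, timedelta

# Hypothetical fixed starting point instead of date.today()
start = date(2022, 3, 5)

in_30_days = start + timedelta(days=30)
# 30 hours is 1 day and 6 hours; a date keeps only the whole day
in_30_hours = start + timedelta(hours=30)
one_year_ago = start - timedelta(days=365)

print(in_30_days, in_30_hours, one_year_ago)
```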

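Selecting data within a date range

df[(df["Date"] > "2015-10-01") & (df["Date"] …

Selecting data for a specific date

# Select the data for a single day

df[df["Date"].dt.strftime("%Y-%m-%d") == "2022-03-05"]

# Select the data for a given month

df[df["Date"].dt.strftime("%m") == "12"]

# Select the data for a given year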

df[df["Date"].dt.strftime("%Y") == "2020"]
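The filters above can also be written with the numeric `.dt.month` and `.dt.year` attributes instead of string formatting. The sketch below is an illustration under assumptions: the `events` frame and the `start` and `end` bounds are hypothetical and do not come from the article.

```python
import pandas as pd

# Hypothetical data with a datetime column named "Date"
events = pd.DataFrame(
    {
        "Date": pd.to_datetime(
            ["2015-09-30", "2015-10-15", "2015-12-01", "2016-01-05"]
        ),
        "value": [1, 2, 3, 4],
    }
)

# Hypothetical bounds for a date-range filter (both exclusive)
start, end = "2015-10-01", "2015-12-31"
in_range = events[(events["Date"] > start) & (events["Date"] < end)]

# Month and year filters using integer attributes
december = events[events["Date"].dt.month == 12]
year_2015 = events[events["Date"].dt.year == 2015]

print(in_range)
```

Comparing integers avoids building a formatted string for every row, which tends to matter on large frames.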

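Formatting the dataset

Keeping a set number of decimal places

For floating-point data, we may want to keep two or three digits after the decimal point. The code is as follows: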

format_dict = {

"Open": "${:.2f}",

"Close": "${:.2f}",

"Volume": "{:,}",

}

df.style.format(format_dict)
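The entries in format_dict are ordinary Python format strings, so their effect can be previewed with str.format before styling a whole table. The numbers below are made up for illustration.

```python
# "${:.2f}" prefixes a dollar sign and keeps two decimal places
open_price = "${:.2f}".format(1234.5)

# "{:,}" inserts thousands separators
volume = "{:,}".format(1234567)

print(open_price)
print(volume)
```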


Highlighting data

For selected data, we may want to highlight certain values. The code is as follows:

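# Wrap the chain in parentheses so each method call can sit on its own line
(
    df.style.format(format_dict)
    .hide(axis="index")  # replaces hide_index(), which was removed in pandas 2.0
    .highlight_min(["Open"], color="blue")
    .highlight_max(["Open"], color="red")
    .background_gradient(subset="Close", cmap="Greens")
    .bar('Volume', color='lightblue', align='zero')
    .set_caption('Tesla Stock Prices in 2017')
)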

