Calculate the sum for each row of the outer index of a multi-index pandas DataFrame - 编程学习网

Backend development, 2024-02-05

Question

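I have a DataFrame with the columns Seller, Item, Price, Shipping, Free Shipping Minimum, Count Available, and Count Needed. My goal is to find the cheapest combination of Seller and Item based on a Total that is calculated later (the calculation code is shown below). Sample data: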

import pandas as pd

item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
seller1 = ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1']
price1 = [1.85, 1.94, 2.00, 2.00, 2.02]
shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
freeship1 = [5, 5, 5, 50, 5]
countavailable1 = [1, 2, 2, 5, 2]
countneeded1 = [2, 1, 2, 2, 1]

df1 = pd.DataFrame({'Seller':seller1,
                    'Item':item1,
                    'Price':price1,
                    'Shipping':shipping1,
                    'Free Shipping Minimum':freeship1,
                    'Count Available':countavailable1,
                    'Count Needed':countneeded1})

# create a column that states whether the seller has the full count needed.
# this will be used to sort by, to prioritize the smallest number of orders possible
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Fulfills Count Needed'] = 'Yes'
    else:
        df1.at[index, 'Fulfills Count Needed'] = 'No'

# don't want to calc price based on [Count Available], so check if the seller has the count I need and calc cost based on [Count Needed].
# if it doesn't have [Count Needed], then calc cost on [Count Available].
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Price x Count'] = row['Count Needed'] * row['Price']
    else:
        df1.at[index, 'Price x Count'] = row['Count Available'] * row['Price']
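
The two iterrows loops above can also be written column-wise. This is a minimal sketch, not code from the post: it assumes the capitalized column names used elsewhere in the thread, and swaps the loops for NumPy's np.where (to build the Yes/No labels) and np.minimum (to buy the smaller of Count Available and Count Needed).

```python
import numpy as np
import pandas as pd

df1 = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1'],
    'Item': ['item 1', 'item 2', 'item 1', 'item 1', 'item 2'],
    'Price': [1.85, 1.94, 2.00, 2.00, 2.02],
    'Shipping': [0.99, 0.99, 0.99, 2.99, 0.99],
    'Free Shipping Minimum': [5, 5, 5, 50, 5],
    'Count Available': [1, 2, 2, 5, 2],
    'Count Needed': [2, 1, 2, 2, 1],
})

# Boolean mask: True where the seller stocks at least as many units as needed
fulfills = df1['Count Available'] >= df1['Count Needed']

# Same 'Yes'/'No' labels as the first loop, computed for every row at once
df1['Fulfills Count Needed'] = np.where(fulfills, 'Yes', 'No')

# Same rule as the second loop: cost the smaller of available and needed units
df1['Price x Count'] = np.minimum(df1['Count Available'], df1['Count Needed']) * df1['Price']

print(df1[['Seller', 'Item', 'Fulfills Count Needed', 'Price x Count']])
```

Besides being shorter, this avoids setting values one cell at a time with .at, which is slow on large frames.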

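However, any one Seller can sell multiple Items. I want to minimize the shipping I pay, so I want to group Items together by Seller. I grouped them with the .first() method, based on an approach I saw in another thread, so that every column is kept in the new grouped DataFrame.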

# don't calc [total] until sellers have been grouped
# use first() method to return all columns and perform no other aggregations
grouped1 = df1.sort_values('Price').groupby(['Seller', 'Item']).first()

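At this point I want to calculate Total per Seller. I have the following code, but it calculates Total for each Item rather than each Seller. As a result, Shipping is added multiple times depending on the number of Items in each group, or free shipping is not applied when Price x Count goes over the Free Shipping Minimum.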

# calc [Total]
for index, row in grouped1.iterrows():
    if (row['Free Shipping Minimum'] == 50) & (row['Price x Count'] > 50):
        grouped1.at[index, 'Total'] = row['Price x Count'] + 0
    elif (row['Free Shipping Minimum'] == 5) & (row['Price x Count'] > 5):
        grouped1.at[index, 'Total'] = row['Price x Count'] + 0
    else:
        grouped1.at[index, 'Total'] = row['Price x Count'] + row['Shipping']
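
One way to stop Shipping from being added once per Item is to aggregate on the outer index level before computing Total. The sketch below is an illustration, not the poster's code: it calls groupby(level='Seller') on the (Seller, Item) MultiIndex frame and assumes every row of a given seller carries the same Shipping and Free Shipping Minimum values (true for the sample data, not guaranteed in general). It also compares each seller against its own minimum instead of hardcoding 5 and 50.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1'],
    'Item': ['item 1', 'item 2', 'item 1', 'item 1', 'item 2'],
    'Price': [1.85, 1.94, 2.00, 2.00, 2.02],
    'Shipping': [0.99, 0.99, 0.99, 2.99, 0.99],
    'Free Shipping Minimum': [5, 5, 5, 50, 5],
    'Count Available': [1, 2, 2, 5, 2],
    'Count Needed': [2, 1, 2, 2, 1],
})
# Equivalent to the question's loop: cost min(available, needed) units
df1['Price x Count'] = df1[['Count Available', 'Count Needed']].min(axis=1) * df1['Price']

grouped1 = df1.sort_values('Price').groupby(['Seller', 'Item']).first()

# One row per seller: sum item costs, but take the shipping settings only once
# (assumes they are identical across a seller's rows)
per_seller = grouped1.groupby(level='Seller').agg(
    {'Price x Count': 'sum', 'Shipping': 'first', 'Free Shipping Minimum': 'first'}
).rename(columns={'Price x Count': 'Subtotal'})

# Keep Subtotal where it exceeds the seller's minimum, otherwise add Shipping once
per_seller['Total'] = per_seller['Subtotal'].where(
    per_seller['Subtotal'] > per_seller['Free Shipping Minimum'],
    per_seller['Subtotal'] + per_seller['Shipping'],
)
print(per_seller)
```

With the sample data, Seller 1's two items are summed to about 3.87 before the 0.99 shipping is added, so shipping is charged once rather than twice.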

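It actually looks like I may need to sum Price x Count per Seller when calculating Total, but that is essentially the same problem, because I don't know how to calculate a column for each row of the outer index. What method can I use to do this?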
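
For the literal question (a per-row value computed across each outer index group), pandas' groupby(level=...) followed by transform broadcasts each group's result back onto every row of that group. This is a minimal sketch on the question's grouped1 frame; it is an alternative to the merge used in the answer below, not the answerer's code.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1'],
    'Item': ['item 1', 'item 2', 'item 1', 'item 1', 'item 2'],
    'Price': [1.85, 1.94, 2.00, 2.00, 2.02],
    'Shipping': [0.99, 0.99, 0.99, 2.99, 0.99],
    'Free Shipping Minimum': [5, 5, 5, 50, 5],
    'Count Available': [1, 2, 2, 5, 2],
    'Count Needed': [2, 1, 2, 2, 1],
})
# Equivalent to the question's loop: cost min(available, needed) units
df1['Price x Count'] = df1[['Count Available', 'Count Needed']].min(axis=1) * df1['Price']

grouped1 = df1.sort_values('Price').groupby(['Seller', 'Item']).first()

# transform('sum') returns a Series aligned to grouped1's (Seller, Item) index,
# so every Item row receives the total for its outer-level Seller
grouped1['Subtotal'] = grouped1.groupby(level='Seller')['Price x Count'].transform('sum')

print(grouped1[['Price x Count', 'Subtotal']])
```

Unlike .sum(), which collapses to one row per Seller, transform keeps the original shape, so the Total logic can stay row-wise while still seeing the seller-level subtotal.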

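Also, if anyone has suggestions for the second half of my goal, please share them. I only want to return the quantity I need of each item. For example, I need 2 of 'item 1' and 2 of 'item 2'. If 'Seller 1' has 2 of 'item 1' and 1 of 'item 2', and 'Seller 2' has 1 of 'item 1' and 1 of 'item 2', then I want all of Seller 1's items (assuming it is the cheapest) but only the 1 'item 2' from 'Seller 2'. This seems like it would affect the calculation of the Total column, but I'm not sure how to implement it.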
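
For the second half, one simple option is a greedy allocation: within each Item, take units from the cheapest listing first and only buy what earlier listings have not already covered. This is a hypothetical sketch (the column name 'Count To Buy' is made up for illustration). It assumes Count Needed is the same on every row of a given Item, as in the sample data, and it ranks by unit price only, ignoring Shipping and free-shipping minimums, so it will not always find the cheapest overall combination or the fewest orders.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1'],
    'Item': ['item 1', 'item 2', 'item 1', 'item 1', 'item 2'],
    'Price': [1.85, 1.94, 2.00, 2.00, 2.02],
    'Count Available': [1, 2, 2, 5, 2],
    'Count Needed': [2, 1, 2, 2, 1],
})

# Cheapest listing first within each item; Seller breaks price ties deterministically
ranked = df1.sort_values(['Item', 'Price', 'Seller'])

# Units already offered by cheaper (earlier-ranked) listings of the same item
already_covered = ranked.groupby('Item')['Count Available'].cumsum() - ranked['Count Available']

# Units still needed after those listings, capped at this seller's stock
ranked['Count To Buy'] = (ranked['Count Needed'] - already_covered).clip(
    lower=0, upper=ranked['Count Available']
)

# Keep only the listings that actually contribute units
plan = ranked[ranked['Count To Buy'] > 0]
print(plan[['Seller', 'Item', 'Price', 'Count To Buy']])
```

Price x Count could then be computed from Count To Buy instead of Count Needed, which feeds the quantities actually purchased into the Total calculation.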

Answer

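I ended up grouping by Seller first and summing Price x Count to find the subtotals, converting the result to a DataFrame, and then merging df1 with the new subtotal DataFrame to create the grouped DataFrame. Then I created the Total column using the np.where suggestion (much more elegant than my for loop, and it handles NaN values easily). Finally, grouping by Seller, Total, and Item returns the result I want. The final code: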


import pandas as pd
import numpy as np

item1 = ['item 1', 'item 2', 'item 1', 'item 1', 'item 2']
seller1 = ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1']
price1 = [1.85, 1.94, 2.69, 2.00, 2.02]
shipping1 = [0.99, 0.99, 0.99, 2.99, 0.99]
freeship1 = [5, 5, 5, 50, 5]
countavailable1 = [1, 2, 2, 5, 2]
countneeded1 = [2, 1, 2, 2, 1]

df1 = pd.DataFrame({'Seller':seller1,
                    'Item':item1,
                    'Price':price1,
                    'Shipping':shipping1,
                    'Free Shipping Minimum':freeship1,
                    'Count Available':countavailable1,
                    'Count Needed':countneeded1})

# create a column that states whether the seller has the full count needed.
# this will be used to sort by, to prioritize the smallest number of orders possible
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Fulfills Count Needed'] = 'Yes'
    else:
        df1.at[index, 'Fulfills Count Needed'] = 'No'

# don't want to calc price based on [Count Available], so check if the seller has the count I need and calc cost based on [Count Needed].
# if it doesn't have [Count Needed], then calc cost on [Count Available].
for index, row in df1.iterrows():
    if row['Count Available'] >= row['Count Needed']:
        df1.at[index, 'Price x Count'] = row['Count Needed'] * row['Price']
    else:
        df1.at[index, 'Price x Count'] = row['Count Available'] * row['Price']

# subtotals by seller, then assign calcs to column called [Subtotal] and merge into dataframe
subtotals = df1.groupby(['Seller'])['Price x Count'].sum().reset_index()

subtotals.rename({'Price x Count':'Subtotal'}, axis=1, inplace=True)

grouped = df1.merge(subtotals[['Subtotal', 'Seller']], on='Seller')


# calc [Total]
grouped['Total'] = np.where(grouped['Subtotal'] > grouped['Free Shipping Minimum'],
                             grouped['Subtotal'], grouped['Subtotal'] + grouped['Shipping'])

print(grouped.groupby(['Seller', 'Total', 'Item']).first())
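
The groupby, reset_index, rename, and merge steps in the answer produce a per-row Subtotal; groupby(...).transform('sum') gives the same column in one call without building a second frame. The sketch below uses the answer's data and checks that both routes agree; it is an equivalent alternative, not what the answerer wrote.

```python
import pandas as pd

df1 = pd.DataFrame({
    'Seller': ['Seller 1', 'Seller 2', 'Seller 3', 'Seller 4', 'Seller 1'],
    'Item': ['item 1', 'item 2', 'item 1', 'item 1', 'item 2'],
    'Price': [1.85, 1.94, 2.69, 2.00, 2.02],
    'Shipping': [0.99, 0.99, 0.99, 2.99, 0.99],
    'Free Shipping Minimum': [5, 5, 5, 50, 5],
    'Count Available': [1, 2, 2, 5, 2],
    'Count Needed': [2, 1, 2, 2, 1],
})
# Equivalent to the answer's loop: cost min(available, needed) units
df1['Price x Count'] = df1[['Count Available', 'Count Needed']].min(axis=1) * df1['Price']

# The answer's route: aggregate, flatten, rename, merge back onto df1
subtotals = (df1.groupby('Seller')['Price x Count'].sum()
             .reset_index()
             .rename(columns={'Price x Count': 'Subtotal'}))
merged = df1.merge(subtotals, on='Seller')

# Same per-row Subtotal in a single step
df1['Subtotal'] = df1.groupby('Seller')['Price x Count'].transform('sum')

print(df1[['Seller', 'Item', 'Subtotal']])
```

Because transform preserves df1's index and row order, there is no need to worry about how the merge orders rows.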