Some pandas usage notes

Commonly used pandas methods, and small problems I ran into along the way

import pandas as pd
pathfile = 'xxx.json'
data = pd.read_json(pathfile)

The type of data is <class 'pandas.core.frame.DataFrame'>
Python Machine Learning (83): Reading JSON Data with Pandas
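A self-contained variant of the snippet above, assuming a small inline JSON array stands in for 'xxx.json' (the records are hypothetical). It only illustrates that read_json turns a JSON array of objects into a DataFrame, one row per object.

```python
from io import StringIO

import pandas as pd

# Hypothetical stand-in for the contents of 'xxx.json': an array of objects
json_text = '[{"name": "a", "value": 1}, {"name": "b", "value": 2}]'

# StringIO makes the text look like a file; passing a literal JSON string
# straight to read_json is deprecated in newer pandas versions
data = pd.read_json(StringIO(json_text))

print(type(data))   # <class 'pandas.core.frame.DataFrame'>
print(data.shape)   # (2, 2)
```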

2. pandas date conversion
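A minimal sketch, assuming the goal is to turn date strings laid out like the "created_at" field in the sample data of section 5 ("weekday month day time offset year") into real datetime values. The two timestamps are hypothetical; pd.to_datetime with an explicit format string is one common way to do this, not necessarily the approach the author had in mind.

```python
import pandas as pd

# Hypothetical timestamps in the same layout as "created_at" in section 5
raw = pd.Series([
    "Mon Oct 29 11:30:05 +0800 2018",
    "Sun Jan 06 23:07:51 +0800 2019",
])

# An explicit format avoids guessing; %z parses the "+0800" UTC offset,
# so the resulting values are timezone-aware
created = pd.to_datetime(raw, format="%a %b %d %H:%M:%S %z %Y")

print(created.dt.year.tolist())   # [2018, 2019]
print(created.dt.hour.tolist())   # [11, 23]
```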

3. Fixing garbled Chinese characters when pandas writes CSV files
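The fix this post relies on (see the to_csv call in the example further down) is encoding='utf-8-sig'. It writes a UTF-8 byte order mark (BOM) at the start of the file, which Excel commonly needs in order to recognize the file as UTF-8 instead of falling back to a local code page. A minimal sketch with hypothetical data and a temporary output path:

```python
import os
import tempfile

import pandas as pd

# Hypothetical table containing Chinese text
df = pd.DataFrame({"姓名": ["张三", "李四"], "城市": ["北京", "上海"]})

with tempfile.TemporaryDirectory() as tmp_dir:
    out_file = os.path.join(tmp_dir, "demo.csv")  # hypothetical output path
    df.to_csv(out_file, index=False, encoding="utf-8-sig")

    # Inspect the raw bytes: the file should start with the UTF-8 BOM
    with open(out_file, "rb") as f:
        raw_bytes = f.read()

    # Reading back with the same encoding strips the BOM again
    round_trip = pd.read_csv(out_file, encoding="utf-8-sig")

print(raw_bytes[:3])  # b'\xef\xbb\xbf'
```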

Code:
import pandas as pd
a = [['a','b','c','d'], ['e','f','g','h']]
a_df = pd.DataFrame(a)
print(a_df)
print(type(a_df))

Output:
   0  1  2  3
0  a  b  c  d
1  e  f  g  h
<class 'pandas.core.frame.DataFrame'>
Code:
# A flat list becomes a single column when converted to a DataFrame; transpose it to get a single row
b = ['a','b','c','d']
b_df = pd.DataFrame(b)
print(b_df)
print(type(b_df))

b_df_T = b_df.T
print(b_df_T)
print(type(b_df_T))

Output:
   0
0  a
1  b
2  c
3  d
<class 'pandas.core.frame.DataFrame'>
   0  1  2  3
0  a  b  c  d
<class 'pandas.core.frame.DataFrame'>
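An alternative that is not used in the original post: wrapping the flat list in another list makes pandas read it as one row directly, so no transpose is needed. A minimal sketch reusing the list b from above:

```python
import pandas as pd

b = ['a', 'b', 'c', 'd']

# A list containing one list is treated as one row with four columns
row_df = pd.DataFrame([b])
print(row_df)
#    0  1  2  3
# 0  a  b  c  d

# Same frame as the transpose approach shown above
print(row_df.equals(pd.DataFrame(b).T))  # True
```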

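  • A real case I ran into:
    Sometimes a single row (a flat list) has to be written to a CSV file. After all_content = pd.DataFrame(all_content), the data is turned into one column, so all_content.to_csv() writes the values down a column instead of across a row. The fix is a flag: when only one row was collected, transpose before writing.

import numpy as np
import pandas as pd

all_content = []
for row in rows:  # rows, out_file and header are placeholders for your own data
    if len(all_content) == 0:
        all_content = row  # first row: still a flat list
        flag = True
    else:
        # stack further rows into a 2-D array (np.row_stack is an alias of np.vstack)
        all_content = np.vstack((all_content, row))
        flag = False

all_content = pd.DataFrame(all_content)
if flag:
    # a single flat list became one column; transpose it back into one row
    all_content = all_content.T
# utf-8-sig writes a BOM so that Excel shows Chinese text correctly
all_content.to_csv(out_file, index=False, header=header, encoding='utf-8-sig')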
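A possible way to avoid the flag altogether, offered as an alternative rather than the author's method: keep the rows in a plain list of lists and build the DataFrame once at the end. A list of lists is always read as 2-D, so even a single row stays a row. The rows here are hypothetical, and to_csv is called without a path so that it returns the CSV text for a quick check.

```python
import pandas as pd

# Hypothetical rows; in practice they come from whatever loop produced `row`
rows = []
rows.append(['a', 'b', 'c', 'd'])

# One row in a list of lists is still one row, no transpose needed
single = pd.DataFrame(rows)
print(single.shape)   # (1, 4)

rows.append(['e', 'f', 'g', 'h'])
multi = pd.DataFrame(rows)
print(multi.shape)    # (2, 4)

# With no path argument, to_csv returns the CSV text as a string
csv_text = multi.to_csv(index=False, header=False)
print(csv_text.splitlines())  # ['a,b,c,d', 'e,f,g,h']
```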

5. pd.read_json()

I recently processed some data (already anonymized) in the following format:


[
 {
  "reposts_count": 0,
  "favorited": 0,
  "update_time": "Sun Jan 06 23:07:51 +0800 2000",
  "original_pic": "",
  "text": " 哈哈@123123123",
  "created_at": "Mon Oct 29 11:30:05 +0800 2000",
  "mid": 123123123123123123,
  "annotations": "",
  "source": "",
  "user": {
    "id": 123123123,
    "idstr": "123123123",
    "screen_name": "xxxxxx",
    "name": "xxxxxxxx",
    "location": "China",
    "gender": "m",
    "statuses_count": 133,
    "favourites_count": 0
  },
  "in_reply_to_screen_name": "",
  "in_reply_to_user_id": 0,
  "comments_count": 2
 },
 {
  "reposts_count": 0,
  "favorited": 0,
  "update_time": "Sun Jan 06 23:07:51 +0800 2010",
  "original_pic": "",
  "text": " 哈哈哈!你好!!",
  "created_at": "Mon Oct 29 11:30:05 +0800 2010",
  "mid": 456456456465456456,
  "annotations": "",
  "source": "",
  "user": {
    "id": 456456456,
    "idstr": "456456456",
    "screen_name": "yyyyyyyy",
    "name": "yyyyyyyy",
    "location": "China",
    "gender": "f",
    "statuses_count": 133,
    "favourites_count": 0
  },
  "in_reply_to_screen_name": "",
  "in_reply_to_user_id": 0,
  "comments_count": 2
 }
]

I needed to extract some fields from the file above, 123456.json (or 123456.txt), for example the content of "text" and the "id" inside "user". The extraction works as follows:

import pandas as pd

# Read the whole JSON array; each object becomes one row of the DataFrame
datafile = pd.read_json("123456.json", encoding='utf-8')

print("type(datafile): ", type(datafile))
print("datafile:\n", datafile)

# Number of rows = number of records in the file
num_shape = datafile.shape[0]
print("\nThe file contains", num_shape, "records!")

# Top-level field: the "text" column is a Series of strings
data_text = datafile['text']
print("\ntype(data_text): ", type(data_text))
print("data_text:\n", data_text)

# Nested field: each entry of the "user" column is still a Python dict
data_user = datafile['user']
print("\ntype(data_user): ", type(data_user))
print("data_user:\n", data_user)

for i in range(num_shape):
    print(f"\nRecord {i}:")
    text = data_text[i]
    print("type(text):", type(text))
    print("text:", text)
    # Index into the dict to get the nested "id", then convert it to a string
    uid = str(data_user[i]['id'])
    print("type(uid):", type(uid))
    print("uid:", uid)

Output:
type(datafile):  <class 'pandas.core.frame.DataFrame'>
datafile:
    reposts_count  favorited  ... in_reply_to_user_id comments_count
0              0          0  ...                   0              2
1              0          0  ...                   0              2
[2 rows x 13 columns]

The file contains 2 records!

type(data_text):  <class 'pandas.core.series.Series'>
data_text:
0    哈哈@123123123
1    哈哈哈!你好!!
Name: text, dtype: object

type(data_user):  <class 'pandas.core.series.Series'>
data_user:
0    {'id': 123123123, 'idstr': '123123123', 'scree...}
1    {'id': 456456456, 'idstr': '456456456', 'scree...}
Name: user, dtype: object

Record 0:
type(text): <class 'str'>
text:  哈哈@123123123
type(uid): <class 'str'>
uid: 123123123

Record 1:
type(text): <class 'str'>
text:  哈哈哈!你好!!
type(uid): <class 'str'>
uid: 456456456
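Another option, not used in the original post: pd.json_normalize flattens nested objects into dotted column names such as "user.id", so the nested id can be read as an ordinary column instead of indexing into a dict per row. A minimal sketch on two records trimmed down from the sample data above:

```python
import pandas as pd

# Two records shaped like the sample above, trimmed to the fields of interest
records = [
    {"text": " 哈哈@123123123", "user": {"id": 123123123, "screen_name": "xxxxxx"}},
    {"text": " 哈哈哈!你好!!", "user": {"id": 456456456, "screen_name": "yyyyyyyy"}},
]

# Nested dicts become dotted column names like "user.id"
flat = pd.json_normalize(records)
print(flat.columns.tolist())  # ['text', 'user.id', 'user.screen_name']

# Convert the ids to strings, matching the str(...) call in the loop above
uids = flat["user.id"].astype(str).tolist()
print(uids)  # ['123123123', '456456456']
```

With records already loaded from a file (for example via the json module), the same call applies directly to the loaded list.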

Original: https://blog.csdn.net/weixin_45644062/article/details/123049979
Author: 佐罗的哈士奇
Title: Some pandas usage notes (pandas的一些用法)
