- Article info
- Author: kaiwu
- Views: 18
https://public.tableau.com/app/discover
China's space program, 1970-2019
https://public.tableau.com/app/profile/yuri.wg/viz/50_15655374759590/1
History of China's film box office
https://public.tableau.com/app/profile/.63722048/viz/Antsaregross_16413070224170/sheet0
http://kaiwu.city/openfiles/tableau_tourist_satisfaction_cnlabels.xlsx
http://kaiwu.city/openfiles/tourist2024spring.twb
http://kaiwu.city/openfiles/2024Avg. Salaries for Business Intelligence Analysts.twbx
China's GDP
https://public.tableau.com/app/profile/jeremy7461/viz/GDP_15927982087320/1
http://kaiwu.city/openfiles/China_GDP.twbx
https://public.tableau.com/app/profile/adrian.zinovei/viz/MarketingDashboardReDone/HOME
http://kaiwu.city/openfiles/Marketing Dashboard Re-Design.twbx
- Article info
- Author: kaiwu
- Views: 126
1. Working with Word
http://kaiwu.city/openfiles/academic_report_SPSS_CN.docx
http://kaiwu.city/openfiles/example_thesis_CN2023.doc
2. Working with PowerPoint
http://kaiwu.city/openfiles/practice_ppt.pptx
3. Cheat sheets
http://kaiwu.city/index.php/cheatsheet
https://www.rstudio.com/resources/cheatsheets/#contributed-cheatsheets
https://github.com/rstudio/cheatsheets/raw/master/powerpoints/0-template.pptx
4. Word clouds
http://kaiwu.city/index.php/tag-cloud
https://www.jasondavies.com/wordcloud/
https://www.yelp.com/biz/home-coffee-roasters-san-francisco-7?osq=CAFE
Affogatos are probably one of my fave treats. During my recent visit, decided to try something different. There's the affogato and then there's Home's Cookie Monster-gato which sounded very interesting. Essentially it's the affogato with cookie butter and little cookie cereal. Barista stated that the cookie butter makes the espresso creamier. Decided to try it. OMG!!! Sooo good! Definitely creamier and such a decadent treat. Daughter got one too. She offered some to her dad who initially declined but then took one taste and finished it off. Needed to order another one. As I write this, I'm already craving another one.
I would say this place is averagely good. Obviously the drinks are decorated very well but I wouldn't say the flavors are anything special. I was excited to try this place based on the reviews and photos but I have to say I was sort of let down. Nevertheless, the drinks didn't taste bad and I got the fix of coffee that I needed. We ordered the Birthday Cake latte (hot) and Cookie Monster latte (iced). Both were ok, the Cookie Monster had a little bit more flavor because of the cookie butter at the bottom, but still I feel this place is mainly focused on making the drinks looks pretty, and they are definitely succeeding in that respect. Overall, I don't think I will return but if you are looking for pretty lattes to post on social media this is definitely the place for you!
Updated review for Oct 2021. No indoor seating or public bathrooms available. To-go ordering only. Coffee quality is still great, but sad indoor dining still wasn't available despite many other areas re-opening up partially/completely. If one is looking for indoor cafe seating, walk a few blocks down to Henry's coffee house where indoor/outdoor patio is available.
Excellent iced salted caramel latte and Bravo egg avocado toast!! Excellent workers and food. Happy I tried this coffee spot today!!
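Word-cloud tools size each word by how often it occurs, so it can help to look at the raw counts first. The following is a minimal sketch using only the Python standard library; the stop-word list is a small illustrative assumption, not the list any particular word-cloud site uses.

```python
import re
from collections import Counter

# A short excerpt from the sample reviews above.
text = ("Excellent iced salted caramel latte and Bravo egg avocado toast!! "
        "Excellent workers and food. Happy I tried this coffee spot today!!")

# Illustrative stop-word list (an assumption; real lists are much longer).
stop_words = {"and", "i", "this", "the", "a"}

# Lowercase the text, keep alphabetic tokens, and drop the stop words.
tokens = [w for w in re.findall(r"[a-z']+", text.lower()) if w not in stop_words]

# Counter maps each remaining word to its number of occurrences.
word_freq = Counter(tokens)
print(word_freq.most_common(3))
```

The resulting word and count pairs have the same shape as the score table further down, which is the kind of weighted input a word cloud is drawn from.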
https://spectrum.ieee.org/top-programming-languages-2022
Score | Language |
---|---|
100 | Python |
97 | C |
89 | C++ |
87 | C# |
70 | Java |
47 | SQL |
40 | JavaScript |
19 | R |
18 | HTML |
17 | TypeScript |
13 | Go |
13 | PHP |
10 | Shell |
9 | Ruby |
9 | Scala |
8 | Matlab |
7 | SAS |
https://zhuanlan.zhihu.com/p/90665345
无意中发现的店,一直躺在收藏夹里,工作日的下午跟朋友特意过去打卡,下雨天,人还是不少,室内室外都能坐,先找位置,然后桌上二维码扫码点单的!我们先逛了一圈,很多小姐姐都在拍照,还有个草坪,天气好的时候拍照会很出片!南瓜汤:味道浓郁,略浓稠的汤,上面撒的杏仁粒很香很脆!下次准备再去,吃点别的!
风和日丽的日子,总是热闹的,野餐、遛狗,喝咖啡~~ 一尺花园,古色古香的中式老房子与西式美食、咖啡结合别有特色,周末人太多了有点遗憾。在这里不迷恋他家的咖啡,居然是别人家的狗狗,萌宠治愈心情,阳光下草地上都好可爱哦。
夏天的时候一家子到川沙玩,打卡了川沙网红一尺花园。一进大门就小池鲤鱼,再往里走,大厅有好几个艺术雕像。 我们是下午到的,先点了茶饮料咖啡·蛋糕乡,都挺好吃,但是大厅太热,空调不足,我一直汗哒哒滴。 外面有很大草坪花园,很多人带着孩子或者毛孩子在外面奔跑,我害怕太阳,没有出去逛。 我们坐到晚上,服务员给我换了一间房,空调足了,点了个套餐,解决了晚餐。 总体环境意境不错,适合拍照打卡,晚餐不如下午茶,东西太咸。
马上要降温了,又不能出沪,赶着最后的暖冬带孩子出来玩 这家一尺花园陶家宅亻号店就在我们住的酒店隔壁,因为本身就是一个百年宅邸,也是酒店的宣传卖点之一,预定酒店赠送 了两张咖啡券,于是就想着不行到这边喝杯咖啡。 环境确实不错,门口有个草坪草坪里还有个沙坑很受小朋友的欢迎,至于滑滑梯等玩乐设施早就封底熬了,上面用绳子拦 着写着设施故障啥的,反正就是给人年久失修没人维护的破落感。一开始进去点咖啡,服务员看我拿的是酒店赠送的券,很有点不耐烦的感觉,咖啡甜品单价都挺高,这个新品杏仁慕斯我觉得甜了点,后来家里人来了想想也懒得开车出去,就干脆在这里点了中午正餐,味道还不错,牛肉不老当然价格也是很高,面包酥脆挺入味,不过总的来说性价比一般。户外座位确实挺紧俏,几乎坐满,想想如果不是住店,专门跑那么远来喝杯咖啡,我也是挺佩服的,换我是不太会。
http://kaiwu.city/index.php/sscijournal
https://www.tandfonline.com/doi/full/10.1080/14616688.2020.1750683
- Article info
- Author: kaiwu
- Views: 63
https://www.ibm.com/cn-zh/topics/generative-ai
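Generative AI (generative artificial intelligence), sometimes called gen AI, is a type of artificial intelligence (AI) that can create original content (such as text, images, video, audio, or software code) in response to a user's prompt or request.
1. International websites
1.1 Transformer model
Classic paper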
Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A. N., Kaiser, L., & Polosukhin, I. (2017). Attention Is All You Need (arXiv:1706.03762). arXiv. https://doi.org/10.48550/arXiv.1706.03762
Video walkthrough
https://www.bilibili.com/video/BV1gG411a7bw/
Commentary from 科技参考 (Tech Reference)
https://www.dedao.cn/course/article?id=3bezDG7wBonmJwgeE9JvQkAg5PyO1x
1.2 ChatGPT from OpenAI
1.3 Gemini from Google
1.4 Claude from Anthropic
https://www.anthropic.com/claude
1.5 Llama from Meta
https://llama.meta.com/llama3/
1.6 Metaso (秘塔)
1.7 Kimi from Moonshot AI (月之暗面)
1.8 Music creation: Suno
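2. Hands-on exercises
2.1 Searching for a question
Search phrase: the question of authenticity in tourist experience
2.2 Searching for code
Search phrase: how to import JSON data with Python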
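An answer to the search phrase above would typically look something like the following minimal sketch. It uses only the standard-library `json` module; the sample records and their field names (`id`, `gender`, `satisfaction`) are invented for illustration and are not taken from the course data, and `load_json_file` is a hypothetical helper name.

```python
import json

# Hypothetical JSON text; in practice this usually comes from a file or a web API.
raw = '[{"id": 1, "gender": "female", "satisfaction": 4}, {"id": 2, "gender": "male", "satisfaction": 5}]'

# json.loads parses a JSON string into Python lists and dicts.
records = json.loads(raw)


def load_json_file(path):
    """Read a JSON file from disk; UTF-8 keeps Chinese text intact."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


print(len(records), records[0]["gender"])
```

Comparing the AI tool's answer against a known-good pattern like this is a quick way to judge whether the generated code is reasonable.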
2.3 Summarizing a document
Wang, N. (1999). Rethinking Authenticity in Tourism Experience. Annals of Tourism Research, 26(2), 349–370. https://doi.org/10.1016/S0160-7383(98)00103-0
http://kaiwu.city/openfiles/rethinking_authenticity1999wang_ning.pdf
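Prompt: Please summarize this file in 200 characters
2.4 OCR
Prompt: Please recognize the text and organize it into a table
2.5 Translation
Prompt: "推迟2天,周六下午两点来我办公室讨论一下" (Postpone by two days; come to my office at 2 p.m. on Saturday to discuss). Please translate into English
Prompt: Please provide 8 versions
2.6 Poetry
Prompt: Following the 《水调歌头》 (Shuidiao Getou) tune pattern, compose a ci poem about university students on a spring outing
Prompt: Please make it rhyme, using rhymes in -an such as 安 and 然
2.7 Data analysis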
http://kaiwu.city/openfiles/tourist.csv
Prompt: Please summarize this dataset
Prompt: Please provide Python code for a frequency analysis of the gender variable
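For comparison with whatever the AI tool returns, here is a minimal sketch of a frequency analysis with pandas. The small DataFrame is a hypothetical stand-in for tourist.csv; the real file's coding of `gender` (for example numeric codes instead of text) may differ.

```python
import pandas as pd

# Hypothetical stand-in for tourist.csv; the real gender coding may differ.
df = pd.DataFrame({"gender": ["male", "female", "female", "male", "female"]})

# Absolute frequencies, one entry per category.
freq = df["gender"].value_counts()

# Relative frequencies as percentages, rounded to one decimal place.
pct = (df["gender"].value_counts(normalize=True) * 100).round(1)

# Combine both into one frequency table (rows are aligned on the category).
table = pd.DataFrame({"freq": freq, "percent": pct})
print(table)
```

With the real data, the first line would be replaced by a `pd.read_csv` call on the downloaded file.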
- Article info
- Author: kaiwu
- Views: 569
http://kaiwu.city/openfiles/solver2018.xlsx
solver2018.xlsx
http://kaiwu.city/openfiles/excel_reference.xlsx
excel_reference.xlsx
http://kaiwu.city/openfiles/data_tourist_satisfaction_cnlabels.xlsx
1. About the sample dataset (tourist satisfaction)
1.1 Questionnaire
http://kaiwu.city/openfiles/tourist_satisfaction_questionnaire_cn.pdf
1.2 General data format (no variable labels or value labels)
http://kaiwu.city/openfiles/tourist.csv
or
http://kaiwu.city/openfiles/data_tourist_satisfaction_cn.xlsx
1.3 Excel file with labelled categorical variables (with variable labels and value labels)
http://kaiwu.city/openfiles/data_tourist_with_cnlabels.xlsx
1.4 Data file for the online quiz
http://kaiwu.city/openfiles/quiz_data_tourist_with_cnlabels.xlsx
1.5 VBA
http://kaiwu.city/openfiles/Excel_VBA_example.xlsm
2. Data preparation (data cleaning or data wrangling)
2.1 Importing data
https://libguides.library.kent.edu/SPSS/ImportData
http://kaiwu.city/openfiles/data_tourist_cn.sav
2.2 Data management
3. Basic analysis
When reporting analysis results, use the following file as a format reference
http://kaiwu.city/openfiles/academic_report_SPSS_CN.docx
3.1 Frequency analysis (categorical variables)
3.2 Cross-tabulation (categorical variables)
3.3 Descriptive statistics (numeric variables)
4.1 Chi-square test
4.2 Independent-samples t-test
4.3 One-way ANOVA
4.4 Correlation analysis
- Article info
- Author: kaiwu
- Views: 97
Questionnaire
http://kaiwu.city/openfiles/tourist_satisfaction_questionnaire_cn.pdf
http://kaiwu.city/openfiles/tourist.csv
http://kaiwu.city/openfiles/tourist_CN_python1.ipynb
1. Python basics
Introduction to Python
http://kaiwu.city/index.php/python
Recommended Python books
http://kaiwu.city/index.php/python-book
2. Installing Python
http://kaiwu.city/index.php/python-vscode
3. Controlling Excel with Python
http://kaiwu.city/openfiles/hotel50python.xlsx
http://kaiwu.city/openfiles/python_excel50hotel.ipynb
4. A simple analysis of Douban movies
https://od.lk/d/165592124_vkZkQ/webscraping_douban_top150.ipynb
https://od.lk/d/165592122_b5z4S/analysis_douban_top150.ipynb
http://kaiwu.city/openfiles/analysis_douban_top150.ipynb
1. Preparation
1.1 Import libraries
import pandas as pd
import numpy as np
from matplotlib import pyplot as plt
import matplotlib
# Define the font settings to pass to matplotlib's rc method
font = {'family': 'SimSun', "size": 16}
# Apply the font settings
matplotlib.rc('font', **font)
# Folder where data files are stored
datafolder='D:/tdata/'
# Load the data
# Option 1: load from a cloud-drive direct link
#df = pd.read_csv('https://od.lk/d/179075968_VxeGV/movie150clean.csv', index_col=0)
# Option 2: load from local disk
df = pd.read_csv(datafolder+'movie150clean.csv', index_col=0)
df.tail(5)
movie_weblink | photo_weblink | cn_name | fr_name | rating | numbers | movie_sentence | directors | actors | ryear | country | theme | |
---|---|---|---|---|---|---|---|---|---|---|---|---|
n | ||||||||||||
146 | https://movie.douban.com/subject/1307315/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 哪吒闹海 | | 9.1 | 208858 | 想你时你在闹海。 | 严定宪 Dingxian Yan / 王树忱 Shuchen Wang | 梁正晖 Zhenghui | 1979 | 中国大陆 | 冒险 动画 奇幻 |
147 | https://movie.douban.com/subject/26628357/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 一个叫欧维的男人决定去死 | En man som heter Ove | 8.9 | 346098 | 惠及一生的美丽。 | 汉内斯·赫尔姆 Hannes Holm | 罗夫·拉斯加德 Rolf Lassgård | 2015 | 瑞典 | 剧情 |
148 | https://movie.douban.com/subject/6307447/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 被解救的姜戈 | Django Unchained | 8.8 | 499185 | 热血沸腾,那个低俗、性感的无耻混蛋又来了。 | 昆汀·塔伦蒂诺 Quentin Tarantino | 杰米·福克斯 Jamie Foxx | 2012 | 美国 | 剧情 动作 西部 冒险 |
149 | https://movie.douban.com/subject/1295399/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 七武士 | 七人の侍 | 9.3 | 158687 | 时代悲歌。 | 黑泽明 Akira Kurosawa | 三船敏郎 Toshirô Mifune | 1954 | 日本 | 动作 冒险 剧情 |
150 | https://movie.douban.com/subject/1395091/ | https://img9.doubanio.com/view/photo/s_ratio_p... | 未麻的部屋 | Perfect Blue | 9.0 | 242480 | 好的剧本是,就算你猜到了结局也猜不到全部。 | 今敏 Satoshi Kon | 岩男润子 Junko Iwao / 松本梨香 Rica Matsu | 1997 | 日本 | 动画 奇幻 惊悚 |
2. Frequency analysis
2.1 Frequency analysis: release year
# Convert the ryear column of df to integer (int)
df['ryear'] = df['ryear'].astype(int)
df['ryear'].dtype
dtype('int32')
Frequency analysis of release year, df['ryear']: bar chart
df["ryear"].value_counts().sort_index().plot(kind="bar")
<Axes: >
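Frequency analysis of release year, df['ryear']: bar chart. The figure size is set to (15, 7), i.e. 15 inches wide and 7 inches high, and the bar color is set to green. For a list of colors, see https://html-color.codes/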
df["ryear"].value_counts().sort_index().plot(kind="bar",figsize=(15,7),color="green")
<Axes: >
Frequency analysis of release year, df['ryear']: bar chart with the frequency printed on each bar
plot=df["ryear"].value_counts().sort_index().plot(kind="bar",figsize=(15,7),color="orange")
for p in plot.patches:
    plot.annotate(str(p.get_height()), (p.get_x() * 1.005, p.get_height() * 1.005))
# Build a year/frequency table; naming the index and the counts explicitly
# avoids depending on the default column names, which differ across pandas versions
df_year = df["ryear"].value_counts().rename_axis('year').reset_index(name='freq')
df_year['year'] = df_year['year'].astype(int)
print(df_year.columns)
df_year.sort_values(by='year')
Index(['year', 'freq'], dtype='object')
year | freq | |
---|---|---|
33 | 1936 | 1 |
28 | 1939 | 1 |
41 | 1953 | 1 |
44 | 1954 | 1 |
24 | 1957 | 2 |
40 | 1961 | 1 |
36 | 1965 | 1 |
35 | 1972 | 1 |
39 | 1974 | 1 |
37 | 1975 | 1 |
30 | 1979 | 1 |
31 | 1983 | 1 |
29 | 1984 | 1 |
43 | 1986 | 1 |
23 | 1987 | 2 |
27 | 1988 | 2 |
38 | 1989 | 1 |
26 | 1990 | 2 |
34 | 1991 | 1 |
42 | 1992 | 1 |
13 | 1993 | 5 |
0 | 1994 | 9 |
3 | 1995 | 7 |
32 | 1996 | 1 |
4 | 1997 | 7 |
17 | 1998 | 4 |
9 | 1999 | 5 |
20 | 2000 | 3 |
5 | 2001 | 6 |
11 | 2002 | 5 |
16 | 2003 | 5 |
2 | 2004 | 8 |
25 | 2005 | 2 |
10 | 2006 | 5 |
14 | 2008 | 5 |
6 | 2009 | 6 |
1 | 2010 | 9 |
7 | 2011 | 6 |
18 | 2012 | 4 |
8 | 2013 | 5 |
15 | 2014 | 5 |
19 | 2015 | 3 |
12 | 2016 | 5 |
22 | 2017 | 3 |
21 | 2018 | 3 |
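Frequency analysis of release year, df['ryear']: bar chart. The figure size is set to (15, 7), i.e. 15 inches wide and 7 inches high, and the bar color is set to orange. For a list of colors, see https://html-color.codes/
Save the figure to local disk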
xs =df_year["year"]
ys =df_year["freq"]
plt.figure(figsize=(15,7))
plt.bar(xs, ys, color='orange')
# html color codes https://html-color.codes/
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,2), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_yearly.jpg",dpi=600) # save the figure to local disk
plt.show()
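2.2 Frequency analysis: country or region of production
# country is a string column; split it into individual countries/regions (treated like a multiple-response question)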
df_country=df['country'].str.split(' ',expand=True)
df_country.head(5)
0 | 1 | 2 | 3 | 4 | 5 | |
---|---|---|---|---|---|---|
n | ||||||
1 | 美国 | None | None | None | None | None |
2 | 中国大陆 | 中国香港 | None | None | None | None |
3 | 美国 | None | None | None | None | None |
4 | 法国 | 美国 | None | None | None | None |
5 | 美国 | 墨西哥 | 澳大利亚 | 加拿大 | None | None |
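Frequency count for the multiple-response variable
# Stack all columns into one Series, count each country, and name the columns explicitly
# (the default column names differ across pandas versions)
df_country1 = df_country.stack().value_counts().rename_axis('country').reset_index(name='freq')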
print(df_country1.columns)
df_country1.sort_values(by='freq',ascending=False)
Index(['country', 'freq'], dtype='object')
country | freq | |
---|---|---|
0 | 美国 | 84 |
1 | 日本 | 22 |
2 | 英国 | 19 |
3 | 中国香港 | 15 |
4 | 中国大陆 | 14 |
5 | 法国 | 12 |
6 | 德国 | 9 |
7 | 韩国 | 9 |
8 | 意大利 | 8 |
9 | 加拿大 | 6 |
12 | 新西兰 | 3 |
13 | 中国台湾 | 3 |
11 | 澳大利亚 | 3 |
10 | 瑞士 | 3 |
14 | 印度 | 2 |
15 | 瑞典 | 2 |
16 | 伊朗 | 1 |
17 | 荷兰 | 1 |
18 | 巴西 | 1 |
19 | 丹麦 | 1 |
20 | 卡塔尔 | 1 |
21 | 西班牙 | 1 |
22 | 波兰 | 1 |
23 | 塞浦路斯 | 1 |
24 | 黎巴嫩 | 1 |
25 | 墨西哥 | 1 |
26 | 奥地利 | 1 |
# Export the frequency table to a CSV file
df_country1.to_csv(datafolder+'movie_country.csv')
xs =df_country1["country"]
ys = df_country1["freq"]
width1 =0.4
plt.figure(figsize=(20,7))
plt.bar(xs, ys, width=width1,color='#ffb01f')
# html color codes https://html-color.codes/
plt.xticks(rotation=45)
plt.show()
xs =df_country1["country"]
ys = df_country1["freq"]
width1 =0.4
plt.figure(figsize=(20,7))
plt.bar(xs, ys, width=width1,color='#ffb01f')
# html color codes https://html-color.codes/
plt.xticks(rotation=45)
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,2), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_region1.png",dpi=600,format="png")
plt.show()
plt.figure(figsize=(12,10))
plt.barh(xs,ys,color='#b85cff')
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (y,x), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(10,-5), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_region2.jpg",dpi=600) # save the figure to local disk
plt.show()
2.3 Frequency analysis: genre (theme)
df_theme=df['theme'].str.split(' ',expand=True)
df_theme.head(5)
0 | 1 | 2 | 3 | 4 | |
---|---|---|---|---|---|
n | |||||
1 | 犯罪 | 剧情 | None | None | None |
2 | 剧情 | 爱情 | 同性 | None | None |
3 | 剧情 | 爱情 | None | None | None |
4 | 剧情 | 动作 | 犯罪 | None | None |
5 | 剧情 | 爱情 | 灾难 | None | None |
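# Count each theme; name the count column explicitly so the result does not
# depend on the pandas version, then copy the index into a regular column
df_theme1 = df_theme.stack().value_counts().to_frame(name='freq')
df_theme1['theme'] = df_theme1.index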
print(df_theme1.columns)
df_theme1
Index(['freq', 'theme'], dtype='object')
freq | theme | |
---|---|---|
剧情 | 114 | 剧情 |
喜剧 | 34 | 喜剧 |
爱情 | 34 | 爱情 |
奇幻 | 33 | 奇幻 |
冒险 | 32 | 冒险 |
动画 | 24 | 动画 |
犯罪 | 21 | 犯罪 |
动作 | 18 | 动作 |
惊悚 | 17 | 惊悚 |
悬疑 | 16 | 悬疑 |
科幻 | 11 | 科幻 |
传记 | 10 | 传记 |
家庭 | 10 | 家庭 |
战争 | 9 | 战争 |
历史 | 6 | 历史 |
音乐 | 5 | 音乐 |
古装 | 5 | 古装 |
歌舞 | 4 | 歌舞 |
同性 | 4 | 同性 |
灾难 | 2 | 灾难 |
西部 | 2 | 西部 |
儿童 | 2 | 儿童 |
纪录片 | 2 | 纪录片 |
武侠 | 2 | 武侠 |
运动 | 1 | 运动 |
情色 | 1 | 情色 |
type(df_theme1)
pandas.core.frame.DataFrame
# Save the frequency table
df_theme1.to_csv('D:/tdata/df_theme1.csv')
xs =df_theme1["theme"]
ys = df_theme1["freq"]
plt.figure(figsize=(18,7))
plt.bar(xs, ys, color='#ffb01f')
# html color codes https://html-color.codes/
#plt.xticks(rotation=45)
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,3), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_theme1.jpg",dpi=600,format="jpg") # save the figure to local disk
plt.show()
plt.figure(figsize=(12,10))
plt.barh(xs, ys,color='#b85cff')
for x,y in zip(xs,ys):
    label = "{:.0f}".format(y)
    plt.annotate(label, # this is the text
                 (y,x), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(15,-5), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_theme2.jpg",dpi=600) # save the figure to local disk
plt.show()
3. Mean ratings
3.1 Mean rating by release year
agg_year=df.groupby("ryear")[['rating']].agg('mean')
agg_year['year']=agg_year.index
print(agg_year.columns)
agg_year.head(5)
Index(['rating', 'year'], dtype='object')
rating | year | |
---|---|---|
ryear | ||
1936 | 9.3 | 1936 |
1939 | 9.3 | 1939 |
1953 | 9.0 | 1953 |
1954 | 9.3 | 1954 |
1957 | 9.5 | 1957 |
plt.figure(figsize=(15,7))
xs=agg_year['year']
ys=agg_year['rating']
plt.plot(xs, ys,color='red', marker='o')
plt.grid(True)
plt.ylim(8, 10)
plt.show()
#set font of all elements to size 12
plt.rc('font', size=12)
plt.figure(figsize=(15,7))
xs=agg_year['year']
ys=agg_year['rating']
plt.plot(xs, ys,color='red', marker='o')
plt.grid(True)
plt.ylim(8, 10)
plt.xticks(rotation=45)
for x,y in zip(xs,ys):
    label = "{:.1f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,10), # distance from text to points (x,y)
                 ha='center', # horizontal alignment can be left, right or center
                 fontsize=12)
plt.savefig(datafolder+"movie_rating_yearly.jpg",dpi=600) # save the figure to local disk
plt.show()
3.2 Mean rating by country or region of production
df_country1
country | freq | |
---|---|---|
0 | 美国 | 84 |
1 | 日本 | 22 |
2 | 英国 | 19 |
3 | 中国香港 | 15 |
4 | 中国大陆 | 14 |
5 | 法国 | 12 |
6 | 德国 | 9 |
7 | 韩国 | 9 |
8 | 意大利 | 8 |
9 | 加拿大 | 6 |
10 | 瑞士 | 3 |
11 | 澳大利亚 | 3 |
12 | 新西兰 | 3 |
13 | 中国台湾 | 3 |
14 | 印度 | 2 |
15 | 瑞典 | 2 |
16 | 伊朗 | 1 |
17 | 荷兰 | 1 |
18 | 巴西 | 1 |
19 | 丹麦 | 1 |
20 | 卡塔尔 | 1 |
21 | 西班牙 | 1 |
22 | 波兰 | 1 |
23 | 塞浦路斯 | 1 |
24 | 黎巴嫩 | 1 |
25 | 墨西哥 | 1 |
26 | 奥地利 | 1 |
df_country1['rating']=0.0
# For each country, sum the ratings of all films whose country string contains it,
# then divide by the country's frequency to get the mean rating
for i in df_country1.index:
    temp=0.0
    for j in df.index:
        if df_country1.loc[i, 'country'] in df.loc[j, 'country']:
            temp=temp + df.loc[j, 'rating']
    #print(temp)
    # .loc writes into the DataFrame itself and avoids SettingWithCopyWarning
    df_country1.loc[i, 'rating'] = round(temp / df_country1.loc[i, 'freq'],3)
df_country1
country | freq | rating | |
---|---|---|---|
0 | 美国 | 84 | 9.029 |
1 | 日本 | 22 | 9.018 |
2 | 英国 | 19 | 9.037 |
3 | 中国香港 | 15 | 8.953 |
4 | 中国大陆 | 14 | 9.121 |
5 | 法国 | 12 | 9.050 |
6 | 德国 | 9 | 9.011 |
7 | 韩国 | 9 | 8.956 |
8 | 意大利 | 8 | 9.175 |
9 | 加拿大 | 6 | 9.083 |
10 | 瑞士 | 3 | 9.100 |
11 | 澳大利亚 | 3 | 9.000 |
12 | 新西兰 | 3 | 9.167 |
13 | 中国台湾 | 3 | 9.100 |
14 | 印度 | 2 | 9.100 |
15 | 瑞典 | 2 | 9.000 |
16 | 伊朗 | 1 | 9.200 |
17 | 荷兰 | 1 | 8.900 |
18 | 巴西 | 1 | 8.900 |
19 | 丹麦 | 1 | 9.100 |
20 | 卡塔尔 | 1 | 9.100 |
21 | 西班牙 | 1 | 8.800 |
22 | 波兰 | 1 | 9.200 |
23 | 塞浦路斯 | 1 | 9.100 |
24 | 黎巴嫩 | 1 | 9.100 |
25 | 墨西哥 | 1 | 9.400 |
26 | 奥地利 | 1 | 8.800 |
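The per-country means above come from a nested loop over countries and films. The same kind of table can also be produced with pandas' split, explode and groupby operations. The following is only a sketch on a toy table: the three rows and their ratings are made up, and `movies`, `long_df` and `summary` are illustrative names that do not appear in the notebook.

```python
import pandas as pd

# Toy stand-in for the movie table; the ratings are made up.
movies = pd.DataFrame(
    {
        "country": ["美国", "中国大陆 中国香港", "法国 美国"],
        "rating": [9.0, 9.6, 9.4],
    }
)

# One row per (film, country) pair: split the space-separated string, then explode.
long_df = movies.assign(country=movies["country"].str.split(" ")).explode("country")

# Frequency and mean rating per country in a single pass.
summary = (
    long_df.groupby("country")["rating"]
    .agg(freq="count", rating="mean")
    .round(3)
    .sort_values("freq", ascending=False)
    .reset_index()
)
print(summary)
```

This version matches whole country names rather than substrings, so it gives the same result as the loop only when no country name is contained in another. The same pattern would apply to the `theme` column.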
df_country2 = df_country1.sort_values('rating', ascending=False)
xs =df_country2["country"]
ys = df_country2["rating"]
width1 =0.4
plt.figure(figsize=(20,7))
plt.bar(xs, ys, width=width1,color='#ffb01f')
# html color codes https://html-color.codes/
plt.xticks(rotation=45)
plt.ylim(8, 10)
for x,y in zip(xs,ys):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,2), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_rating_region1.jpg",dpi=600) # save the figure to local disk
plt.show()
3.3 Mean rating by genre (theme)
df_theme1.shape
(26, 2)
df_theme1['rating']=0.0
# For each theme, sum the ratings of all films whose theme string contains it,
# then divide by the theme's frequency to get the mean rating
for i in df_theme1.index:
    temp=0.0
    for j in df.index:
        if df_theme1.loc[i, 'theme'] in df.loc[j, 'theme']:
            temp=temp + df.loc[j, 'rating']
    #print(temp)
    # .loc writes into the DataFrame itself and avoids SettingWithCopyWarning
    df_theme1.loc[i, 'rating'] = round(temp / df_theme1.loc[i, 'freq'],3)
df_theme1
freq | theme | rating | |
---|---|---|---|
剧情 | 114 | 剧情 | 9.058 |
喜剧 | 34 | 喜剧 | 8.965 |
爱情 | 34 | 爱情 | 8.997 |
奇幻 | 33 | 奇幻 | 8.967 |
冒险 | 32 | 冒险 | 8.991 |
动画 | 24 | 动画 | 9.008 |
犯罪 | 21 | 犯罪 | 9.067 |
动作 | 18 | 动作 | 8.967 |
惊悚 | 17 | 惊悚 | 8.894 |
悬疑 | 16 | 悬疑 | 8.931 |
科幻 | 11 | 科幻 | 9.055 |
传记 | 10 | 传记 | 9.030 |
家庭 | 10 | 家庭 | 9.050 |
战争 | 9 | 战争 | 9.133 |
历史 | 6 | 历史 | 9.167 |
音乐 | 5 | 音乐 | 9.160 |
古装 | 5 | 古装 | 8.860 |
歌舞 | 4 | 歌舞 | 9.075 |
同性 | 4 | 同性 | 9.050 |
灾难 | 2 | 灾难 | 9.000 |
西部 | 2 | 西部 | 8.850 |
儿童 | 2 | 儿童 | 8.950 |
纪录片 | 2 | 纪录片 | 9.400 |
武侠 | 2 | 武侠 | 8.700 |
运动 | 1 | 运动 | 9.000 |
情色 | 1 | 情色 | 8.900 |
df_theme2 = df_theme1.sort_values('rating', ascending=False)
xs =df_theme2["theme"]
ys = df_theme2["rating"]
plt.figure(figsize=(20,7))
plt.bar(xs, ys, color='#ffb01f')
plt.ylim(8, 10)
# html color codes https://html-color.codes/
#plt.xticks(rotation=45)
for x,y in zip(xs,ys):
    label = "{:.2f}".format(y)
    plt.annotate(label, # this is the text
                 (x,y), # these are the coordinates to position the label
                 textcoords="offset points", # how to position the text
                 xytext=(0,3), # distance from text to points (x,y)
                 ha='center') # horizontal alignment can be left, right or center
plt.savefig(datafolder+"movie_rating_theme.jpg",dpi=600) # save the figure to local disk
plt.show()