Data Structure: 56.36M
README.md
Context
The data set covers movies released from xxx up to 2017. It is kept quite general and has no specific problem or challenge behind it. The whole data set is meant for practising the different techniques a data analyst / data scientist uses.
I'd also like to mention that the dataset is not fully cleaned. The reasoning is that it should show you the real life of an analyst / scientist.
Get Data - Prep Data - Analyse Data - Visualize Data - Predict Outcomes of different Use Cases ;-)
Content
I love watching movies and therefore tried to combine this hobby with my current self-studies towards becoming a data scientist.
So I needed a way to obtain a data set with information about movies that I could play around with and apply what I had learned to. At first glance I could see that the data set can be used for regression, classification or potentially even deep learning (such as image recognition, since poster URLs are given).
I acquired this dataset in several steps. First I searched the internet for an API I could use to retrieve movie information. After a short time I came across omdbapi.com. With the help of this API I was able to fetch information based on the title of a movie.
Now I had another problem: I was missing movie titles. The next search began. I couldn't find an API for that, but I saw that Wikipedia was quite well structured with regard to movie titles. So I built a scraper to fetch all movie titles from 1990 to 2017.
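The scraper itself is not part of this README, so the following is only a minimal sketch of how titles could be pulled out of a well-structured HTML table. It assumes (hypothetically) that each movie title appears as a link inside an italic tag, and it runs on a made-up `SAMPLE_HTML` snippet instead of a real Wikipedia page. The class and function names are illustrative, not the author's.

```python
from html.parser import HTMLParser


class ItalicLinkTitles(HTMLParser):
    """Collect the text of every link that sits inside an <i>...</i> tag."""

    def __init__(self):
        super().__init__()
        self.in_italic = 0
        self.in_link = 0
        self.titles = []

    def handle_starttag(self, tag, attrs):
        if tag == "i":
            self.in_italic += 1
        elif tag == "a" and self.in_italic:
            self.in_link += 1
            self.titles.append("")  # start collecting a new title

    def handle_endtag(self, tag):
        if tag == "i" and self.in_italic:
            self.in_italic -= 1
        elif tag == "a" and self.in_link:
            self.in_link -= 1

    def handle_data(self, data):
        if self.in_link:
            self.titles[-1] += data


def extract_titles(html):
    """Return the non-empty italic link texts found in the given HTML."""
    parser = ItalicLinkTitles()
    parser.feed(html)
    return [title.strip() for title in parser.titles if title.strip()]


# Made-up markup; a real page layout may differ (assumption).
SAMPLE_HTML = """
<table>
  <tr><td><i><a href="/wiki/Example_One">Example One</a></i></td><td>Someone</td></tr>
  <tr><td><i><a href="/wiki/Example_Two">Example Two</a></i></td>
      <td><a href="/wiki/X">Not a title</a></td></tr>
</table>
"""

print(extract_titles(SAMPLE_HTML))
```

Links outside an italic tag are ignored, which is one simple way to separate titles from director or cast links under the stated assumption.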
After receiving all the titles I could finally start to obtain the information for each movie by using the title + year (there might be movies which share the same name). Unfortunately some movie titles are written differently, so I had a failure rate of 10% when obtaining the movie data. Based on the 10% of failed movie titles I did a text analysis and found around 400 000 new movies / series. The latest version should include nearly 200 000 different movies based on the imdbID.
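As a rough illustration of the title + year lookup, the sketch below builds an OMDb request URL and checks whether a response counts as a hit. It assumes the public OMDb query parameters `t` (title), `y` (year) and `apikey`, and the `Response` field in the returned JSON; whether the author used exactly these parameters is not stated. The helper names and the key value are placeholders.

```python
from urllib.parse import urlencode

OMDB_ENDPOINT = "http://www.omdbapi.com/"


def build_omdb_url(title, year, api_key):
    """Build a lookup URL for one movie, identified by title and year."""
    params = {"t": title.strip(), "y": str(year), "apikey": api_key}
    return OMDB_ENDPOINT + "?" + urlencode(params)


def is_hit(payload):
    """OMDb marks a successful lookup with the string "True" in "Response"."""
    return payload.get("Response") == "True"


# "YOUR_KEY" is a placeholder; no request is sent here.
print(build_omdb_url("The Matrix", 1999, "YOUR_KEY"))
```

Titles for which `is_hit` returns false would end up in the failed set mentioned above, for example because the spelling differs between the title source and the API.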
Additionally I cleaned some of the information, such as Genre, Actors and Writer, to make analysis easier. Each of the CSV files can be joined by the **imdbID**. Be aware that some information is missing and declared as *_NOT_GIVEN*.
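A minimal sketch of joining two of the CSV files on imdbID, using only the standard library. The file contents, column names other than `imdbID`, and the helper names are made up for illustration. It treats any value ending in `_NOT_GIVEN` as missing, and it assumes one row per imdbID in each file, which may not hold for every file in the data set.

```python
import csv
import io

MISSING_SUFFIX = "_NOT_GIVEN"


def read_by_imdb_id(csv_text):
    """Index CSV rows by imdbID, turning *_NOT_GIVEN markers into None."""
    rows = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        cleaned = {
            key: (None if (value or "").endswith(MISSING_SUFFIX) else value)
            for key, value in row.items()
        }
        rows[row["imdbID"]] = cleaned
    return rows


def join_on_imdb_id(left, right):
    """Inner join: keep only the imdbIDs present in both inputs."""
    return {
        imdb_id: {**left[imdb_id], **right[imdb_id]}
        for imdb_id in left.keys() & right.keys()
    }


# Hypothetical sample data standing in for two of the CSV files.
MOVIES_CSV = "imdbID,Title,Year\ntt0000001,Example One,1999\ntt0000002,Example Two,2005\n"
GENRES_CSV = "imdbID,Genre\ntt0000001,Drama\ntt0000002,_NOT_GIVEN\n"

joined = join_on_imdb_id(read_by_imdb_id(MOVIES_CSV), read_by_imdb_id(GENRES_CSV))
for imdb_id in sorted(joined):
    print(imdb_id, joined[imdb_id])
```

Converting the marker to `None` early keeps later steps from counting a placeholder string as a real genre, actor or writer.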
Acknowledgements
- Thanks to omdbapi.com for providing such a good API and well structured data.
Inspiration
The inspiration for this data set came from getting into the practical flow of developing an image recognition application: **Recognize the genre of a movie from its poster.**
On request I could also provide the images of the movies. But for the given dataset I have the following questions in mind:
1. Does the genre correlate with the given score?
2. Can we see hype around specific genres over the past years?
3. Do the actors or writers prefer a genre?
4. Do the actors or writers have an impact on the imdb score?
5. Do the directors have preferred actors for their movies?
6. Do the directors have preferred writers for their movies?
7. How many movies have been produced by the directors?
8. Is there any relation between the director and the imdb rating?
9. .... many more questions :-)
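As a starting point for questions 1 and 7, the sketch below counts movies per director and averages the rating per genre. The rows and the column names `Director`, `Genre` and `imdbRating` are assumptions made for illustration; the real files may use different names, and missing values are represented here as `None`.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical rows, shaped like joined records keyed by imdbID.
SAMPLE_ROWS = [
    {"imdbID": "tt0000001", "Director": "Director A", "Genre": "Drama", "imdbRating": "7.0"},
    {"imdbID": "tt0000002", "Director": "Director A", "Genre": "Comedy", "imdbRating": "6.0"},
    {"imdbID": "tt0000003", "Director": "Director B", "Genre": "Drama", "imdbRating": "8.0"},
    {"imdbID": "tt0000004", "Director": "Director B", "Genre": "Comedy", "imdbRating": None},
]


def movies_per_director(rows):
    """Count distinct imdbIDs per director, skipping missing directors."""
    ids = defaultdict(set)
    for row in rows:
        if row["Director"]:
            ids[row["Director"]].add(row["imdbID"])
    return {director: len(movie_ids) for director, movie_ids in ids.items()}


def mean_rating_per_genre(rows):
    """Average the rating per genre, ignoring rows with missing values."""
    ratings = defaultdict(list)
    for row in rows:
        if row["Genre"] and row["imdbRating"]:
            ratings[row["Genre"]].append(float(row["imdbRating"]))
    return {genre: mean(values) for genre, values in ratings.items()}


print(movies_per_director(SAMPLE_ROWS))
print(mean_rating_per_genre(SAMPLE_ROWS))
```

A difference in mean rating between genres is only a first hint; a proper answer to question 1 would also need to look at sample sizes and spread.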