
Practice Makes Perfect: Movie Collection Analysis


56.36M
Business,Arts and Entertainment,Movies and TV Shows,Classification,Data Visualization,Time Series Analysis Classification


    README.md

Context

The dataset covers movies released in the years from xxx up to 2017. It is kept quite general and has no specific problem or challenge as its background. The whole dataset is meant for practicing the different techniques a data analyst or data scientist uses. I'd also like to mention that the dataset is not fully cleaned. The reasoning is that it should show you the real life of an analyst / scientist: Get Data - Prep Data - Analyse Data - Visualize Data - Predict Outcomes of different Use Cases ;-)

Content

I love watching movies, so I tried to combine this hobby with my current self-study towards becoming a data scientist. I needed a dataset with movie information so that I could play around and apply what I had learned. At first glance I could see that the dataset can be used for regression, classification or potentially even deep learning (such as image recognition, since poster URLs are given).

I acquired this dataset in several steps. First I searched the internet for an API I could use to retrieve movie information. After a short time I came across omdbapi.com. With the help of this API I was able to fetch information based on the title of a movie. Then I had another problem: I was missing movie titles, so the next search began. I couldn't find an API for that, but I saw that Wikipedia was quite well structured with regard to movie titles. So I built a scraper to fetch all movie titles from 1990 to 2017. With that data I could finally start to fetch the information for each movie using the title + year (there may be movies which have the same name). Unfortunately some movie titles are written differently, so I had a failure rate of 10% when fetching the movie data. Based on the 10% of failed movie titles, I did a text analysis and found around 400 000 new movies / series.

The latest version should include nearly 200 000 different movies based on the imdbID. Additionally I cleaned some of the information, such as Genre, Actors and Writer, for easier analysis. The CSV files can be joined on the **imdbID**. Be aware that some information is missing and marked as *_NOT_GIVEN*.

Acknowledgements

Thanks to omdbapi.com for providing such a good API and well structured data.

Inspiration

The inspiration for this dataset came from getting into the practical flow of developing an image recognition application: **Recognize the genre of a movie by the given poster.** On request I could also provide the images of the movies. For the given dataset I have the following questions in mind:

1. Does the genre correlate with the given score?
2. Can we see a hype around a specific genre over the past years?
3. Do the actors or writers prefer a genre?
4. Do the actors or writers have an impact on the IMDb score?
5. Do the directors have preferred actors for their movies?
6. Do the directors have preferred writers for their movies?
7. How many movies have been produced by each director?
8. Is there any relation between the director and the IMDb rating?

... and many more questions :-)
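As a rough illustration of the join on imdbID and the `_NOT_GIVEN` marker described in the README, the sketch below loads two small in-memory CSV tables with pandas and computes a mean rating per genre, which is one way to start on question 1. The table layouts, the column names `Title`, `imdbRating` and `Genre`, and all sample rows are assumptions made up for this example; the real CSV files may be organised differently.

```python
import io

import pandas as pd

# Marker the README says is used for missing values.
NOT_GIVEN = "_NOT_GIVEN"

# Made-up stand-ins for two of the dataset's CSV files (hypothetical columns and rows).
MOVIES_CSV = """imdbID,Title,imdbRating
tt0000001,Example A,7.5
tt0000002,Example B,_NOT_GIVEN
tt0000003,Example C,6.0
tt0000004,Example D,8.0
"""

GENRES_CSV = """imdbID,Genre
tt0000001,Drama
tt0000001,Comedy
tt0000002,Drama
tt0000003,Drama
tt0000004,_NOT_GIVEN
"""


def load_table(csv_source):
    """Read a CSV and treat the _NOT_GIVEN marker as a missing value."""
    return pd.read_csv(csv_source, na_values=[NOT_GIVEN])


movies = load_table(io.StringIO(MOVIES_CSV))
genres = load_table(io.StringIO(GENRES_CSV))

# Join the two tables on the shared imdbID key.
joined = movies.merge(genres, on="imdbID", how="inner")

# Keep only rows where both the genre and the rating are known.
rated = joined.dropna(subset=["Genre", "imdbRating"])

# Mean rating per genre.
mean_rating_by_genre = rated.groupby("Genre")["imdbRating"].mean()
print(mean_rating_by_genre)
```

Passing the marker through `na_values` lets pandas parse the rating column as numeric directly, so no separate cleaning step is needed before aggregating. With the real files, `io.StringIO(...)` would be replaced by the paths to the downloaded CSVs.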