Context
This corpus of syllabi aims to support the [Nimbus Assistant](https://www.github.com/calpoly-csai/api), an AI similar to Siri/Alexa that answers students’ questions.
In the context of syllabi, students may ask questions like:
* What textbook does MATH 143 need?
* Do I need to buy a new book after MATH 142?
* What’s the course website for Anton Kaul’s 143?
* What’s Dr. Kaul’s grading policy?
* What’s the bare minimum I need to do to pass Kaul’s 143 class?
* How do I ace Kaul’s MATH 143 class?
Content
The data was scraped using [Thruuu](https://app.samuelschmitt.com/), an easy-to-use SERP (search engine results page) scraper.
# Thruuu
* `thruuu.xlsx` - the data exported from Thruuu.
* `thruuu.pdf` - the preliminary analysis exported from Thruuu.
# Notebooks/Process
* `step-1-get-documents-from-sheet-urls.ipynb` - a notebook that **inputs** `thruuu.xlsx` and **outputs** `downloads.tar.gz` along with `downloads.csv`
* `step-2-extract-document-data-with-OCR.ipynb` - a notebook that **inputs** `downloads.tar.gz` along with `downloads.csv` and **outputs** `extracted.csv`
* `step-3-get-simple-logistical-information.ipynb` - a notebook that **inputs** `extracted.csv` and **outputs** `logistical_info.csv`
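Because the downloaded archive is noted to contain some corrupted files, a notebook that consumes `downloads.tar.gz` may want to screen its members before running OCR. The following is a minimal sketch of one way to do that, not the code from the notebooks themselves: the function name `split_valid_pdfs` is hypothetical, and the check only looks for the `%PDF-` marker near the start of each file, so a truncated PDF with an intact header would still pass.

```python
import io
import tarfile
from typing import List, Tuple


def split_valid_pdfs(archive_bytes: bytes) -> Tuple[List[str], List[str]]:
    """Return (valid, suspect) member names from a .tar.gz held in memory.

    Hypothetical helper: a member counts as "valid" if the "%PDF-" marker
    appears within its first 1024 bytes. This is a cheap header check only.
    """
    valid: List[str] = []
    suspect: List[str] = []
    with tarfile.open(fileobj=io.BytesIO(archive_bytes), mode="r:gz") as tar:
        for member in tar.getmembers():
            if not member.isfile():
                continue  # skip directories and links
            handle = tar.extractfile(member)
            head = handle.read(1024) if handle is not None else b""
            if b"%PDF-" in head:
                valid.append(member.name)
            else:
                suspect.append(member.name)
    return valid, suspect
```

With the real archive, the bytes could be read with `open("downloads.tar.gz", "rb").read()` and the `suspect` list excluded from the OCR step.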
# Notebook Outputs
* `downloads.tar.gz` - 100 PDF files (some files are corrupted).
* `downloads.csv` - a table associating search result positions with individual PDF files for a syllabus.
* `extracted.csv` - a table associating each PDF file with its extracted OCR text (the plain embedded text is included as well, but the OCR text is preferred).
* `logistical_info.csv` - a table associating each PDF file with the logistical info (instructor, office, email, etc.) found through regular expressions.
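To illustrate how logistical fields might be pulled out of syllabus text with regular expressions, here is a minimal sketch. The patterns, the `PATTERNS` table, and the `extract_logistics` function are assumptions made for illustration; the actual expressions used in the step-3 notebook are not shown in this README, and real syllabi will need more forgiving patterns.

```python
import re

# Hypothetical patterns for "Label: value" lines and email addresses.
PATTERNS = {
    "instructor": re.compile(
        r"^\s*(?:Instructor|Professor)\s*:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE
    ),
    "office": re.compile(r"^\s*Office\s*:\s*(.+?)\s*$", re.IGNORECASE | re.MULTILINE),
    "email": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
}


def extract_logistics(text):
    """Return a dict of field -> first match in the text (or None)."""
    info = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(text)
        if match is None:
            info[field] = None
        elif pattern.groups:
            info[field] = match.group(1)  # captured value after the label
        else:
            info[field] = match.group(0)  # whole match (e.g. the email address)
    return info


# Made-up sample text, not taken from the dataset.
sample = (
    "MATH 143 Calculus III\n"
    "Instructor: Dr. Jane Doe\n"
    "Office: Building 25, Room 300\n"
    "Email: jdoe@example.edu\n"
)
print(extract_logistics(sample))
```

Applied to the `extracted.csv` text column, a function like this would yield one row of logistical fields per PDF, with `None` wherever a pattern finds nothing.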
Acknowledgements
Thank you [Samuel Schmitt](https://samuelschmitt.com) for making Thruuu!
Inspiration
* What kinds of factoids could you mine from the syllabus text?
* What are common phrases used by Cal Poly professors in their syllabi?
* What are the rarest phrases found in syllabi?
* Can you identify a professor’s writing style from their syllabus?
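As a starting point for the common and rare phrase questions, one simple approach is to count in how many syllabi each word n-gram appears. The sketch below assumes the syllabus texts are already loaded as a list of strings; the function names `ngrams` and `phrase_document_frequency` and the crude tokenizer are illustrative choices, not part of the dataset.

```python
import re
from collections import Counter


def ngrams(text, n=3):
    """Lowercase the text and return its word n-grams as space-joined strings."""
    words = re.findall(r"[a-z']+", text.lower())
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


def phrase_document_frequency(texts, n=3):
    """Count, for each n-gram, the number of documents that contain it."""
    counts = Counter()
    for text in texts:
        counts.update(set(ngrams(text, n)))  # set(): count each document once
    return counts


# Made-up sample sentences, not taken from the dataset.
texts = [
    "Late homework will not be accepted.",
    "Late homework will lose points.",
    "No late homework.",
]
counts = phrase_document_frequency(texts, n=3)
print(counts.most_common(1))
```

Phrases with the highest counts are candidates for common boilerplate, while phrases with a count of 1 are the rarest; OCR noise will inflate the rare end, so some cleanup of the extracted text is likely needed first.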