Files on Your Computer

103.49M
Business,Computer Science,Software,Programming Classification

    README.md

# Dataset: The files on your computer

Crab is a command line tool for Mac and Windows that scans file data into a SQLite database, so you can run SQL queries over it.

e.g. (Win)

    C:> crab C:\some\path\MyProject

or (Mac)

    $ crab /some/path/MyProject

You get a CRAB> prompt where you can enter SQL queries on the data.

e.g. Count files by extension

    SELECT extension, count(*)
    FROM files
    GROUP BY extension;

e.g. List the 5 biggest directories

    SELECT parentpath, sum(bytes)/1e9 as GB
    FROM files
    GROUP BY parentpath
    ORDER BY sum(bytes) DESC LIMIT 5;

Crab provides a virtual table, fileslines, which exposes file contents to SQL.

e.g. Count TODO and FIXME entries in any .c files, recursively

    SELECT fullpath, count(*) FROM fileslines
    WHERE parentpath like '/Users/GN/HL3/%' and extension = '.c'
        and (data like '%TODO%' or data like '%FIXME%')
    GROUP BY fullpath;

There are also functions to run programs or shell commands on any subset of files, or lines within files.

e.g. (Mac) Unzip all the .zip files, recursively

    SELECT exec('unzip', '-n', fullpath, '-d', '/Users/johnsmith/Target Dir/')
    FROM files
    WHERE parentpath like '/Users/johnsmith/Source Dir/%' and extension = '.zip';

(Here -n tells _unzip_ not to overwrite anything, and -d specifies the target directory.)

There is also a function to write query output to a file.

e.g. (Win) Sort the lines of all the .txt files in a directory and write them to a new file

    SELECT writeln('C:\Users\SJohnson\dictionary2.txt', data)
    FROM fileslines
    WHERE parentpath = 'C:\Users\SJohnson\' and extension = '.txt'
    ORDER BY data;

In place of the interactive prompt you can run queries in batch mode. e.g. Here is a one-liner that returns the full path of all the files in the current directory:

    C:> crab -batch -maxdepth 1 . "SELECT fullpath FROM files"

Crab SQL can also be used in Windows batch files, or Bash scripts, e.g. for ETL processing.
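The two reporting queries above (count by extension, biggest directories) use only ordinary SQL, so they can be tried without Crab against any table that has the same columns. The following is a minimal sketch using Python's standard `sqlite3` module. The in-memory table and its rows are hypothetical stand-ins for a real Crab scan, and the helper names are illustrative only.

```python
import sqlite3

# Hypothetical sample rows standing in for a Crab scan:
# (parentpath, name, extension, bytes)
SAMPLE_ROWS = [
    ("/proj/src/", "main.c", ".c", 4_000),
    ("/proj/src/", "util.c", ".c", 2_000),
    ("/proj/src/", "util.h", ".h", 500),
    ("/proj/docs/", "guide.txt", ".txt", 10_000),
]


def make_sample_db():
    """Build an in-memory table with a subset of the documented files columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE files (parentpath TEXT, name TEXT, extension TEXT, bytes INTEGER)"
    )
    conn.executemany("INSERT INTO files VALUES (?, ?, ?, ?)", SAMPLE_ROWS)
    return conn


def count_by_extension(conn):
    """The 'count files by extension' query, with ORDER BY added for stable output."""
    sql = "SELECT extension, count(*) FROM files GROUP BY extension ORDER BY extension"
    return conn.execute(sql).fetchall()


def biggest_directories(conn, limit=5):
    """The 'biggest directories' query; sizes are reported in GB as in the README."""
    sql = (
        "SELECT parentpath, sum(bytes) / 1e9 AS GB FROM files "
        "GROUP BY parentpath ORDER BY sum(bytes) DESC LIMIT ?"
    )
    return conn.execute(sql, (limit,)).fetchall()


if __name__ == "__main__":
    conn = make_sample_db()
    print(count_by_extension(conn))
    print(biggest_directories(conn))
```

Note that `sum(bytes)` here only adds up items directly inside each directory, because the query groups on `parentpath`; it does not roll sizes up from subdirectories.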
**Crab is free for personal use, $5/mo commercial**

See more details here (mac): [http://etia.co.uk/][1] or here (win): [http://etia.co.uk/win/about/][2]

An example SQLite database (Mac data) has been uploaded for you to play with. It includes an example files table for the directory tree you get when downloading the Project Gutenberg corpus, which contains 95k directories and 123k files.

To scan your own files, and get access to the virtual tables and support functions, you have to use the Crab SQLite shell, available for download from this page (Mac): [http://etia.co.uk/download/][3] or this page (Win): [http://etia.co.uk/win/download/][4]

# Content

FILES TABLE

The FILES table contains details of every item scanned, file or directory. All columns are indexed except 'mode'.

COLUMNS

    fileid (int) primary key -- files table row number, a unique id for each item
    name (text)              -- item name e.g. 'Hei.ttf'
    bytes (int)              -- item size in bytes e.g. 7502752
    depth (int)              -- how far scan recursed to find the item, starts at 0
    accessed (text)          -- datetime item was accessed
    modified (text)          -- datetime item was modified
    basename (text)          -- item name without path or extension, e.g. 'Hei'
    extension (text)         -- item extension including the dot, e.g. '.ttf'
    type (text)              -- item type, 'f' for file or 'd' for directory
    mode (text)              -- further type info and permissions, e.g. 'drwxr-xr-x'
    parentpath (text)        -- absolute path of directory containing the item, e.g. '/Library/Fonts/'
    fullpath (text) unique   -- parentpath of the item concatenated with its name, e.g. '/Library/Fonts/Hei.ttf'

PATHS

1) parentpath and fullpath don't support abbreviations such as ~ . or .. They're just strings.
2) Directory paths all have a '/' on the end.

FILESLINES TABLE

The FILESLINES table is for querying data content of files. It has line number and data columns, with one row for each line of data in each file scanned by Crab.
This table isn't available in the example dataset, because it's a virtual table and doesn't physically contain data.

COLUMNS

    linenumber (int) -- line number within file, restarts count from 1 at the first line of each file
    data (text)      -- data content of the files, one entry for each line

FILESLINES also duplicates the columns of the FILES table: fileid, name, bytes, depth, accessed, modified, basename, extension, type, mode, parentpath, and fullpath. This way you can restrict which files are searched without having to join tables.

# Example Gutenberg data

An example SQLite database (Mac data), _database.sqlite_, has been uploaded for you to play with. It includes an example _files_ table for the directory tree you get when downloading the Project Gutenberg corpus, which contains 95k directories and 123k files.

You can open it with any SQLite shell, or query it with any SQLite query tool, but the virtual tables such as _fileslines_ and support functions such as EXEC() and WRITELN() only work from the Crab shell, which you have to download from etia.co.uk.

# Uses

* Reporting and analysis of filesystem contents
* Finding files and directories
* Filesystem operations such as moving, copying, deleting, unzipping files
* ETL processing

[1]: http://etia.co.uk/
[2]: http://etia.co.uk/win/about/
[3]: http://etia.co.uk/download/
[4]: http://etia.co.uk/win/download/
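Since the example _files_ table is an ordinary SQLite table, the path conventions described under PATHS can be exercised from any SQLite client. The sketch below uses Python's standard `sqlite3` module with a small in-memory table that copies some of the documented columns. The rows, paths, and helper names are hypothetical. It contrasts an exact match on `parentpath` (items directly inside a directory) with a `LIKE` prefix match (everything below it), relying on the documented trailing '/' on directory paths.

```python
import sqlite3


def make_demo_db():
    """In-memory table using some of the documented columns; the rows are hypothetical."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """CREATE TABLE files (
               fileid INTEGER PRIMARY KEY,
               name TEXT, bytes INTEGER, extension TEXT, type TEXT,
               parentpath TEXT, fullpath TEXT UNIQUE)"""
    )
    rows = [
        ("index.txt", 120, ".txt", "f", "/data/books/"),
        ("pg1.txt", 3000, ".txt", "f", "/data/books/1/"),
        ("pg1.zip", 900, ".zip", "f", "/data/books/1/"),
    ]
    conn.executemany(
        "INSERT INTO files (name, bytes, extension, type, parentpath, fullpath) "
        "VALUES (?, ?, ?, ?, ?, ?)",
        # fullpath is parentpath concatenated with the item name, as documented.
        [(n, b, e, t, p, p + n) for (n, b, e, t, p) in rows],
    )
    return conn


def files_directly_in(conn, directory):
    """Files whose parentpath equals the directory exactly (no recursion)."""
    sql = "SELECT fullpath FROM files WHERE parentpath = ? AND type = 'f' ORDER BY fullpath"
    return [row[0] for row in conn.execute(sql, (directory,))]


def files_under(conn, directory):
    """Files in subdirectories of the directory, via a LIKE prefix match.

    Caveats: SQLite's LIKE is case-insensitive for ASCII by default, and any
    '%' or '_' characters in the directory string act as wildcards.
    """
    sql = "SELECT fullpath FROM files WHERE parentpath LIKE ? AND type = 'f' ORDER BY fullpath"
    return [row[0] for row in conn.execute(sql, (directory + "%",))]


if __name__ == "__main__":
    conn = make_demo_db()
    print(files_directly_in(conn, "/data/books/"))
    print(files_under(conn, "/data/books/"))
```

To run the same helpers against the uploaded data, one would replace `make_demo_db()` with `sqlite3.connect("database.sqlite")`, assuming the file has been downloaded to the working directory. Queries on _fileslines_ or calls to EXEC() and WRITELN() would still fail there, since those exist only in the Crab shell.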