
[Bounty] Need Help: YeGods.Org Data Extraction


    1
    Posted on 2019-12-9 16:14:19
    Bounty: 500
    Last edited by nhb42 on 2019-12-9 16:20

    I want to make a dictionary from this website: https://yegods.org/. It seems that all the entries live under this directory: https://yegods.org/deity/, but I couldn't extract the data from there with my limited knowledge of web crawling... I have made dictionaries from several web encyclopedias, but for this site I was only able to extract the info pages, including the glossary page. If anyone can help me make an mdx version of this website, I would be grateful.

    Here's the mdx version I made from the Glossary page... https://drive.google.com/open?id=1Bd9JRYlrLNqkWBZN9WK7j73xyylbV2_P [I have also added the glossary mdx in the attachment]

    Please, someone, crawl the whole website and I will give a bigger reward...

    YeGods.Org Glossary 2019.zip

    2.34 MB, downloads: 8, download cost: -5 米 (forum points)

    YeGods.Org Glossary Page

    Best answer: post 2 by perhapz.

    2
    Posted on 2019-12-9 16:14:20
    Last edited by perhapz on 2019-12-17 14:54

    Try running the code below in the console of Chrome's developer tools:

    fetch("https://yegods.org/search/suggestions/a", {
      "credentials": "include",
      "headers": {
        "accept": "*/*",
        "accept-language": "en-US,en;q=0.9",
        "sec-fetch-mode": "cors",
        "sec-fetch-site": "same-origin",
        "x-requested-with": "XMLHttpRequest"
      },
      "referrer": "https://yegods.org/",
      "referrerPolicy": "no-referrer-when-downgrade",
      "body": null,
      "method": "POST",
      "mode": "cors"
    })
      .then(v => v.json())
      .then(json => json.matches.map(k => k.slug)); // entry slugs suggested for "a"

    Repeat for every letter a-z => deduplicate => get the data from https://yegods.org/deity/*
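
The "repeat from a-z, then deduplicate" step could be scripted in the same console roughly as follows. This is only a sketch: it assumes the suggestions endpoint answers every letter the way it answers "a" in the snippet above (a POST returning JSON with a `matches` array whose items carry a `slug`). The names `fetchSlugs`, `collectSlugs` and `fetchImpl` are made up for illustration, and the shortened header set is a guess at what the server actually needs.

```javascript
// Sketch: query the suggestions endpoint once per letter and merge the slugs.
// Assumed response shape (as seen for "a"): { matches: [{ slug: "..." }, ...] }
const SUGGEST_URL = "https://yegods.org/search/suggestions/";

// fetchImpl is injectable so the logic can be tried without hitting the site.
async function fetchSlugs(letter, fetchImpl = fetch) {
  const res = await fetchImpl(SUGGEST_URL + letter, {
    method: "POST",
    credentials: "include",
    headers: { accept: "*/*", "x-requested-with": "XMLHttpRequest" },
  });
  const json = await res.json();
  return (json.matches || []).map((m) => m.slug);
}

async function collectSlugs(fetchImpl = fetch) {
  const seen = new Set(); // a Set silently drops duplicate slugs
  for (const letter of "abcdefghijklmnopqrstuvwxyz") {
    for (const slug of await fetchSlugs(letter, fetchImpl)) {
      seen.add(slug);
    }
  }
  return [...seen].sort();
}

// In the console, while on https://yegods.org/ :
// collectSlugs().then((slugs) => console.log(slugs.join("\n")));
```

If the endpoint caps the number of suggestions per query, single letters may miss some entries; the thread does not confirm this either way, so the resulting list is worth spot-checking against the site.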

    Comments

    I don't know what that means... Could you please zip the data, upload it to the cloud and send me the link?  Posted on 2019-12-17 15:58


    3
    OP | Posted on 2019-12-17 16:20:11
    perhapz wrote on 2019-12-17 14:24:
    try to use code below in console of developer tools of Chrome

    Repeat from a-z => deduplication => g ...

    Wait... after some googling I managed to run the script in the console and got a list of entries... Now how do I get the data...?

    4
    Posted on 2019-12-17 17:53:38
    Last edited by perhapz on 2019-12-17 17:58

    Repeat the script for each letter from a to z to get all the entries
    Remove the duplicates
    Use a crawling or download tool to send a GET request to every entry: https://yegods.org/deity/entry-name
    Take whatever you need from each response
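
For the last two steps, one option is to turn the deduplicated slugs into a plain list of URLs that a download tool can import. A minimal sketch with hypothetical names (`toEntryUrls`, `fetchEntries`): the `https://yegods.org/deity/<slug>` pattern comes from the post, while the slugs in the usage line are placeholders and the 500 ms pause is an arbitrary choice.

```javascript
const ENTRY_BASE = "https://yegods.org/deity/";

// Build one URL per unique slug; encodeURIComponent guards against odd characters.
function toEntryUrls(slugs) {
  return [...new Set(slugs)].map((s) => ENTRY_BASE + encodeURIComponent(s));
}

// Optional: fetch the pages one at a time with a pause, to go easy on the server.
// fetchImpl is injectable so the loop can be tried without network access.
async function fetchEntries(urls, fetchImpl = fetch, delayMs = 500) {
  const pages = new Map();
  for (const url of urls) {
    const res = await fetchImpl(url); // plain GET request
    if (res.ok) pages.set(url, await res.text()); // skip entries that fail
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return pages;
}

// Placeholder slugs, not real entries:
// console.log(toEntryUrls(["entry-one", "entry-two", "entry-one"]).join("\n"));
```

Printing the URLs one per line gives a list that can be pasted into a download manager; `fetchEntries` is the do-it-in-the-console alternative and returns the raw HTML of each page keyed by URL.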

    Comments

    Thanks a lot... you have been rewarded and given 500 extra...  Posted on 2019-12-17 18:01


    5
    OP | Posted on 2019-12-17 17:57:14
    perhapz wrote on 2019-12-17 14:24:
    try to use code below in console of developer tools of Chrome

    Repeat from a-z => deduplication => g ...

    thank you very much, I've extracted the data with IDM using the list...

    6
    Posted on 2019-12-17 18:02:55
    nhb42 wrote on 2019-12-17 17:57:
    thank you very much, I've extracted the data with IDM using the list...

    welcome, thanks for the reward

    Comments

    You are most welcome, dear...  Posted on 2019-12-17 18:04