docs: remove glossary #2130
Conversation
- remove glossary directory
- remove glossary from sidebar and 2nd navbar
- remove mentions of glossary from AGENTS.md
Preview for this PR was built for commit
└── academy/        # Educational content
    ├── tutorials/      # Step-by-step guides
    ├── webscraping/    # Web scraping courses
    └── glossary/       # Terminology and definitions
Minor: also remove the glossary from .cursor/rules/file-organization.mdc
good point, I forgot about those rules
marcel-rbro left a comment
Overall, consider adding external links to some of the places where links to the glossary were removed. Not necessary for stuff like HTTP headers and CSS, but would be helpful for mentions of tools: Postman, Insomnia, Quick JavaScript Switcher (or whatever was the name)...
If you're feeling ambitious, you don't need to have any prior experience with Crawlee to get started with this course; however, at least 5–10 minutes of exposure is recommended. If you haven't yet tried out Crawlee, you can refer to the [Using a scraping framework with Node.js](../../webscraping/scraping_basics_javascript/12_framework.md) lesson of the **Web scraping basics for JavaScript devs** course. To familiarize yourself with the Apify SDK, you can refer to the [Apify Platform](../apify_platform.md) category.

The Apify CLI will play a core role in the running and testing of the Actor you will build, so if you haven't gotten it installed already, please refer to [this short lesson](../../glossary/tools/apify_cli.md).
The Apify CLI will play a core role in the running and testing of the Actor you will build, so if you haven't gotten it installed already, please refer to this short lesson.
There's no link now
Now, let's move over to our favorite HTTP client (in this lesson we'll use [Insomnia](../../glossary/tools/insomnia.md) in order to prepare and send the request).
Now, let's move over to our favorite HTTP client (in this lesson we'll use Insomnia in order to prepare and send the request).
Consider adding link to https://insomnia.rest/ or to their docs: https://developer.konghq.com/insomnia/
## Making the choice {#making-the-choice}
When choosing which scraper to use, we would suggest first checking whether the website works without JavaScript or not. Probably the easiest way to do so is to use the [Quick JavaScript Switcher](../../glossary/tools/quick_javascript_switcher.md) extension for Chrome. If JavaScript is not needed, or you've spotted some XHR requests in the **Network** tab with the data you need, you probably won't need to use an automated browser. You can then check what data is received in response using [Postman](../../glossary/tools/postman.md) or [Insomnia](../../glossary/tools/insomnia.md) or try to send a few requests programmatically. If the data is there and you're not blocked straight away, a request-based scraper is probably the way to go.
When choosing which scraper to use, we would suggest first checking whether the website works without JavaScript or not. Probably the easiest way to do so is to use the Quick JavaScript Switcher extension for Chrome. If JavaScript is not needed, or you've spotted some XHR requests in the **Network** tab with the data you need, you probably won't need to use an automated browser. You can then check what data is received in response using Postman or Insomnia or try to send a few requests programmatically. If the data is there and you're not blocked straight away, a request-based scraper is probably the way to go.
Link to the extension?
See above comment
I think if we mention something for the first time, we should link. The changes made in sources/academy/platform/getting_started/apify_api.md now fit such an approach, but unless I'm missing something, this tutorial mentions the Chrome extension for the first time here?
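The "send a few requests programmatically" check from the quoted passage could be sketched roughly like this (the function names and the marker string are hypothetical, not part of the PR or the course):

```javascript
// Sketch: decide between a request-based scraper and an automated browser
// by checking whether the server-rendered HTML already contains the data.
// The "marker" is any substring you expect to appear in the data.
function dataIsServerRendered(html, marker) {
  // If the marker is present in the raw HTML, the data did not need
  // client-side JavaScript to appear; a request-based scraper likely suffices.
  return html.includes(marker);
}

// With a plain HTTP request (no JavaScript execution), fetch the page
// and run the check. Requires Node 18+ for the global fetch.
async function needsBrowser(url, marker) {
  const response = await fetch(url, {
    headers: { 'User-Agent': 'Mozilla/5.0' },
  });
  return !dataIsServerRendered(await response.text(), marker);
}
```

This mirrors the manual workflow in the passage: a hit on the marker in the raw response means the data was there without JavaScript; a miss suggests it is loaded dynamically and a browser (or a look at the XHR requests) is needed.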
- Using [proxies](../mitigation/proxies.md)?
- Making the request with the proper [headers](../../../glossary/concepts/http_headers.md) and [cookies](../../../glossary/concepts/http_cookies.md)?
- Making the request with the proper headers and cookies?
What do you think about adding an MDN docs link to each occurrence of cookies/headers? I did it a few times but not for all.
Depends on target audience. If it's a course for beginners, and we mention cookies or headers for the first time, it makes sense to me to link it. Otherwise I'd consider these terms as something the reader should understand.
remove unnecessary heading anchors; add links to docs & external tools
honzajavorek left a comment
I did find a few things, but they're more like opinions or nitpicks and I don't want to hold back delivery with those. Approving, and up to you what you do with my comments 🚀
- Using [proxies](../mitigation/proxies.md).
- Mocking [headers](../../../glossary/concepts/http_headers.md).
- Mocking headers.
Just an idea: Depending on context, instead of linking to MDN, we could just be more specific so that the reader can search for the term if they need. E.g. instead of vague headers, we could write HTTP headers, without link, and that could be a bit better, even without a link.
(I don't know if this place is the place where this would make sense, but this place is the place where I got this idea, hence I put the comment here.)
## Cookies & headers {#cookies-headers}
Certain websites might use certain location-specific/language-specific [headers](../../../glossary/concepts/http_headers.md)/[cookies](../../../glossary/concepts/http_cookies.md) to geolocate a user. Some examples of these headers are `Accept-Language` and `CloudFront-Viewer-Country` (which is a custom HTTP header from [CloudFront](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/adding-cloudfront-headers.html)).
Certain websites might use certain location-specific/language-specific headers/cookies to geolocate a user. Some examples of these headers are `Accept-Language` and `CloudFront-Viewer-Country` (which is a custom HTTP header from [CloudFront](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/adding-cloudfront-headers.html)).
Certain websites might use certain location-specific/language-specific headers/cookies to geolocate a user. Some examples of these headers are `Accept-Language` and `CloudFront-Viewer-Country` (which is a custom HTTP header from [CloudFront](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/adding-cloudfront-headers.html)).
To geolocate a user, websites might use HTTP headers and cookies specific to location or language. Some examples of these headers are `Accept-Language` and `CloudFront-Viewer-Country` (which is a custom header from [CloudFront](https://docs.aws.amazon.com/AmazonCloudFront/latest/DeveloperGuide/adding-cloudfront-headers.html)).
Also, cookies are technically also HTTP headers, but whatever 😅
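The geolocation trick described in the quoted passage could be sketched like this (the helper name is hypothetical, and the header values are only illustrative -- `CloudFront-Viewer-Country` is normally set by CloudFront itself, so spoofing it only matters when talking directly to the origin):

```javascript
// Sketch: build location/language-specific headers for a request,
// pretending to browse from a given locale. Values are illustrative.
function buildGeoHeaders(languageTag, countryCode) {
  return {
    // Standard header: preferred response language(s), with quality weights.
    'Accept-Language': `${languageTag},en;q=0.8`,
    // Custom CloudFront header some origin servers read to geolocate the viewer.
    'CloudFront-Viewer-Country': countryCode,
  };
}

const headers = buildGeoHeaders('de-DE', 'DE');
// Usage (not executed here): fetch(url, { headers })
```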
### Header checking
This type of bot identification is based on the given fact that humans are accessing web pages through browsers, which have specific [header](../../glossary/concepts/http_headers.md) sets which they send along with every request. The most commonly known header that helps to detect bots is the `User-Agent` header, which holds a value that identifies which browser is being used, and what version it's running. Though `User-Agent` is the most commonly used header for the **Header checking** method, other headers are sometimes used as well. The evaluation is often also run based on the header consistency, and includes a known combination of browser headers.
This type of bot identification is based on the given fact that humans are accessing web pages through browsers, which have specific header sets which they send along with every request. The most commonly known header that helps to detect bots is the `User-Agent` header, which holds a value that identifies which browser is being used, and what version it's running. Though `User-Agent` is the most commonly used header for the **Header checking** method, other headers are sometimes used as well. The evaluation is often also run based on the header consistency, and includes a known combination of browser headers.
This type of bot identification is based on the given fact that humans are accessing web pages through browsers, which have specific header sets which they send along with every request. The most commonly known header that helps to detect bots is the `User-Agent` header, which holds a value that identifies which browser is being used, and what version it's running. Though `User-Agent` is the most commonly used header for the **Header checking** method, other headers are sometimes used as well. The evaluation is often also run based on the header consistency, and includes a known combination of browser headers.
This type of bot identification is based on the given fact that humans are accessing web pages through browsers, which have specific HTTP header sets which they send along with every request. The most commonly known header that helps to detect bots is the `User-Agent` header, which holds a value that identifies which browser is being used, and what version it's running. Though `User-Agent` is the most commonly used header for the **Header checking** method, other headers are sometimes used as well. The evaluation is often also run based on the header consistency, and includes a known combination of browser headers.
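The "header consistency" idea from the quoted passage could be sketched as a naive server-side check: a request claiming to be Chrome should also carry the companion headers real Chrome sends. The header names below are real, but the rule set is purely illustrative, not any actual detector:

```javascript
// Sketch: naive header-consistency check. A bare User-Agent spoof
// usually omits the client-hint and fetch-metadata headers that a
// real Chrome browser sends alongside every navigation request.
function looksLikeChrome(headers) {
  const ua = headers['user-agent'] || '';
  if (!ua.includes('Chrome/')) return false;
  // Require the known combination of companion browser headers.
  return ['sec-ch-ua', 'sec-fetch-mode', 'accept-language']
    .every((name) => name in headers);
}
```

A request with only a spoofed `User-Agent` fails this check, which is exactly why anti-bot evaluation looks at the whole header combination rather than a single value.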
In order to perform introspection on our [target website](https://www.cheddar.com), we need to make a request to their GraphQL API with this introspection query using [Insomnia](../../../glossary/tools/insomnia.md) or another HTTP client that supports GraphQL:
In order to perform introspection on our [target website](https://www.cheddar.com), we need to make a request to their GraphQL API with this introspection query using Insomnia or another HTTP client that supports GraphQL:
If this is the first mention of Insomnia within the course and it is a somewhat known term, I'd link. If not, I wouldn't link.
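For reference, the introspection request discussed in this hunk could be assembled like this (the endpoint in the usage comment is hypothetical; the `__schema` field itself is part of the GraphQL specification):

```javascript
// Sketch: the POST body for a minimal GraphQL introspection query,
// as it would be sent from Insomnia or any HTTP client.
const introspectionQuery = `
  query {
    __schema {
      types {
        name
        fields { name }
      }
    }
  }
`;

const body = JSON.stringify({ query: introspectionQuery });

// Usage (not executed here; endpoint is hypothetical):
// fetch('https://example.com/graphql', {
//   method: 'POST',
//   headers: { 'Content-Type': 'application/json' },
//   body,
// });
```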
Puppeteer and Playwright don't sit around waiting for a page (or specific elements) to load though - if we tell it to do something with an element that hasn't been rendered yet, it'll start trying to do it (which will result in nasty errors). We've got to tell it to wait.
> For a thorough explanation on how dynamic rendering works, give [**Dynamic pages**](../../../glossary/concepts/dynamic_pages.md) a quick readover, and check out the examples.
> For a thorough explanation on how dynamic rendering works, give **Dynamic pages** a quick readover, and check out the examples.
> For a thorough explanation on how dynamic rendering works, give **Dynamic pages** a quick readover, and check out the examples.
I think the admonition doesn't make sense without the page and should be removed.
If we remember properly, after clicking the first result, we want to console log the title of the result's page and save a screenshot into the filesystem. In order to grab a solid screenshot of the loaded page though, we should **wait for navigation** before snapping the image. This can be done with [`page.waitForNavigation()`](https://pptr.dev/#?product=Puppeteer&version=v14.1.0&show=api-pagewaitfornavigationoptions).
> A navigation is when a new [page load](../../../glossary/concepts/dynamic_pages.md) happens. First, the `domcontentloaded` event is fired, then the `load` event. `page.waitForNavigation()` will wait for the `load` event to fire.
> A navigation is when a new page load happens. First, the `domcontentloaded` event is fired, then the `load` event. `page.waitForNavigation()` will wait for the `load` event to fire.
I won't be like @TC-MO and I won't put here a comment that in a better version of this world, which we are surely building, these blockquotes could be turned into proper admonitions. I won't do it. But trust me, I'm tempted!
Great! But wait, where do we go from here? We need to go to the offers page next and scrape each offer, but how can we do that? Let's take a small break from writing the scraper and open up [Proxyman](../../../glossary/tools/proxyman.md) to analyze requests which might be difficult to find in the network tab, then we'll click the button on the product page that loads up all of the product offers:
Great! But wait, where do we go from here? We need to go to the offers page next and scrape each offer, but how can we do that? Let's take a small break from writing the scraper and open up Proxyman to analyze requests which might be difficult to find in the network tab, then we'll click the button on the product page that loads up all of the product offers:
First time I hear about https://proxyman.com/ (like, in my life). If this is a first mention of the tool in the course, I'd link.
honzajavorek left a comment
I just noticed @marcel-rbro generally points out the same stuff, so I'm changing to Comment and once he approves, it should be Approved.
Note
Removes the Academy Glossary and cleans up navigation and references across the docs.
- Deletes `sources/academy/glossary/**` pages (concepts/tools) and related content
- Updates `docusaurus.config.js` and `sources/academy/sidebars.js` to remove Glossary menu items
- Removes `/academy/glossary` paths and related redirects
- Updates links to `/cli/docs/installation`, refines AGENTS.md structure/checklist

Written by Cursor Bugbot for commit 5607cb6.