

It is common for a webmaster, marketing expert, or SEO specialist to need to extract data from site pages and present it in a convenient form for further processing. This could be parsing prices in an online store, getting the number of likes, or extracting reviews from resources you're interested in. Most technical site audit tools collect only the H1 and H2 header content by default; if, for instance, you want to collect the H5 headers, you will have to extract them separately. To avoid the routine manual work of parsing and extracting data from HTML pages, these specialists usually turn to web scrapers.

Web scraping is the process of automated data extraction from site pages according to certain rules. Typical tasks include tracking the prices of goods in online stores; extracting descriptions of goods and services, or getting the number of goods and pictures in a data sheet; extracting contact information (email addresses, phone numbers, etc.); collecting data for marketing research (likes, shares, ratings); and extracting specific data from the HTML code of a page (searching for analytics systems, checking whether micro-markup is present).

The basic web scraping methods are parsing with XPath, CSS selectors, XQuery, Regex, and HTML templates.

XPath is a query language for the elements of XML/XHTML documents. To access elements, XPath uses DOM navigation: it describes the path to the desired element on the page. With it, you can get the value of an element by its ordinal number in the document, extract its text content or inner code, and check whether specific elements are present on the page.

CSS selectors are used to find an element or a part of it (an attribute). CSS is syntactically similar to XPath, and in some cases CSS locators are faster, more descriptive, and more concise. The only drawback of CSS is that it works in one direction only, whereas XPath works both ways (for example, you can search for a parent element by its child).

XQuery imitates XML, which allows you to create nested expressions in a way that is not possible in XSLT.
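To make the H5 case and XPath's "both ways" navigation more concrete, here is a minimal sketch in Python. It uses the standard library's xml.etree.ElementTree, which supports only a subset of XPath and requires well-formed markup, so it is an illustration rather than a production scraper; real-world HTML usually needs a dedicated HTML parser. The sample page, the class names, and the variable names are all made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed XHTML fragment standing in for a downloaded page.
PAGE = """
<html>
  <body>
    <h1>Catalog</h1>
    <div class="product">
      <h5>Blue kettle</h5>
      <span class="price">24.90</span>
    </div>
    <div class="product">
      <h5>Red toaster</h5>
      <span class="price">31.50</span>
    </div>
    <div class="banner">
      <h5>Free delivery</h5>
    </div>
  </body>
</html>
"""

root = ET.fromstring(PAGE)

# Downward navigation: collect the text of every H5 header on the page.
h5_texts = [h5.text for h5 in root.findall(".//h5")]

# Upward navigation: find each price <span>, then step to its parent
# with "..", which selects only the blocks that actually carry a price.
priced_blocks = root.findall(".//span[@class='price']/..")

# Map each product name to its price, read from the selected parent blocks.
products = {
    block.find("h5").text: float(block.find("span[@class='price']").text)
    for block in priced_blocks
}

print(h5_texts)
print(products)
```

The first query returns all three H5 headers, including the banner one, while the parent-by-child query keeps only the two product blocks, because the banner has no price element to climb up from.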
