Judging by my experience with web scraping so far, sites that use GraphQL are some kind of sick pathology. Here's an example of one such request, copied from the browser's developer tools:
curl 'https://www.homedepot.com/product-information/model' -H 'User-Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) Gecko/20100101 Firefox/86.0' -H 'Accept: */*' -H 'Accept-Language: en-US,en;q=0.5' --compressed -H 'content-type: application/json' -H 'X-Experience-Name: general-merchandise' -H 'apollographql-client-name: general-merchandise' -H 'apollographql-client-version: 0.0.0' -H 'X-current-url: /b/Tools-Hand-Tools-Specialty-Hand-Tools/N-5yc1vZc1zi' -H 'X-Cloud-Trace-Context: ca8bdbb0d0ce318275b61f7f6e1bebb6/8353233990978753714' -H 'X-Api-Cookies: {"x-user-id":"9963827e-f19c-c241-bb64-aaab21b60bb4"}' -H 'x-debug: false' -H 'Origin: https://www.homedepot.com' -H 'Connection: keep-alive' -H 'Cookie: _abck=510645CDCC2F3AC238D7C79764DD657D0YAAQN2ReaNVfVhF4AQAA5mdfEwWNiSz1kFq0yAhwCVEeHbbTaelHBaqnZSTI0DJodDgwzBSlCaaWQbMzOf/ddHiWzEzJTnTwuXF21P6TImIQntv135YN2Wfdd0d1zjRjsZHLX++zBWxQrcodtXPIvtadZhBDFJWFG1q3hCorQQZMYhIeOUNFIBDD/h82HM/+f73zNM9Rc706TR1xrTLLEOG1FbRbS9N/aYujZ2dg7OyP7pfcWIppgvRwqDf0EeuRbelv9IE2rF3Mc7NYN8DqCn3QwyDEHOLMKCZTjD0ft/nvlkt6UdiLKvXTfgfZ0szLk3pU3pSQe32Wuvhf5rmqQz33XHYQ3Cpz38fzdg/e2CydcmF3Id83wW67uEN87eio5I5lOf20jBorfDKELhAhVIvzaBywFY9Jt9Qa~-1~-1~-1; AMCV_F6421253512D2C100A490D45%40AdobeOrg=1585540135%7CMCIDTS%7C18695%7CMCMID%7C90275896034488086671674408193850329779%7CMCOPTOUT-1615241592s%7CNONE%7CvVersion%7C4.4.0; mbox=PC#ceb035cf5aa84644a4d37faa83265242.37_0#1678308957|session#00caa33577ba4ca5825902490bb5d790#1615236250; THD_PERSIST=C4%3D2414%2BBangor%20-%20Bangor%2C%20ME%2B%3A%3BC4_EXP%3D1646768178%3A%3BC24%3D04401%3A%3BC24_EXP%3D1646768178%3A%3BC39%3D1%3B8%3A00-20%3A00%3B2%3B6%3A00-21%3A00%3B3%3B6%3A00-21%3A00%3B4%3B6%3A00-21%3A00%3B5%3B6%3A00-21%3A00%3B6%3B6%3A00-21%3A00%3B7%3B6%3A00-21%3A00%3A%3BC39_EXP%3D1615235778; THD_CACHE_NAV_PERSIST=; WORKFLOW=GEO_LOCATION; DELIVERY_ZIP=04401; DELIVERY_ZIP_TYPE=AUTO; forterToken=aa7171f09d1043f0b1d09161babd878c_1615234390091_69_dUAL43-mnts_13ck; thda.u=9963827e-f19c-c241-bb64-aaab21b60bb4; 
s_pers=%20s_nr365%3D1615234392214-Repeat%7C1646770392214%3B%20s_dslv%3D1615234392221%7C1709842392221%3B; ajs_user_id=null; ajs_group_id=null; ajs_anonymous_id=%22d941f5b7-1f1a-45b2-a270-463fb09cfef3%22; _meta_mediaMath_iframe_counter=5; _meta_bing_beaconFired=true; _meta_facebookPixel_beaconFired=true; _meta_movableInk_mi_u=d941f5b7-1f1a-45b2-a270-463fb09cfef3; _meta_metarouter_timezone_offset=-60; _meta_google_cid=1025861455.1615064160; _meta_google_jid=1026882964; _meta_google_gjid=376323665; _gcl_au=1.1.121942600.1615064160; RES_TRACKINGID=722064508749718; akaau=1615234934id=a303a7d92e11eb2dd52633043c8e419b; HD_DC=origin; bm_sz=822291014C3AB3D20857A06D7C7A44E5YAAQJWReaCYvssl3AQAArg1XEwvO1SxshvxS+mQh/Cyv13czZcc+DxVi9Fudixnb+5I5RSBEBB+s5f8qFmudYG8qPGeSL1sx+FDZCp5UBFfEN46o61Z4tyny3kVhL5ng+kd4GbAQyO5o3tamg4j1vR/ueIJTXYpaXazHNMxamxJvIXhHZJAjfrh9bAjhnFWgoqYw; ak_bmsc=77E4B3C8D9708BCBD5BA97728DEECC18000000000000000000000000000000YAAQN2ReaNdKVhF4AQAAmAxaEwsVQ4uz8xVSKkvhzs8+Y2IIwuWAn/yYd+bmTElf7LOmIWoaoyiHdpPL4cAnxUBwXlh7qsuxLQkUHAGlzzr+Ek63GPhoZGsZfZSOYgwZPjmSMBh3xXOcGZltHQb+Xlo5+XEf25OWwhtOZEbKQxD9tXcQfKiU61phaG2JooxTSnRkKyBYarXrCpDJby9rPQw1sqD3SpNJ1w8g0+yBk2QWI2pTD+yLu44PRdy6LRGW19qPsjJYr3nIz0HNwL/8fxUQFGru26liHS/5mZ4x0IT5QG+a3dzRYqnNbvqXLWCyehHh34PTAGoZyGzKefKa/ARCkqJxHlL3aSA6URb6HTjuqCHK+si+jZbiQbi3IVym5SJBcqZYNNuyUilvk5mw6hAA/LdDLWgAmzV24YJtdhY1S//epV2UP4k+P3UZXtarWnZjqzhw/bNvR2JL8noxHChUT0Jx6D70XDi2vUsS8oTRJkso1L07QCPA; THD_NR=1; check=true; THD_SESSION=; THD_CACHE_NAV_SESSION=; THD_FORCE_LOC=1; AMCVS_F6421253512D2C100A490D45%40AdobeOrg=1; bm_sv=13D69BE6DBC1ADEEE068A7DC35F9340BTHX1lNCCmTSXqn2cHVcorF3W6CpvZfc7NQGK4urV5g9iOfR5xAaA64fT+85N2fpHSmBNf/SEecj4DQn5idEwyNoy0n5XTfkdTI52Rax6DlBWvfKeNfD3TqnKjAeX/0Ng7i2r2XJnoPd5XPhOYT2NwBiQivHkYI3qYsEEfCCa+IQ=; THD_INTERNAL=0; 
THD_LOCALIZER=%7B%22WORKFLOW%22%3A%22GEO_LOCATION%22%2C%22THD_FORCE_LOC%22%3A%221%22%2C%22THD_INTERNAL%22%3A%220%22%2C%22THD_STRFINDERZIP%22%3A%2204401%22%2C%22THD_LOCSTORE%22%3A%222414%2BBangor%20-%20Bangor%2C%20ME%2B%22%2C%22THD_STORE_HOURS%22%3A%221%3B8%3A00-20%3A00%3B2%3B6%3A00-21%3A00%3B3%3B6%3A00-21%3A00%3B4%3B6%3A00-21%3A00%3B5%3B6%3A00-21%3A00%3B6%3B6%3A00-21%3A00%3B7%3B6%3A00-21%3A00%22%2C%22THD_STORE_HOURS_EXPIRY%22%3A1615235778%7D; bm_mi=933DCDEEC5AAA57B5C94CFAFB2DF9576dcAO0zuOzG6V6eNkfgDKCIPd7M9UjEJTLVxmTYUni3j/s0g7i/kOSpgPjB6H4nIDNsrjFU8uf5PUJi2j+wttnM+ufk0diXMmLnMTf/9d6ohbxq69B9Tb4LjF9CSHCwsivt76KpeITbHU9G0QX2kA2lw+W4Yp0/ZBWmVz0iWlvtZoFlK//NPmZAzRDQTAI24zQ8tFWp90GMJdhudeKvFF5Lbq1ErQ0mbnNW5bGLntvHcCrruWWHsIaLQpcdx6vwNwFcrLmYgLkZnVW9fXCVrOunCoKIXSblpGJmhXlZd/T9Q=; thda.s=1cd0ae03-b34d-bb8d-2753-f7c0e552cdef; s_sess=%20s_pv_pName%3Dtools%253Ehand%2520tools%3B%20s_pv_pType%3Dcategory%3B%20s_pv_cmpgn%3D%3B%20s_pv_pVer%3Dcategory%253Al2%253Aversion%253Agen2%3B%20stsh%3D%3B%20s_cc%3Dtrue%3B; _meta_google_gid=1933054516.1615232188; thda.m=90275896034488086671674408193850329779; s_sq=%5B%5BB%5D%5D; ResonanceSegment=; IN_STORE_API_SESSION=TRUE' -H 'TE: Trailers' --data-raw '{"operationName":"searchModel","variables":{"skipInstallServices":false,"skipSpecificationGroup":false,"storefilter":"ALL","channel":"DESKTOP","additionalSearchParams":{"sponsored":true,"mcvisId":"90275896034488086671674408193850329779","deliveryZip":"04401"},"navParam":"5yc1vZc1zi","orderBy":{"field":"TOP_SELLERS","order":"ASC"},"pageSize":24,"storeId":"2414"},"query":"query searchModel($startIndex: Int, $pageSize: Int, $orderBy: ProductSort, $filter: ProductFilter, $storeId: String, $zipCode: String, $skipInstallServices: Boolean = true, $skipSpecificationGroup: Boolean = false, $keyword: String, $navParam: String, $storefilter: StoreFilter = ALL, $itemIds: [String], $channel: Channel = DESKTOP, $additionalSearchParams: AdditionalParams) {\n searchModel(keyword: $keyword, navParam: 
$navParam, storefilter: $storefilter, storeId: $storeId, itemIds: $itemIds, channel: $channel, additionalSearchParams: $additionalSearchParams) {\n metadata {\n hasPLPBanner\n categoryID\n analytics {\n semanticTokens\n dynamicLCA\n __typename\n }\n canonicalUrl\n searchRedirect\n clearAllRefinementsURL\n contentType\n cpoData {\n cpoCount\n cpoOnly\n totalCount\n __typename\n }\n isStoreDisplay\n productCount {\n inStore\n __typename\n }\n stores {\n storeId\n storeName\n address {\n postalCode\n __typename\n }\n nearByStores {\n storeId\n storeName\n distance\n address {\n postalCode\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n products(startIndex: $startIndex, pageSize: $pageSize, orderBy: $orderBy, filter: $filter) {\n identifiers {\n storeSkuNumber\n canonicalUrl\n brandName\n itemId\n productLabel\n modelNumber\n productType\n parentId\n isSuperSku\n __typename\n }\n itemId\n dataSources\n media {\n images {\n url\n type\n subType\n sizes\n __typename\n }\n __typename\n }\n pricing(storeId: $storeId) {\n value\n alternatePriceDisplay\n alternate {\n bulk {\n pricePerUnit\n thresholdQuantity\n value\n __typename\n }\n unit {\n caseUnitOfMeasure\n unitsOriginalPrice\n unitsPerCase\n value\n __typename\n }\n __typename\n }\n original\n mapAboveOriginalPrice\n message\n preferredPriceFlag\n promotion {\n type\n description {\n shortDesc\n longDesc\n __typename\n }\n dollarOff\n percentageOff\n savingsCenter\n savingsCenterPromos\n specialBuySavings\n specialBuyDollarOff\n specialBuyPercentageOff\n dates {\n start\n end\n __typename\n }\n __typename\n }\n specialBuy\n unitOfMeasure\n __typename\n }\n reviews {\n ratingsReviews {\n averageRating\n totalReviews\n __typename\n }\n __typename\n }\n availabilityType {\n discontinued\n type\n __typename\n }\n badges(storeId: $storeId) {\n name\n __typename\n }\n details {\n collection {\n collectionId\n name\n url\n __typename\n }\n __typename\n }\n favoriteDetail {\n count\n __typename\n }\n 
fulfillment(storeId: $storeId, zipCode: $zipCode) {\n backordered\n backorderedShipDate\n seasonStatusEligible\n fulfillmentOptions {\n type\n fulfillable\n services {\n type\n hasFreeShipping\n freeDeliveryThreshold\n locations {\n curbsidePickupFlag\n isBuyInStoreCheckNearBy\n distance\n inventory {\n isOutOfStock\n isInStock\n isLimitedQuantity\n isUnavailable\n quantity\n maxAllowedBopisQty\n minAllowedBopisQty\n __typename\n }\n isAnchor\n storeName\n type\n __typename\n }\n __typename\n }\n __typename\n }\n __typename\n }\n info {\n isBuryProduct\n isSponsored\n isGenericProduct\n isLiveGoodsProduct\n sponsoredBeacon {\n onClickBeacon\n onViewBeacon\n __typename\n }\n sponsoredMetadata {\n campaignId\n placementId\n slotId\n __typename\n }\n globalCustomConfigurator {\n customExperience\n __typename\n }\n returnable\n hidePrice\n productSubType {\n name\n link\n __typename\n }\n categoryHierarchy\n productDepartmentId\n productDepartment\n samplesAvailable\n augmentedReality\n ecoRebate\n quantityLimit\n sskMin\n sskMax\n unitOfMeasureCoverage\n wasMaxPriceRange\n wasMinPriceRange\n swatches {\n isSelected\n itemId\n label\n swatchImgUrl\n url\n value\n __typename\n }\n totalNumberOfOptions\n __typename\n }\n installServices @skip(if: $skipInstallServices) {\n scheduleAMeasure\n __typename\n }\n keyProductFeatures {\n keyProductFeaturesItems {\n features {\n name\n refinementId\n refinementUrl\n value\n __typename\n }\n __typename\n }\n __typename\n }\n specificationGroup @skip(if: $skipSpecificationGroup) {\n specifications {\n specName\n specValue\n __typename\n }\n specTitle\n __typename\n }\n sizeAndFitDetail {\n attributeGroups {\n attributes {\n attributeName\n dimensions\n __typename\n }\n dimensionLabel\n productType\n __typename\n }\n __typename\n }\n __typename\n }\n id\n searchReport {\n totalProducts\n didYouMean\n correctedKeyword\n keyword\n pageSize\n searchUrl\n sortBy\n sortOrder\n startIndex\n __typename\n }\n relatedResults {\n 
universalSearch {\n title\n __typename\n }\n relatedServices {\n label\n __typename\n }\n visualNavs {\n label\n imageId\n webUrl\n categoryId\n imageURL\n __typename\n }\n visualNavContainsEvents\n relatedKeywords {\n keyword\n __typename\n }\n __typename\n }\n taxonomy {\n brandLinkUrl\n breadCrumbs {\n browseUrl\n creativeIconUrl\n deselectUrl\n dimensionId\n dimensionName\n label\n refinementKey\n url\n __typename\n }\n __typename\n }\n templates\n partialTemplates\n dimensions {\n label\n refinements {\n refinementKey\n label\n recordCount\n selected\n imgUrl\n url\n nestedRefinements {\n label\n url\n recordCount\n refinementKey\n __typename\n }\n __typename\n }\n collapse\n dimensionId\n isVisualNav\n nestedRefinementsLimit\n visualNavSequence\n __typename\n }\n orangeGraph {\n universalSearchArray {\n pods {\n title\n description\n imageUrl\n link\n recordType\n __typename\n }\n info {\n title\n __typename\n }\n __typename\n }\n productTypes\n intents\n orderNumber\n __typename\n }\n appliedDimensions {\n label\n refinements {\n label\n refinementKey\n url\n __typename\n }\n __typename\n }\n __typename\n }\n}\n"}'
Lovely. Now try reproducing that with an HTTP client library such as requests, or as a request in Scrapy. Thanks, but I'd rather have plain REST with human-friendly parameters. Those seem to be getting rarer lately, though; instead we get clunky APIs like this one, or API responses to a simple GET stuffed with a dense, minified JSON blob some 20,000 characters long. There's no fun in trying to reproduce any of that.
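For the record, here is roughly what a requests version of the call above might look like. It is a minimal sketch under several assumptions: that the endpoint still accepts a plain POST with the usual `operationName` / `variables` / `query` JSON body seen in the curl command, and that it tolerates a trimmed query selecting only a few of the original fields (the server may well accept only the exact queries its own frontend sends). The helper names `build_payload`, `build_headers` and `fetch_products` are hypothetical. The cookies are deliberately left out; judging by their names (`_abck`, `bm_sz`, `ak_bmsc`) they presumably belong to Akamai's bot protection and are generated in the browser, which is exactly what makes replaying such a request painful.

```python
from typing import Any

# Endpoint taken from the curl command above; whether it still answers
# scripted clients this way is an assumption.
GRAPHQL_URL = "https://www.homedepot.com/product-information/model"

# Hypothetical, heavily trimmed version of the captured searchModel query.
# The real frontend selects far more fields; this asks for only a few.
SEARCH_QUERY = """
query searchModel($navParam: String, $storeId: String, $startIndex: Int,
                  $pageSize: Int, $channel: Channel = DESKTOP) {
  searchModel(navParam: $navParam, storeId: $storeId, channel: $channel) {
    products(startIndex: $startIndex, pageSize: $pageSize) {
      itemId
      identifiers { brandName productLabel canonicalUrl }
      pricing(storeId: $storeId) { value original }
    }
  }
}
"""


def build_payload(nav_param: str, store_id: str,
                  page_size: int = 24, start_index: int = 0) -> dict[str, Any]:
    """Build the JSON body in the usual operationName/variables/query shape."""
    return {
        "operationName": "searchModel",
        "variables": {
            "navParam": nav_param,
            "storeId": store_id,
            "startIndex": start_index,
            "pageSize": page_size,
            "channel": "DESKTOP",
        },
        "query": SEARCH_QUERY,
    }


def build_headers() -> dict[str, str]:
    """Non-cookie headers copied from the captured request."""
    return {
        "User-Agent": "Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:86.0) "
                      "Gecko/20100101 Firefox/86.0",
        "Accept": "*/*",
        "Content-Type": "application/json",
        "X-Experience-Name": "general-merchandise",
        "apollographql-client-name": "general-merchandise",
        "apollographql-client-version": "0.0.0",
        "Origin": "https://www.homedepot.com",
    }


def fetch_products(nav_param: str, store_id: str) -> list[dict[str, Any]]:
    """POST the query and return the product list (likely blocked without cookies)."""
    import requests  # third-party; imported here so the builders work without it

    resp = requests.post(GRAPHQL_URL, json=build_payload(nav_param, store_id),
                         headers=build_headers(), timeout=30)
    resp.raise_for_status()
    body = resp.json()
    # GraphQL servers often report errors with HTTP 200 and an "errors" key.
    if body.get("errors"):
        raise RuntimeError(body["errors"])
    return body["data"]["searchModel"]["products"]
```

Something like `fetch_products("5yc1vZc1zi", "2414")` would reuse the `navParam` and `storeId` values from the captured request; without the browser's cookies, an HTTP error is quite possibly all you get back.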