BeautifulSoup - How to add another condition?


Hi, I have a problem with fetching data. I am working on an ML project and have only just come across web scraping. I need to download apartment rental listings from otodom.pl. I found code that fetches this data, but I would like to extend it with three additional fields: finish standard (stan wykończenia), market (rynek), and floor (piętro). Unfortunately, I am having trouble adding this condition.
The source I am using: source.
The code looks like this:
init


from main import OtodomParser

if __name__ == '__main__':
    url = 'https://www.otodom.pl/sprzedaz/mieszkanie/krakow/' \
          '?search%5Bregion_id%5D=6&search%5Bsubregion_id%5D=410&search%5Bcity_id%5D=38&search%5Bdist%5D=25'

    parser = OtodomParser(url, 35)
    parser.parse()

main


import requests
from bs4 import BeautifulSoup
import io


class OtodomParser:
    def __init__(self, address_string, last_page_number=1, file_name="offers.csv"):
        # isinstance also accepts str subclasses; an empty string is rejected as well.
        if not isinstance(address_string, str) or not address_string:
            raise ValueError("Address string must be a non-empty str.")

        self.actual_page = 1
        self.last_page_number = last_page_number
        self.address_string = '{0}&page={1}'.format(address_string, "{}")
        self.file_name = file_name

    @staticmethod
    def remove_blank_strings(l):
        return [e for e in l if e and str(e).strip()]

    @staticmethod
    def remove_unnecessary_elements(_offer):
        result = []
        for el in _offer:
            new_el = el.strip()

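            # Test the stripped text, otherwise leading whitespace hides the prefix
            # and the stripped value would be overwritten by the unstripped one.
            if new_el.startswith('Mieszkanie na sprzedaż: Kraków, '):
                new_el = new_el.replace('Mieszkanie na sprzedaż: Kraków, ', '')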

            result.append(new_el)
        return result

    @staticmethod
    def prepare_list_from_offer(_offer):
        return list(_offer.children)[3].text.split('\n')

    def parse(self):
        result_list = ["Opis;Dzielnica;Ilość pokoi;Cena;Powierzchnia;Cena za m2;Stan wykończenia;Rynek;Piętro;Link do ogłoszenia"]

        while self.actual_page <= self.last_page_number:
            print("Parsing page {}".format(self.actual_page))
            # Fetch inside the loop so that no request is made past the last page
            # and every response gets the same encoding.
            r = requests.get(self.address_string.format(self.actual_page))
            r.encoding = "utf-8"
            # r.text honours r.encoding; r.content would ignore it.
            soup = BeautifulSoup(r.text, 'html.parser')
            offers = soup.find_all('article')

            for offer in offers:
                offer_as_list = self.prepare_list_from_offer(offer)
                offer_as_list = self.remove_blank_strings(offer_as_list)
                offer_as_list = self.remove_unnecessary_elements(offer_as_list)
                offer_as_list.append(offer.find('a').attrs['href'])
                result_list.append(';'.join(offer_as_list[1:]))

            self.actual_page += 1


        with io.open(self.file_name, 'w', encoding='utf-8') as file:
            for el in result_list:
                file.write("{}\n".format(el))

ledi12

2020-06-19 13:46


I used to write a lot of scrapers for portals like this. I would do it this way:


class Data_scraper():
    def __init__(self, source, soup):
        self.source = source
        self.soup = soup

    def get_location(self):
        # Try the primary layout first; fall back to the alternative selector.
        try:
            location = self.source.xpath("*//div[@class='offer-user__address']//address//p/text()")
            if len(location[0]) == 0:
                raise ValueError("empty location")
        except Exception:
            try:
                location = self.source.xpath("*//a[@class='css-12hd9gg']/text()")
            except Exception:
                location = ""
        return location

    def get_space(self):
        # The area is taken as the first space-separated token of the matched text.
        try:
            space = self.source.xpath("*//ul[@class='offer-details']//li//span[@class='offer-details__param']//strong/text()")
            if len(space) > 0:
                space = str(space[0]).strip().split(" ")[0]
            else:
                raise ValueError("no area in primary layout")
        except Exception:
            try:
                space = self.source.xpath("//section[@class='section-overview']//div[@class='css-1ci0qpi']//li[contains(text(), 'Powierzchnia')]//strong/text()")
                space = str(space[0]).strip().split(" ")[0]
            except Exception:
                space = ""
        return space

self.soup - a BeautifulSoup object
self.source - an lxml HTML object (from lxml import html)
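
A minimal sketch of how those two objects could be built from the same page text. The markup in PAGE and the helper name build_parsers are made up for illustration; in practice the text would come from the downloaded offer page.

```python
from bs4 import BeautifulSoup
from lxml import html

# Hypothetical markup standing in for a downloaded offer page.
PAGE = """
<html><body>
  <h1 class="offer-title">Flat for rent</h1>
  <ul class="offer-details"><li>Floor: <strong>3</strong></li></ul>
</body></html>
"""


def build_parsers(page_text):
    """Return (source, soup) built from the same HTML text."""
    source = html.fromstring(page_text)             # lxml element, supports .xpath()
    soup = BeautifulSoup(page_text, "html.parser")  # bs4 object, supports .find()
    return source, soup


source, soup = build_parsers(PAGE)
floor = source.xpath("//ul[@class='offer-details']//strong/text()")
title = soup.find("h1", class_="offer-title").get_text(strip=True)
print(floor, title)
```

Both objects can then be passed to Data_scraper(source, soup), so each method is free to use whichever API is more convenient for its field.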

Each method is responsible for one piece of data. Splitting the logic into methods makes debugging much easier, especially since portals like this tend to change their markup often. You can make these methods private and then call them from another method that puts the results together.
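
A rough sketch of that idea applied to the three fields from the question (stan wykończenia, rynek, piętro). The SAMPLE markup, the class OfferDetails and its method names are assumptions for illustration only; the real otodom.pl structure and class names are probably different and need to be checked in the browser.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the real page structure may differ.
SAMPLE = """
<ul class="offer-details">
  <li>Stan wykończenia: <strong>do zamieszkania</strong></li>
  <li>Rynek: <strong>wtórny</strong></li>
  <li>Piętro: <strong>3</strong></li>
</ul>
"""


class OfferDetails:
    def __init__(self, soup):
        self.soup = soup

    def _get_param(self, label):
        """Return the <strong> text of the first <li> starting with label, or ''."""
        for li in self.soup.find_all("li"):
            if li.get_text(strip=True).startswith(label):
                strong = li.find("strong")
                return strong.get_text(strip=True) if strong else ""
        return ""

    # One private method per field, so a markup change touches one place only.
    def _get_finish_state(self):
        return self._get_param("Stan wykończenia")

    def _get_market(self):
        return self._get_param("Rynek")

    def _get_floor(self):
        return self._get_param("Piętro")

    def as_row(self):
        """Glue the fields together in the CSV column order."""
        return ";".join([
            self._get_finish_state(),
            self._get_market(),
            self._get_floor(),
        ])


row = OfferDetails(BeautifulSoup(SAMPLE, "html.parser")).as_row()
print(row)
```

A missing field comes back as an empty string, so the number of columns in each CSV row stays the same even when an offer does not list a value.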
