Hi,
I've tried to analyze the query plan and the indexes we already have in place, but I don't really know how to approach it. I mostly work with MSSQL, and this time PostgreSQL came my way.
We have tables in PostgreSQL with millions of rows and fairly fast daily growth (tens of thousands of devices generating traffic/operations).
I'm looking for a way to speed up this query:
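For what it's worth, this is roughly how I listed the existing indexes so far (a minimal sketch; I'm only assuming the relevant tables are cuts, clients and devices from the query below):

-- List the indexes defined on the tables used in the report query,
-- together with their full definitions (columns, index type).
SELECT schemaname, tablename, indexname, indexdef
FROM pg_indexes
WHERE schemaname = 'public'
  AND tablename IN ('cuts', 'clients', 'devices')
ORDER BY tablename, indexname;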
SELECT
date(cuts.created) AS date,
client_name AS client_name,
brand_name AS brand_name,
model_name AS model_name,
pos_name AS pos_name,
material_name AS material_name,
foil_size_name AS foil_size_name,
outline_type_name AS outline_type_name,
cuts.plotter_name AS plotter_name,
devices.cu_id AS cu_id,
client_parent.name AS client_parent_name,
SUM(quantity) AS total_cuts,
SUM(CASE WHEN client_status IN (0, 2, 4) OR client_status IS NULL THEN quantity ELSE 0 END) AS success_cuts,
SUM(CASE WHEN client_status = 1 THEN quantity ELSE 0 END) AS fail_cuts,
SUM(CASE WHEN client_status = 3 THEN quantity ELSE 0 END) AS fail_installs
from "pg68126_cutterdev".public."cuts"
left join public."clients" on "cuts"."client_id" = "clients"."id"
left join public."clients" as "client_parent" on "clients"."parent_id" = "client_parent"."id"
left join public."devices" on "cuts"."device_id" = "devices"."id"
where "cuts"."created" between '2022-01-01 00:00:00' and '2022-05-01 23:59:59'
and "quantity" = 1
group by date, client_name, brand_name, model_name, pos_name, material_name, foil_size_name, outline_type_name,
cuts.plotter_name, cu_id, client_parent_name
order by MIN(cuts.created);
The query plan for the above:
Sort (cost=490087.84..492067.63 rows=791918 width=293) (actual time=1614.411..1699.115 rows=661124 loops=1)
Sort Key: (min(cuts.created))
Sort Method: external merge Disk: 126696kB
-> Finalize GroupAggregate (cost=139788.09..273292.21 rows=791918 width=293) (actual time=654.045..1364.967 rows=661124 loops=1)
" Group Key: (date(cuts.created)), cuts.client_name, cuts.brand_name, cuts.model_name, cuts.pos_name, cuts.material_name, cuts.foil_size_name, cuts.outline_type_name, cuts.plotter_name, devices.cu_id, client_parent.name"
-> Gather Merge (cost=139788.09..236995.95 rows=659932 width=293) (actual time=654.038..1063.162 rows=693280 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Partial GroupAggregate (cost=138788.07..159823.40 rows=329966 width=293) (actual time=631.108..784.251 rows=231093 loops=3)
" Group Key: (date(cuts.created)), cuts.client_name, cuts.brand_name, cuts.model_name, cuts.pos_name, cuts.material_name, cuts.foil_size_name, cuts.outline_type_name, cuts.plotter_name, devices.cu_id, client_parent.name"
-> Sort (cost=138788.07..139612.98 rows=329966 width=267) (actual time=631.094..657.615 rows=262495 loops=3)
" Sort Key: (date(cuts.created)), cuts.client_name, cuts.brand_name, cuts.model_name, cuts.pos_name, cuts.material_name, cuts.foil_size_name, cuts.outline_type_name, cuts.plotter_name, devices.cu_id, client_parent.name"
Sort Method: external merge Disk: 46472kB
Worker 0: Sort Method: external merge Disk: 42856kB
Worker 1: Sort Method: external merge Disk: 43168kB
-> Hash Left Join (cost=1084.93..54889.95 rows=329966 width=267) (actual time=3.190..158.377 rows=262495 loops=3)
Hash Cond: (cuts.device_id = devices.id)
-> Hash Left Join (cost=611.94..52725.61 rows=329966 width=250) (actual time=1.392..122.963 rows=262495 loops=3)
Hash Cond: (clients.parent_id = client_parent.id)
-> Hash Left Join (cost=306.19..51552.58 rows=329966 width=230) (actual time=0.940..98.505 rows=262495 loops=3)
Hash Cond: (cuts.client_id = clients.id)
-> Parallel Index Scan using key_created on cuts (cost=0.43..50379.54 rows=329966 width=230) (actual time=0.020..56.162 rows=262495 loops=3)
Index Cond: ((created >= '2022-01-01 00:00:00'::timestamp without time zone) AND (created <= '2022-05-01 23:59:59'::timestamp without time zone))
Filter: (quantity = 1)
Rows Removed by Filter: 225
-> Hash (cost=264.78..264.78 rows=3278 width=8) (actual time=0.896..0.897 rows=3279 loops=3)
Buckets: 4096 Batches: 1 Memory Usage: 161kB
-> Seq Scan on clients (cost=0.00..264.78 rows=3278 width=8) (actual time=0.005..0.707 rows=3279 loops=3)
-> Hash (cost=264.78..264.78 rows=3278 width=28) (actual time=0.444..0.444 rows=3279 loops=3)
Buckets: 4096 Batches: 1 Memory Usage: 230kB
-> Seq Scan on clients client_parent (cost=0.00..264.78 rows=3278 width=28) (actual time=0.001..0.203 rows=3279 loops=3)
-> Hash (cost=306.33..306.33 rows=13333 width=21) (actual time=1.773..1.773 rows=13335 loops=3)
Buckets: 16384 Batches: 1 Memory Usage: 847kB
-> Seq Scan on devices (cost=0.00..306.33 rows=13333 width=21) (actual time=0.004..0.911 rows=13335 loops=3)
Planning Time: 0.281 ms
Execution Time: 1758.698 ms
The query is used to generate a report for a client, on demand in the browser. They can pick any date range. The data is presented as a chart on the page (PHP + React).
We don't fetch the whole result set at once, because RAM is finite and the number of clients who might run the report is large. Instead we run the full query in a loop, in one-day intervals, but even so it takes a long time, and rendering the report in the browser takes its share too (downloading and processing the JSON takes a while and the payload is heavy).
Could someone point me in the right direction on how to optimize this sensibly?
I was thinking about a Materialized View, but then we're limited to refreshing the view's data at night. The question is whether that would actually help? I'm not really convinced.
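For illustration, this is roughly the kind of materialized view I had in mind (a sketch only; the view name cuts_daily_summary and the per-day pre-aggregation are my assumptions, the columns and joins are taken from the query above):

-- Hypothetical per-day pre-aggregation of cuts, refreshed e.g. once a night.
CREATE MATERIALIZED VIEW cuts_daily_summary AS
SELECT
    date(cuts.created) AS date,
    client_name,
    brand_name,
    model_name,
    pos_name,
    material_name,
    foil_size_name,
    outline_type_name,
    cuts.plotter_name,
    devices.cu_id,
    client_parent.name AS client_parent_name,
    SUM(quantity) AS total_cuts,
    SUM(CASE WHEN client_status IN (0, 2, 4) OR client_status IS NULL THEN quantity ELSE 0 END) AS success_cuts,
    SUM(CASE WHEN client_status = 1 THEN quantity ELSE 0 END) AS fail_cuts,
    SUM(CASE WHEN client_status = 3 THEN quantity ELSE 0 END) AS fail_installs
FROM public.cuts
LEFT JOIN public.clients ON cuts.client_id = clients.id
LEFT JOIN public.clients AS client_parent ON clients.parent_id = client_parent.id
LEFT JOIN public.devices ON cuts.device_id = devices.id
WHERE quantity = 1
GROUP BY date, client_name, brand_name, model_name, pos_name, material_name,
         foil_size_name, outline_type_name, cuts.plotter_name, devices.cu_id, client_parent_name;

-- REFRESH ... CONCURRENTLY needs a unique index on plain columns; I'm assuming
-- this combination of columns is unique per day.
CREATE UNIQUE INDEX ON cuts_daily_summary
    (date, client_name, brand_name, model_name, pos_name, material_name,
     foil_size_name, outline_type_name, plotter_name, cu_id, client_parent_name);

-- Nightly (e.g. from cron), without blocking readers of the view:
REFRESH MATERIALIZED VIEW CONCURRENTLY cuts_daily_summary;

The report query would then read from cuts_daily_summary filtered by date instead of re-aggregating the raw cuts rows on every request, at the cost of the data being at most one refresh behind.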