sql 문법 질문드립니다.
-
게시물 수정 , 삭제는 로그인 필요
+-----------------+--------------------+--------------------+-------------------+-------------------+-------------------+-------------------+------------+------------+----------+---------+-------------------+-----+----+---------+--------------------+-----------+----+----------+-------------------+-----------------+------------------+----------------+--------------+ |hvfhs_license_num|dispatching_base_num|originating_base_num| request_datetime| on_scene_datetime| pickup_datetime| dropoff_datetime|PULocationID|DOLocationID|trip_miles|trip_time|base_passenger_fare|tolls| bcf|sales_tax|congestion_surcharge|airport_fee|tips|driver_pay|shared_request_flag|shared_match_flag|access_a_ride_flag|wav_request_flag|wav_match_flag| +-----------------+--------------------+--------------------+-------------------+-------------------+-------------------+-------------------+------------+------------+----------+---------+-------------------+-----+----+---------+--------------------+-----------+----+----------+-------------------+-----------------+------------------+----------------+--------------+ | HV0005| B02510| null|2020-03-01 00:00:12| null|2020-03-01 00:03:40|2020-03-01 00:23:39| 81| 159| 8.655| 1199| 24.45| 0.0|0.54| 1.9| 0.0| null| 0.0| 19.65| N| N| N| N| N| | HV0005| B02510| null|2020-03-01 00:22:03| null|2020-03-01 00:28:05|2020-03-01 00:38:57| 168| 119| 3.523| 652| 11.88| 0.0|0.24| 0.85| 0.0| null| 0.0| 9.37| N| N| N| N| N| | HV0003| B02764| B02764|2020-02-29 23:57:45|2020-03-01 00:01:04|2020-03-01 00:03:07|2020-03-01 00:15:04| 137| 209| 4.07| 717| 14.57| 0.0|0.38| 1.38| 2.75| null| 0.0| 16.24| N| Y| | N| N| | HV0003| B02764| B02764|2020-03-01 00:04:06|2020-03-01 00:15:48|2020-03-01 00:18:42|2020-03-01 00:38:42| 209| 80| 4.73| 1200| 13.89| 0.0|0.35| 1.23| 0.75| null| 0.0| 21.76| Y| N| | N| N| | HV0003| B02764| B02764|2020-03-01 00:42:46|2020-03-01 00:43:18|2020-03-01 00:44:24|2020-03-01 00:58:44| 256| 226| 4.03| 860| 20.2| 0.0|0.51| 1.79| 0.0| null| 0.0| 19.64| N| N| | N| N| +-----------------+--------------------+--------------------+-------------------+-------------------+-------------------+-------------------+------------+------------+----------+---------+-------------------+-----+----+---------+--------------------+-----------+----+----------+-------------------+-----------------+------------------+----------------+--------------+
이런 데이터프레임이 있을 때
ss.sql("select pickup_datetime as pickup_date, count(*) as trip_count\
from (select split(pickup_datetime, ' ')[0] as pickup_datetime \
from mobility_dataset)\
group by pickup_datetime\
order by pickup_datetime asc").show()
이 쿼리를 짯고 결과가
+-----------+----------+ |pickup_date|trip_count| +-----------+----------+ | 2020-03-01| 784260| | 2020-03-02| 648990| | 2020-03-03| 697880| | 2020-03-04| 707879| | 2020-03-05| 731165| | 2020-03-06| 872012| | 2020-03-07| 886071| | 2020-03-08| 731222| | 2020-03-09| 628940| | 2020-03-10| 626474| | 2020-03-11| 628601| | 2020-03-12| 643257| | 2020-03-13| 660914| | 2020-03-14| 569397| | 2020-03-15| 448125| | 2020-03-16| 391518| | 2020-03-17| 312298| | 2020-03-18| 269233| | 2020-03-19| 252773| | 2020-03-20| 261900| +-----------+----------+
이렇게 제가 원하는 결과가 나왔는데,
사실 위와 같은 결과를 만드려고 제가 짠 쿼리는 맞는데,
count(*) 부분하고 group by 하는 부분에서 좀 햇갈립니다.
+------------+ |splited_date| +------------+ | 2020-03-01| | 2020-03-02| | 2020-03-03| | 2020-03-04| | 2020-03-05| | 2020-03-06| | 2020-03-07| | 2020-03-08| | 2020-03-09| | 2020-03-10| | 2020-03-11| | 2020-03-12| | 2020-03-13| | 2020-03-14| | 2020-03-15| | 2020-03-16| | 2020-03-17| | 2020-03-18| | 2020-03-19| | 2020-03-20| +------------+
날짜를 groupby 하면 이렇게 모아지게 되는데
count(*) as trip_count
이 쿼리가 날짜들을 groupby 해놓고 count하면 1이 나와야하지 않나? 하는 생각을 하는데,
메모리에 어떻게 저장되길래 연산이 가능한건가요??
from
groupby
having
...
이 순으로 진행되는건 알겠는데
어떻게 돌아가고 있는지를 알고싶습니다.
from
#sql 문법 #sql 문법 정리 pdf #sql 문법 검사 #sql 문법 순서 #sql 문법 종류 #sql 문법 총정리 #sql 문법 예제 #sql 문법 join