OkBublewrap

Repeated Payments 본문

개발/SQL

Repeated Payments

옥뽁뽁 2025. 3. 16. 18:39

Repeated Payments

문제

1. transactions 테이블을 사용하여

2. 같은 가맹점에서 동일한 신용카드로 동일한 금액이 10분 이내에 결제 된 경우

3. 반복 결제된 횟수를 계산

 

풀이

select count(distinct t1.transaction_id) as payment_count
from transactions t1
  join transactions t2 
  on t1.merchant_id = t2.merchant_id
  and t1.credit_card_id = t2.credit_card_id
  and t1.amount = t2.amount
where t1.transaction_timestamp + interval '10 minutes' >=  t2.transaction_timestamp
  and t1.transaction_id < t2.transaction_id

16:37

1️⃣ transactions Table ; Self Join - 1 

transaction_id merchant_id credit_card_id amount transaction_timestamp
1 101 1 100 09/25/2022 12:00:00
2 101 1 100 09/25/2022 12:08:00
3 101 1 100 09/25/2022 12:28:00
4 102 2 300 09/25/2022 12:00:00
6 102 2 400 09/25/2022 14:00:00

 

2️⃣ Self Join - 2 

transaction_id merchant_id credit_card_id amount transaction_timestamp transaction_id transaction_timestamp
1 101 1 100 09/25/2022 12:00:00 1 09/25/2022 12:00:00
1 101 1 100 09/25/2022 12:00:00 2 09/25/2022 12:08:00
1 101 1 100 09/25/2022 12:00:00 3 09/25/2022 12:28:00
2 101 1 100 09/25/2022 12:08:00 1 09/25/2022 12:00:00
2 101 1 100 09/25/2022 12:08:00 2 09/25/2022 12:08:00
2 101 1 100 09/25/2022 12:08:00 3 09/25/2022 12:28:00
3 101 1 100 09/25/2022 12:28:00 1 09/25/2022 12:00:00
3 101 1 100 09/25/2022 12:28:00 2 09/25/2022 12:08:00
3 101 1 100 09/25/2022 12:28:00 3 09/25/2022 12:28:00
4 102 2 300 09/25/2022 12:00:00 4 09/25/2022 12:00:00
4 102 2 300 09/25/2022 12:00:00 6 09/25/2022 14:00:00
6 102 2 400 09/25/2022 14:00:00 4 09/25/2022 12:00:00
6 102 2 400 09/25/2022 14:00:00 6 09/25/2022 14:00:00

 

3️⃣ WHERE - 1

transaction_id merchant_id credit_card_id amount transaction_timestamp transaction_id transaction_timestamp
1 101 1 100 09/25/2022 12:00:00 1 09/25/2022 12:00:00
1 101 1 100 09/25/2022 12:00:00 2 09/25/2022 12:08:00
1 101 1 100 09/25/2022 12:00:00 3 09/25/2022 12:28:00
2 101 1 100 09/25/2022 12:08:00 1 09/25/2022 12:00:00
2 101 1 100 09/25/2022 12:08:00 2 09/25/2022 12:08:00
2 101 1 100 09/25/2022 12:08:00 3 09/25/2022 12:28:00
3 101 1 100 09/25/2022 12:28:00 1 09/25/2022 12:00:00
3 101 1 100 09/25/2022 12:28:00 2 09/25/2022 12:08:00
3 101 1 100 09/25/2022 12:28:00 3 09/25/2022 12:28:00
4 102 2 300 09/25/2022 12:00:00 4 09/25/2022 12:00:00
4 102 2 300 09/25/2022 12:00:00 6 09/25/2022 14:00:00
6 102 2 400 09/25/2022 14:00:00 4 09/25/2022 12:00:00
6 102 2 400 09/25/2022 14:00:00 6 09/25/2022 14:00:00

 

4️⃣ WHERE - 2 

transaction_id merchant_id credit_card_id amount transaction_timestamp transaction_id transaction_timestamp
1 101 1 100 09/25/2022 12:00:00 1 09/25/2022 12:00:00
1 101 1 100 09/25/2022 12:00:00 2 09/25/2022 12:08:00
1 101 1 100 09/25/2022 12:00:00 3 09/25/2022 12:28:00
2 101 1 100 09/25/2022 12:08:00 1 09/25/2022 12:00:00
2 101 1 100 09/25/2022 12:08:00 2 09/25/2022 12:08:00
2 101 1 100 09/25/2022 12:08:00 3 09/25/2022 12:28:00
3 101 1 100 09/25/2022 12:28:00 1 09/25/2022 12:00:00
3 101 1 100 09/25/2022 12:28:00 2 09/25/2022 12:08:00
3 101 1 100 09/25/2022 12:28:00 3 09/25/2022 12:28:00
4 102 2 300 09/25/2022 12:00:00 4 09/25/2022 12:00:00
4 102 2 300 09/25/2022 12:00:00 6 09/25/2022 14:00:00
6 102 2 400 09/25/2022 14:00:00 4 09/25/2022 12:00:00
6 102 2 400 09/25/2022 14:00:00 6 09/25/2022 14:00:00

 

5️⃣ Select distinct transaction_id

payment_count
1

 

 

최적화, 코드 수정

1. 10분 이내 조건이 올바르게 적용되지 않는다고 한다.

2. 거래 순서 조건이 적절한지 확인 필요

SELECT COUNT(DISTINCT t2.transaction_id) AS payment_count
FROM transactions t1
JOIN transactions t2 
    ON t1.merchant_id = t2.merchant_id
    AND t1.credit_card_id = t2.credit_card_id
    AND t1.amount = t2.amount
    AND t1.transaction_timestamp <= t2.transaction_timestamp
    AND t2.transaction_timestamp <= t1.transaction_timestamp + INTERVAL '10 minutes'
WHERE t1.transaction_id < t2.transaction_id;

1️⃣ t1.transaction_timestamp <= t2.transaction_timestamp AND t2.transaction_timestamp <= t1.transaction_timestamp + INTERVAL '10 minutes'

  • 중복 결제인지 정확하게 판별 가능.
  • 기존 조건(t1.transaction_timestamp + interval '10 minutes' >= t2.transaction_timestamp)은 t2.transaction_timestamp가 t1.transaction_timestamp보다 작을 수도 있는 경우를 포함하여 오류 발생 가능성이 있음.
  • 올바른 조건을 사용하여 t1보다 t2가 10분 이내에 발생한 경우만 선택.

2️⃣ COUNT(DISTINCT t2.transaction_id)

  • t1.transaction_id는 원래 첫 결제이므로 제외되어야 함.
  • 반복된 결제(t2.transaction_id)만 세야 함 → COUNT(DISTINCT t2.transaction_id).
  • 기존 COUNT(DISTINCT t1.transaction_id)는 첫 결제까지 포함할 가능성이 있음

 

다른 풀이

1️⃣ 동일한 그룹에 속하는 거래(merchant, credit_card, amount)를 찾아 정렬

2️⃣ 연속된 거래 간의 시간 차이를 계산

3️⃣ 10분 이내에 발생한 중복 거래를 필터링하여 개수 세기

 

1️⃣ 동일한 그룹의 거래 찾기

SELECT 
  transaction_id,
  merchant_id, 
  credit_card_id, 
  amount, 
  transaction_timestamp,
  LAG(transaction_timestamp) OVER (
    PARTITION BY merchant_id, credit_card_id, amount 
    ORDER BY transaction_timestamp
  ) AS previous_transaction 
FROM transactions;

 

Lag를 썻구나, 처음에 접근을 할 때 Lag을 쓰면 10분 내에 여러번 결제가 나올 경우 문제가 생겨서 Self join을 진행을 했는데

중복 거래는 1번 일어난다고 해서 이렇게 구성을 한 것 같다.

 

2️⃣ 시간 차이 계산

SELECT 
  merchant_id, 
  credit_card_id, 
  amount,
  transaction_timestamp,
  EXTRACT(EPOCH FROM transaction_timestamp - 
    LAG(transaction_timestamp) OVER(
      PARTITION BY merchant_id, credit_card_id, amount 
      ORDER BY transaction_timestamp)
  )/60 AS minute_difference 
FROM transactions;

현재 결제 시간이랑 이전 결제 시간이랑 차이를 구해 EPOCH(초단위)로 변환

그후 60을 나눠서 분 단위로 만들었다.

이러면 자연스럽게 첫번째 구매 기록은 NULL로 제외로 시킨다.

 

3️⃣ 10분 이내 중복 결제 필터링 및 개수 세기

WITH payments AS (
  SELECT 
    merchant_id, 
    EXTRACT(EPOCH FROM transaction_timestamp - 
      LAG(transaction_timestamp) OVER(
        PARTITION BY merchant_id, credit_card_id, amount 
        ORDER BY transaction_timestamp)
    )/60 AS minute_difference 
  FROM transactions) 

SELECT COUNT(merchant_id) AS payment_count
FROM payments 
WHERE minute_difference <= 10;

 

계산된 minute_difference가 10분 이내이면 가져오고 merchant_id를 구한다.

'개발 > SQL' 카테고리의 다른 글

Maximize Prime Item Inventory  (0) 2025.03.18
Server Utilization Time  (0) 2025.03.17
날짜 함수, 쿼리 최적화  (0) 2025.03.16
코호트 분석, 세그먼트 분석  (0) 2025.03.16
Histogram of Users and Purchases  (0) 2025.03.16