2026 Hacktheon Sejong CTF

0. Challenge Overview

This challenge was a voice-based CTF task called Voice Over Challenge.

The following files were provided.

sample_001.wav
sample_002.wav
sample_003.wav
sample_004.wav
sample_005.wav

No source code was provided; only the address of the challenge page was given.

  • http://3.37.31.209:8000

At first I assumed one of the provided WAV files would be the correct answer audio, but submitting them to the live server showed that this was not the case.


1. Analyzing the Provided Files

After extracting the archive, I checked the WAV files and found that they all shared the same format.

mono
16kHz
16-bit PCM
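
The format check above can be reproduced with Python's standard `wave` module. This is only a minimal sketch: the helper name `describe_wav` is hypothetical, and the script assumes the extracted samples sit in the current directory.

```python
import wave
from pathlib import Path


def describe_wav(path):
    """Return the basic PCM parameters of a WAV file as a dict."""
    with wave.open(str(path), "rb") as wav_file:
        sample_rate = wav_file.getframerate()
        return {
            "channels": wav_file.getnchannels(),          # 1 means mono
            "sample_rate": sample_rate,                   # in Hz
            "bit_depth": wav_file.getsampwidth() * 8,     # bytes per sample -> bits
            "duration_s": wav_file.getnframes() / sample_rate,
        }


if __name__ == "__main__":
    # Assumption: the provided files are named sample_*.wav in the working directory.
    for wav_path in sorted(Path(".").glob("sample_*.wav")):
        print(wav_path.name, describe_wav(wav_path))
```

For the provided samples, each entry would be expected to report 1 channel, a 16000 Hz sample rate, and a bit depth of 16.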

Submitting each audio file to the server gave the same pattern: speaker_similarity came out high, while text_similarity came out low.
For example, when I submitted sample_001.wav, the server scored the speaker similarity at about 0.957.
However, the transcript was recognized as a sentence different from the target sentence, so the submission failed the text similarity condition.
This showed that the provided audio files were not answer audio but reference samples carrying the target speaker's voice.

In other words, the goal of the challenge was as follows.

  • Use the provided sample audio to have the target sentence read in the same speaker's voice, then submit the result

2. Checking the Server API

Inspecting the web page and its API showed two main endpoints.

GET  /api/challenge
POST /api/verify
  • /api/challenge returned a new value on every request.
{
  "token": "...",
  "target_sentence": "..."
}
  • /api/verify inspected the uploaded WAV file and returned the following values.
speaker_similarity
text_similarity
transcript
success
flag

In other words, the server did not simply look at the audio file alone; it asked two questions:

  1. Is the voice similar to the target speaker?
  2. Does the spoken sentence match target_sentence?

Both conditions were checked at the same time.
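
Since the server's code was not provided, the pass condition can only be sketched from the responses. The function name `passes_verification` is hypothetical, and both the 0.8 defaults and the use of `>=` are assumptions based on the thresholds reported later in this writeup, not on the server's actual implementation.

```python
def passes_verification(
    speaker_similarity: float,
    text_similarity: float,
    speaker_threshold: float = 0.8,  # assumed value, not confirmed by server code
    text_threshold: float = 0.8,     # assumed value, not confirmed by server code
) -> bool:
    """Return True only when both the voice check and the sentence check pass."""
    voice_ok = speaker_similarity >= speaker_threshold
    sentence_ok = text_similarity >= text_threshold
    return voice_ok and sentence_ok


if __name__ == "__main__":
    # A submission with a matching voice but a poorly matching sentence still fails.
    # The 0.1 text score is a made-up illustrative value.
    print(passes_verification(speaker_similarity=0.957, text_similarity=0.1))
```

Under this model a high score on one axis cannot compensate for a low score on the other, which is consistent with the failed submissions described in this writeup.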


3. Trying Plain TTS

First I had Windows TTS read the target sentence and submitted the result.
In this case text_similarity rose to some extent, but speaker_similarity came out low, at about 0.53, and the submission failed.

So plain TTS could not solve the challenge.

  • The text_similarity condition could be met
  • The speaker_similarity condition could not be met

What was needed, then, was not plain speech synthesis but zero-shot voice cloning that reproduces the voice timbre of the provided sample audio.


4. Voice Cloning Approach

Searching for public Hugging Face voice cloning demos turned up the following Space, which I was able to use.

  • Kikirilkov/Voice_Cloning

This Space worked in the following form.

predict(text, audio, language)

That is,

text      → the target_sentence requested by the server
audio     → one of the provided sample WAV files
language  → en

Supplying these inputs produced a WAV file in which the target sentence is read in a voice similar to the reference audio.
For the reference I used sample_005.wav, the longest and most stable of the samples.
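
Picking the longest sample can be automated with the standard `wave` module. The sketch below is illustrative only: the helper names `wav_duration_seconds` and `pick_longest_sample` are hypothetical, and it ranks files purely by duration, so it says nothing about how stable a recording sounds.

```python
import wave
from pathlib import Path


def wav_duration_seconds(path: Path) -> float:
    """Duration of a WAV file, computed as frame count divided by sample rate."""
    with wave.open(str(path), "rb") as wav_file:
        return wav_file.getnframes() / wav_file.getframerate()


def pick_longest_sample(directory: Path, pattern: str = "sample_*.wav") -> Path:
    """Return the WAV file with the longest duration among those matching pattern."""
    candidates = sorted(directory.glob(pattern))
    if not candidates:
        raise FileNotFoundError(f"No files matching {pattern} in {directory}")
    return max(candidates, key=wav_duration_seconds)


if __name__ == "__main__":
    # Assumption: the samples were extracted into the current directory.
    print(pick_longest_sample(Path(".")))
```

A longer reference generally gives a cloning model more material to work with, but that is a rule of thumb rather than something the challenge server guarantees.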


5. Solver Code

import argparse
from pathlib import Path

import requests

try:
    from gradio_client import Client, handle_file
except ImportError as exc:
    raise SystemExit(
        "Missing dependency: gradio_client\n"
        "Install it with: python -m pip install gradio_client"
    ) from exc


CHALLENGE_URL = "http://3.37.31.209:8000"
DEFAULT_SPACE = "Kikirilkov/Voice_Cloning"


def get_challenge(base_url: str) -> dict:
    """Fetch a fresh token and target sentence from the challenge server."""
    response = requests.get(f"{base_url}/api/challenge", timeout=30)
    response.raise_for_status()
    return response.json()


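def clone_voice(space_name: str, sample_path: Path, text: str, language: str) -> Path:
    """Ask the Hugging Face Space to read `text` in the voice of `sample_path`."""
    client = Client(space_name)
    # Argument order follows the Space's predict(text, audio, language) form.
    output_path = client.predict(
        text,
        handle_file(str(sample_path)),
        language,
        api_name="/predict",
    )
    return Path(output_path)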


def submit_audio(base_url: str, token: str, audio_path: Path) -> dict:
    """Upload the generated WAV together with the challenge token for verification."""
    with audio_path.open("rb") as audio_file:
        response = requests.post(
            f"{base_url}/api/verify",
            files={"audio": (audio_path.name, audio_file, "audio/wav")},
            data={"token": token},
            timeout=180,
        )
    response.raise_for_status()
    return response.json()


def main() -> None:
    parser = argparse.ArgumentParser(description="Solve the Voice Over Challenge")
    parser.add_argument(
        "--sample",
        default=str(Path(__file__).with_name("sample_005.wav")),
        help="Reference WAV file for the target speaker",
    )
    parser.add_argument(
        "--space",
        default=DEFAULT_SPACE,
        help="Public Hugging Face Space used for voice cloning",
    )
    parser.add_argument(
        "--language",
        default="en",
        help="Language parameter for the voice cloning API",
    )
    parser.add_argument(
        "--base-url",
        default=CHALLENGE_URL,
        help="Challenge base URL",
    )
    args = parser.parse_args()

    sample_path = Path(args.sample).resolve()
    if not sample_path.exists():
        raise SystemExit(f"Sample file not found: {sample_path}")

    print(f"[+] Reference sample : {sample_path}")
    print(f"[+] Voice clone space: {args.space}")

    challenge = get_challenge(args.base_url)
    print(f"[+] Token           : {challenge['token']}")
    print(f"[+] Target sentence : {challenge['target_sentence']}")

    cloned_audio = clone_voice(
        args.space,
        sample_path,
        challenge["target_sentence"],
        args.language,
    )
    print(f"[+] Cloned audio    : {cloned_audio}")

    result = submit_audio(args.base_url, challenge["token"], cloned_audio)
    print(f"[+] Speaker score   : {result['speaker_similarity']}")
    print(f"[+] Text score      : {result['text_similarity']}")
    print(f"[+] Transcript      : {result['transcript']}")

    if result.get("success") and result.get("flag"):
        print(f"[+] FLAG            : {result['flag']}")
    else:
        print("[-] Solve failed")
        print(result)


if __name__ == "__main__":
    main()
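
The solver above makes a single attempt. Because /api/challenge returns a new value on every request, any retry has to start again from a fresh token. The following is a minimal sketch of such a loop, not part of the original solver: `solve_with_retries`, `max_attempts`, and `delay_seconds` are hypothetical names, and the idea that a second cloning run may score differently is an assumption.

```python
import time
from typing import Callable, Optional


def solve_with_retries(
    attempt: Callable[[], dict],
    max_attempts: int = 3,
    delay_seconds: float = 2.0,
) -> Optional[dict]:
    """Call attempt() until it reports success with a flag, or give up."""
    for attempt_number in range(1, max_attempts + 1):
        result = attempt()
        if result.get("success") and result.get("flag"):
            return result
        print(f"[-] Attempt {attempt_number} failed: {result}")
        if attempt_number < max_attempts:
            # Short pause so the public Space and the server are not hammered.
            time.sleep(delay_seconds)
    return None
```

Here `attempt` would be a function that chains get_challenge, clone_voice, and submit_audio, so that every try requests its own token and target sentence.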

6. Execution Result

When the generated audio was submitted to /api/verify, the server recognized the target sentence exactly.

The success response was as follows.

{
  "text_similarity": 1.0,
  "text_threshold": 0.8,
  "speaker_similarity": 0.8727,
  "success": true,
  "flag": "hacktheon2026{b7d30e21e4106a6ca4d451a218f15a97}"
}

text_similarity was 1.0, a complete match, and speaker_similarity also exceeded the threshold of 0.8, so the submission succeeded.


7. Final Flag

hacktheon2026{b7d30e21e4106a6ca4d451a218f15a97}

8. Summary

The key to this challenge was not a typical web vulnerability or API bypass but an accurate understanding of the verification conditions the server enforces.
It is tempting at first to submit the provided WAV files as they are or to use plain TTS,
but in practice two conditions had to be satisfied at the same time.

  • Voice similarity
  • Sentence match

In the end, the provided samples were not the answer but a reference for the target speaker,
and the intent of the challenge was to use them to synthesize the target sentence in the same voice.
Using voice cloning satisfied both conditions and solved the challenge.