๐Ÿžํ”„๋กœ๊ทธ๋ž˜๋ฐ

[AI] A review of training on the daily Fear & Greed Index and post titles

TwoIceFish 2024. 8. 4. 22:14

 

Review

Subjectively, training on a GPU really does feel about 100x faster than on a CPU.

The data is loaded into memory while training runs, and GPU utilization spikes heavily from moment to moment.

 

I mapped the CNN Fear & Greed Index by date and used each day's value as the label (weight) for the post titles created during that period. The idea is that if a particular sentence shows up often in posts written during fear or greed markets, it can be turned into a "human indicator" that predicts hype or the next market phase.
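The labeling step above can be sketched as a date join. This is a minimal illustration, not the actual crawler schema: the column names (`date`, `title`, `fear_greed_index`) and the sample values are assumptions.

```python
# Hypothetical sketch: give every post title the Fear & Greed Index
# of the day it was posted, so the index value becomes the
# regression target for that title.
import pandas as pd

index_by_day = pd.DataFrame({
    'date': ['2024-08-01', '2024-08-02'],
    'fear_greed_index': [25, 70],
})
posts = pd.DataFrame({
    'date': ['2024-08-01', '2024-08-01', '2024-08-02'],
    'title': ['crash incoming', 'sell everything', 'to the moon'],
})

# Left-join posts onto the daily index: every title inherits
# that day's index value as its label.
labeled = posts.merge(index_by_day, on='date', how='left')
print(labeled['fear_greed_index'].tolist())  # [25, 25, 70]
```

A left join keeps every post even if a day's index is missing (those labels come back as NaN and can be dropped before training).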

 

 

Epoch, one full pass over the data, was set to 50, and accuracy improves steadily. The training set is 50,000 records in total, collected via the API that Naver uses for infinite scrolling.

 

A single call returns 50 records, and the pagination cycle appears to run up to 1000 pages, so roughly 50,000 records can be collected per cafe.
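The crawl volume above is simple arithmetic, spelled out here as a sanity check:

```python
# 50 posts per API call, pagination cycles through at most 1000 pages,
# so one cafe yields about 50,000 titles.
posts_per_page = 50
max_pages = 1000
total_posts = posts_per_page * max_pages
print(total_posts)  # 50000
```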

 

The live data can be seen on the site below.

 

https://www.mezoo.me/

 


 

 

Training source code

 

Fetch the data from the remote endpoint, continue training the existing model, and stop once the loss target has been hit three times.


import joblib
import pandas as pd
import requests
import torch
import torch.nn as nn
import torch.optim as optim
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.model_selection import train_test_split
from torch.utils.data import Dataset, DataLoader


# 1. Data preparation
def fetch_data(page):
    url = f"http://localhost:3000/api/v1/parse/{page}"
    response = requests.get(url)
    result = response.json()
    return pd.DataFrame(result['result'])


def load_data(pages):
    all_data = pd.DataFrame()
    for page in pages:
        df = fetch_data(page)
        all_data = pd.concat([all_data, df], ignore_index=True)
    return all_data


# Page range
pages = range(1, 996)  # pages 1 through 995
df = load_data(pages)

# 2. Preprocessing
vectorizer = CountVectorizer()
X = vectorizer.fit_transform(df['title']).toarray()
y = df['fear_greed_index'].values

# Save the CountVectorizer so the API server can reuse it
joblib.dump(vectorizer, 'vectorizer.pkl')

X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)


class FearGreedDataset(Dataset):
    def __init__(self, X, y):
        self.X = X
        self.y = y

    def __len__(self):
        return len(self.y)

    def __getitem__(self, idx):
        return torch.tensor(self.X[idx], dtype=torch.float32), torch.tensor(self.y[idx], dtype=torch.float32)


train_dataset = FearGreedDataset(X_train, y_train)
test_dataset = FearGreedDataset(X_test, y_test)
train_loader = DataLoader(train_dataset, batch_size=32, shuffle=True)
test_loader = DataLoader(test_dataset, batch_size=32, shuffle=False)


# 3. Model definition
class FearGreedModel(nn.Module):
    def __init__(self, input_dim):
        super(FearGreedModel, self).__init__()
        self.fc1 = nn.Linear(input_dim, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x


# Device selection
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


# ๋ชจ๋ธ ์ดˆ๊ธฐํ™”
def initialize_model(model_path=None):
    input_dim = X_train.shape[1]  # ํ˜„์žฌ ๋ฐ์ดํ„ฐ์˜ ์ž…๋ ฅ ์ฐจ์›
    model = FearGreedModel(input_dim).to(device)  # ๋ชจ๋ธ์„ ๋””๋ฐ”์ด์Šค๋กœ ์ด๋™
    if model_path:
        try:
            model.load_state_dict(torch.load(model_path, map_location=device))  # map_location์„ ์‚ฌ์šฉํ•˜์—ฌ ๋””๋ฐ”์ด์Šค ์„ค์ •
            print('Model loaded from', model_path)
        except RuntimeError as e:
            print('Error loading model:', e)
            # ๋ชจ๋ธ ๋กœ๋“œ ์‹คํŒจ ์‹œ ์ดˆ๊ธฐํ™”
            model = FearGreedModel(input_dim).to(device)
    return model


# Fresh training
# model = initialize_model()
# Resume training from the saved checkpoint
model = initialize_model('fear_greed_model_gpu.pth')


# 4. Model training
def train_model():
    criterion = nn.MSELoss()
    optimizer = optim.Adam(model.parameters(), lr=0.001)

    target_hits = 0  # epochs whose loss met the target so far
    epoch = 0
    while True:  # train until the loss target is met 3 times
        model.train()
        epoch_loss = 0.0
        for X_batch, y_batch in train_loader:
            X_batch, y_batch = X_batch.to(device), y_batch.to(device)  # move the batch to the device
            optimizer.zero_grad()
            outputs = model(X_batch)
            loss = criterion(outputs.squeeze(), y_batch)
            loss.backward()
            optimizer.step()
            epoch_loss += loss.item()

        # average loss over the whole epoch, not just the last batch
        avg_loss = epoch_loss / len(train_loader)
        print(f'Epoch {epoch + 1}, Loss: {avg_loss}')
        epoch += 1
        if avg_loss < 2.0:
            target_hits += 1
            if target_hits == 3:
                print('Early stopping')
                break


train_model()

# ๋ชจ๋ธ ์ €์žฅ
torch.save(model.state_dict(), 'fear_greed_model_gpu.pth')
print('Model saved.')

# 5. Evaluation on the test set
model.eval()
criterion = nn.MSELoss()
with torch.no_grad():
    test_loss = 0
    for X_batch, y_batch in test_loader:
        X_batch, y_batch = X_batch.to(device), y_batch.to(device)  # move the batch to the device
        outputs = model(X_batch)
        loss = criterion(outputs.squeeze(), y_batch)
        test_loss += loss.item()

    print(f'Test Loss: {test_loss / len(test_loader)}')

# Prediction for a new title
new_titles = ['폭락장 이네요']
new_X = vectorizer.transform(new_titles).toarray()
new_X_tensor = torch.tensor(new_X, dtype=torch.float32).to(device)  # move the input tensor to the device

model.eval()
with torch.no_grad():
    prediction = model(new_X_tensor).item()
    print(f'Predicted Fear-Greed Index: {prediction}')

 

API server

Run with: uvicorn.exe main:app --reload

import joblib
import torch
import torch.nn as nn
from fastapi import FastAPI, HTTPException
from pydantic import BaseModel

# Create the FastAPI instance
app = FastAPI()


# ๋ชจ๋ธ ์ •์˜
class FearGreedModel(nn.Module):
    def __init__(self, input_dim):
        super(FearGreedModel, self).__init__()
        self.fc1 = nn.Linear(input_dim, 128)
        self.fc2 = nn.Linear(128, 64)
        self.fc3 = nn.Linear(64, 1)
        self.relu = nn.ReLU()

    def forward(self, x):
        x = self.relu(self.fc1(x))
        x = self.relu(self.fc2(x))
        x = self.fc3(x)
        return x


# Device selection
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')


# ๋ชจ๋ธ ๋ฐ CountVectorizer ๋กœ๋“œ
def initialize_model(input_dim, model_path):
    model = FearGreedModel(input_dim).to(device)
    model.load_state_dict(torch.load(model_path, map_location=device))
    model.eval()  # switch to evaluation mode
    return model


# Load the saved CountVectorizer
vectorizer = joblib.load('vectorizer.pkl')

# ๋ชจ๋ธ ๋ถˆ๋Ÿฌ์˜ค๊ธฐ
model = initialize_model(input_dim=len(vectorizer.get_feature_names_out()), model_path='fear_greed_model_gpu.pth')


# Request schema
class PredictRequest(BaseModel):
    titles: list


# Prediction endpoint
@app.post("/predict")
def predict(request: PredictRequest):
    try:
        new_X = vectorizer.transform(request.titles).toarray()
        new_X_tensor = torch.tensor(new_X, dtype=torch.float32).to(device)
        with torch.no_grad():
            predictions = model(new_X_tensor).cpu().numpy().flatten()
        return {"predictions": predictions.tolist()}
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))

# ๋ชจ๋ธ ํ‰๊ฐ€ ์—”๋“œํฌ์ธํŠธ ์ •์˜

 

Test example

### POST request to the prediction server
POST http://localhost:8000/predict
Content-Type: application/json

{
  "titles": [
    "์šฐ๋ฆฌ๋Š” ๋งํ–ˆ๋‹ค"
  ]
}


###
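The same request can also be built from Python with the standard library. This is a sketch only: it constructs the POST request but does not send it, since sending requires the uvicorn server above to be running on localhost:8000.

```python
# Build a POST request for the /predict endpoint defined above.
import json
import urllib.request

payload = json.dumps({"titles": ["우리는 망했다"]}).encode("utf-8")
req = urllib.request.Request(
    "http://localhost:8000/predict",
    data=payload,
    headers={"Content-Type": "application/json"},
    method="POST",
)
print(req.method, req.full_url)  # POST http://localhost:8000/predict
# To actually send it (server must be running):
# response = json.load(urllib.request.urlopen(req))
```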