10th International Congress on Information and Communication Technology in concurrent with ICT Excellence Awards (ICICT 2025) will be held in London, United Kingdom | February 18 - 21, 2025.
Friday February 21, 2025 2:00pm - 3:30pm GMT

Authors - Aman Mussa, Madina Mansurova
Abstract - The rapid advancement of neural networks has reshaped multiple domains, as reflected in the 2024 Nobel Prizes in Physics and Chemistry, both of which recognized work involving neural networks. Large language models (LLMs) such as ChatGPT have significantly changed how people interact with AI and have seen unprecedented growth and recognition. However, these models still face substantial challenges with low-resource languages like Kazakh, which accounts for less than 0.1% of online content. The scarcity of training data often results in unstable and inaccurate outputs. To address this issue, we present a novel Kazakh language dataset specifically designed for self-instruct fine-tuning of LLMs, comprising 50,000 diverse instructions drawn from internet sources and textbooks. Using Low-Rank Adaptation (LoRA), a parameter-efficient fine-tuning technique, we fine-tuned the LLaMA 2 model on this dataset. Experimental results demonstrate improvements in the model's ability to comprehend and generate Kazakh text, although no established benchmarks for Kazakh are yet available. This research underscores the potential of large-scale models to bridge the performance gap in low-resource languages and highlights the importance of curated datasets in advancing AI-driven technologies for underrepresented linguistic communities. Future work will focus on developing robust benchmarking standards to further evaluate and enhance these models.
Paper Presenters
Aman Mussa

Kazakhstan
Virtual Room B London, United Kingdom
