Presentation Type

Article

Location

Kennesaw, Georgia

Start Date

1-4-2026 9:00 AM

End Date

1-4-2026 10:15 AM

Description

Agricultural pest management increasingly relies on timely and accurate access to expert knowledge, yet high-quality labeled data and continuous expert support remain limited, particularly for farmers in rural regions with unstable or no internet connectivity. At the same time, the rapid growth of artificial intelligence (AI) and large language models (LLMs) has created new opportunities to deliver practical decision-support tools directly to agricultural end users through compact, deployable systems. This work addresses (i) generating a structured insect information dataset for LLM training, and (ii) adapting a lightweight LLM (<= 7B parameters) through fine-tuning for possible future use on edge devices in agricultural pest management. Textual data were collected by reviewing available pest databases and published manuscripts on nine selected pest species and compiling the information into structured reports. These structured reports were then reviewed and validated by a domain expert. From these reports, we constructed question-answer (Q/A) pairs to support model training and evaluation. A LoRA-based fine-tuning approach was applied to multiple lightweight LLMs, and the resulting models were evaluated. Initial evaluation shows that Mistral 7B achieves an 88.9% pass rate on the domain-specific Q/A task, substantially outperforming Qwen 2.5 7B (63.9%) and LLaMA 3.1 8B (58.7%). Notably, Mistral demonstrates higher semantic alignment (embedding similarity: 0.865) despite lower lexical overlap (BLEU: 0.097), indicating that semantic understanding and robust reasoning are more predictive of task success than surface-level conformity to reference text in specialized domains.
By combining expert-organized data, well-structured Q/A pairs, semantic quality control, and efficient model adaptation, this work supports the development of farmer-facing agricultural decision-support tools and demonstrates the feasibility of deploying compact, high-performing language models for practical field-level pest management guidance.
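
The contrast between semantic alignment and lexical overlap reported above can be illustrated with a minimal sketch. The abstract does not state how the pass rate was computed, so everything below is an assumption for illustration: the 0.8 similarity threshold, the toy three-dimensional "embedding" vectors, the example sentences, and the function names are all hypothetical. The clipped unigram precision shown is only a building block of BLEU, not the full metric.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine similarity between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def ngram_precision(candidate, reference, n=1):
    """Clipped n-gram precision (a component of BLEU, not full BLEU)."""
    cand = candidate.lower().split()
    ref = reference.lower().split()
    cand_ngrams = Counter(tuple(cand[i:i + n]) for i in range(len(cand) - n + 1))
    ref_ngrams = Counter(tuple(ref[i:i + n]) for i in range(len(ref) - n + 1))
    total = sum(cand_ngrams.values())
    if total == 0:
        return 0.0
    # Each candidate n-gram is credited at most as often as it occurs in the reference.
    clipped = sum(min(count, ref_ngrams[gram]) for gram, count in cand_ngrams.items())
    return clipped / total


def pass_rate(similarities, threshold=0.8):
    """Fraction of answers whose similarity meets the threshold (assumed value)."""
    if not similarities:
        return 0.0
    return sum(s >= threshold for s in similarities) / len(similarities)


# Hypothetical reference and model answers: same advice, different wording.
reference = "Scout fields weekly and remove infested plants"
candidate = "Inspect crops every week and pull out affected ones"

# Toy vectors standing in for real sentence embeddings (assumption).
reference_vec = [0.9, 0.1, 0.4]
candidate_vec = [0.8, 0.2, 0.5]

semantic = cosine_similarity(reference_vec, candidate_vec)
lexical = ngram_precision(candidate, reference, n=1)
print(f"semantic similarity: {semantic:.3f}, unigram precision: {lexical:.3f}")
```

In this toy case the paraphrased answer shares only one word with the reference, so its lexical score is low, while the stand-in embeddings remain close. A real evaluation would replace the toy vectors with output from an actual sentence-embedding model and use a full BLEU implementation.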

Apr 1st, 9:00 AM Apr 1st, 10:15 AM

AgriPestDatabase-v1.0: A Structured Insect Dataset for Training Agricultural Large Language Model
