MonDevHub/MonCorpusCollection

🌿 Mon Corpus Collection

A growing, open, and community-driven corpus for the Mon language.
Built with care for language preservation, research, and future technologies.


✨ About This Repository

Mon Corpus Collection is a curated collection of Mon-language text data in Unicode format, created to support:

  • 📚 Linguistic research
  • 🤖 Natural Language Processing (NLP)
  • 🧠 Machine learning & AI experiments
  • 🏛️ Digital humanities & cultural preservation
  • 🌏 Open-access language resources

This repository exists to make Mon language data freely available for anyone who wants to learn, analyze, build, or experiment — no barriers, no gatekeeping.


🧩 What’s Inside

  • ✅ Clean Mon text in Unicode
  • ✅ Ready-to-use for NLP pipelines
  • ✅ Suitable for tokenization, training, and analysis
  • ✅ Expandable and community-friendly structure

Whether you’re:

  • building a tokenizer 🧱
  • training a language model 🤓
  • doing academic research 📖
  • or preserving the Mon language digitally 🌾

—you’re welcome here.
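As a starting point for analysis, here is a minimal sketch of a quick sanity check on a piece of corpus text: it counts how many non-whitespace characters fall in the Unicode Myanmar blocks, which is where Mon letters are encoded. The function names and the sample string are hypothetical, and no file names or layout from this repository are assumed; in practice you would read a file with `open(path, encoding="utf-8")`.

```python
from collections import Counter

# Unicode blocks used by Mon-Burmese script text:
# Myanmar (U+1000-U+109F), Myanmar Extended-B (U+A9E0-U+A9FF),
# Myanmar Extended-A (U+AA60-U+AA7F).
MYANMAR_RANGES = ((0x1000, 0x109F), (0xA9E0, 0xA9FF), (0xAA60, 0xAA7F))


def is_myanmar_script(ch: str) -> bool:
    """Return True if the character lies in one of the Myanmar blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in MYANMAR_RANGES)


def script_profile(text: str) -> dict:
    """Count non-whitespace characters and how many are Myanmar-script."""
    counts = Counter(ch for ch in text if not ch.isspace())
    total = sum(counts.values())
    myanmar = sum(n for ch, n in counts.items() if is_myanmar_script(ch))
    ratio = myanmar / total if total else 0.0
    return {"total": total, "myanmar": myanmar, "ratio": ratio}


if __name__ == "__main__":
    # Hypothetical sample: four Myanmar-block code points plus Latin text.
    sample = "\u1018\u102c\u101e\u102c abc"
    print(script_profile(sample))
```

A low ratio on a file that should be Mon text can hint at mixed-language content or leftover markup worth checking before tokenization or training.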


🌱 Philosophy

Language is living.

This project treats the Mon language not as static data, but as a living archive — something to be used, remixed, studied, and carried forward by future generations.

Open data.
Open culture.
Open futures.


🔓 License & Usage

🆓 Free to use
🆓 Free to modify
🆓 Free to redistribute

You may use this corpus for any purpose — academic, commercial, experimental, or personal.

(Attribution is appreciated, but not required 💛)


🤝 Contributing

Contributions are very welcome!

You can help by:

  • Adding new Mon text sources
  • Cleaning or normalizing data
  • Improving documentation
  • Sharing this resource with others
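
For anyone picking up the cleaning or normalizing task, the sketch below shows one possible pass over raw text: apply Unicode NFC normalization, collapse runs of whitespace, drop empty lines, and remove exact duplicate lines while keeping the original order. This is an illustration only, not the project's established pipeline; the function name `clean_lines` is hypothetical, and whether NFC is the right normalization form for this corpus is an assumption to confirm with the maintainers.

```python
import unicodedata
from typing import List


def clean_lines(raw: str) -> List[str]:
    """Normalize, tidy, and de-duplicate lines of text (order preserved)."""
    seen = set()
    cleaned = []
    for line in raw.splitlines():
        # Assumption: NFC is the desired normalization form.
        line = unicodedata.normalize("NFC", line)
        # Collapse whitespace runs and trim the ends. Zero-width
        # characters are not whitespace to str.split(), so they are kept.
        line = " ".join(line.split())
        if line and line not in seen:
            seen.add(line)
            cleaned.append(line)
    return cleaned


if __name__ == "__main__":
    # Hypothetical input with extra spaces, a blank line, and a duplicate.
    print(clean_lines("a  b\n\na b\n c \n"))
```

Zero-width characters are deliberately left alone here, since they can carry meaning in Myanmar-script text; any stricter cleanup is a decision for contributors who know the data.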

If you care about the Mon language, you belong here.


👥 Contributors

Janakh Pon
Htaw Mon


🌏 Why This Matters

Low-resource languages deserve high-quality digital infrastructure.

By collecting and sharing Mon language data openly, this project helps ensure Mon is:

  • represented in modern technology
  • accessible to researchers worldwide
  • preserved beyond physical archives

💬 Final Note

If you use this corpus in your work, research, or project —
we’d love to hear about it.

Let’s build the future of the Mon language together 🌾✨
