🌿 Mon Corpus Collection

A growing, open, and community-driven corpus for the Mon language.
Built with care for language preservation, research, and future technologies.

✨ About This Repository

Mon Corpus Collection is a curated collection of Mon-language text data in Unicode format, created to support:

📚 Linguistic research
🤖 Natural Language Processing (NLP)
🧠 Machine learning & AI experiments
🏛️ Digital humanities & cultural preservation
🌏 Open-access language resources

This repository exists to make Mon language data freely available for anyone who wants to learn, analyze, build, or experiment — no barriers, no gatekeeping.

🧩 What’s Inside

✅ Clean Mon text in Unicode
✅ Ready to use in NLP pipelines
✅ Suitable for tokenization, training, and analysis
✅ Expandable and community-friendly structure
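Here is one rough illustration of feeding the text into an NLP pipeline. The Python sketch below reads every `.txt` file under a folder and reports how much of the text is in the Myanmar script, the script Mon is written in. It assumes the corpus files are plain UTF-8 `.txt` files with one passage per line. That file layout and the helper names (`iter_lines`, `script_ratio`, `char_frequencies`) are hypothetical and are not defined by this repository.

```python
from collections import Counter
from pathlib import Path

# Unicode blocks of the Myanmar script, which Mon is written in.
MYANMAR_RANGES = [
    (0x1000, 0x109F),  # Myanmar
    (0xA9E0, 0xA9FF),  # Myanmar Extended-B
    (0xAA60, 0xAA7F),  # Myanmar Extended-A
]


def is_myanmar(ch: str) -> bool:
    """True if the character's code point falls in one of the Myanmar blocks."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in MYANMAR_RANGES)


def iter_lines(root: str, pattern: str = "*.txt"):
    """Yield stripped, non-empty lines from every matching file under root.

    Assumption (hypothetical layout): files are UTF-8 plain text.
    """
    for path in sorted(Path(root).rglob(pattern)):
        with path.open(encoding="utf-8") as f:
            for line in f:
                line = line.strip()
                if line:
                    yield line


def script_ratio(text: str) -> float:
    """Fraction of non-whitespace characters that are in the Myanmar blocks."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(is_myanmar(c) for c in chars) / len(chars)


def char_frequencies(lines) -> Counter:
    """Count non-whitespace characters across all lines."""
    counts = Counter()
    for line in lines:
        counts.update(c for c in line if not c.isspace())
    return counts
```

For example, `char_frequencies(iter_lines("wikipedia")).most_common(20)` would list the 20 most frequent characters in the `wikipedia` folder, under the same file-format assumption. A low `script_ratio` on a line hints that it is mostly non-Myanmar-script material, such as Latin text or markup, and may be worth reviewing.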

Whether you’re:

building a tokenizer 🧱
training a language model 🤓
doing academic research 📖
or preserving Mon language digitally 🌾

—you’re welcome here.

🌱 Philosophy

Language is living.

This project treats the Mon language not as static data, but as a living archive — something to be used, remixed, studied, and carried forward by future generations.

Open data.
Open culture.
Open futures.

🔓 License & Usage

🆓 Free to use
🆓 Free to modify
🆓 Free to redistribute

You may use this corpus for any purpose — academic, commercial, experimental, or personal.

(Attribution is appreciated but not required 💛)

🤝 Contributing

Contributions are very welcome!

You can help by:

Adding new Mon text sources
Cleaning or normalizing data
Improving documentation
Sharing this resource with others
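When cleaning or normalizing data before contributing, a light and conservative pass is usually safest. The Python sketch below does four things: it applies Unicode NFC normalization, removes a leading byte-order mark, collapses runs of spaces and tabs, and drops empty or exactly duplicated lines. These rules and the helper names (`normalize_line`, `normalize_lines`) are illustrative assumptions, not an agreed project convention.

```python
import re
import unicodedata

# Runs of spaces, tabs, or no-break spaces (an illustrative choice of characters).
_SPACES = re.compile(r"[ \t\u00A0]+")


def normalize_line(line: str) -> str:
    """NFC-normalize, drop a leading BOM, and collapse runs of spaces/tabs."""
    line = unicodedata.normalize("NFC", line)
    line = line.lstrip("\ufeff")  # U+FEFF is not whitespace, so strip() keeps it
    line = _SPACES.sub(" ", line)
    return line.strip()


def normalize_lines(lines):
    """Normalize lines, skipping empty ones and exact duplicates (first one wins)."""
    seen = set()
    out = []
    for raw in lines:
        line = normalize_line(raw)
        if line and line not in seen:
            seen.add(line)
            out.append(line)
    return out
```

For example, `normalize_lines(["\ufeffမန်  ဘာသာ", "မန် ဘာသာ", ""])` returns a single line. Removing the BOM and collapsing the doubled space make the first two entries identical, and the empty entry is dropped.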

If you care about the Mon language, you belong here.

👥 Contributors

Janakh Pon
Htaw Mon

🌏 Why This Matters

Low-resource languages deserve high-quality digital infrastructure.

By collecting and sharing Mon language data openly, this project helps ensure Mon is:

represented in modern technology
accessible to researchers worldwide
preserved beyond physical archives

💬 Final Note

If you use this corpus in your work, research, or project —
we’d love to hear about it.

Let’s build the future of the Mon language together 🌾✨

📁 Folders and Files

custom/
monnews/
telegram_mot_tip_ebook/
wikipedia/
LICENSE