PyDODa: DODa Wrapper

>>> from pydoda import Category
>>> 
>>> # Create an instance of Category
>>> my_category = Category('semantic', 'animals')
>>> 
>>> # Get the Darija translation of a word
>>> darija_translation = my_category.get_darija_translation('dog')
>>> print(darija_translation)
klb
>>> 
>>> # Get the English translation of a word
>>> english_translation = my_category.get_english_translation('mch')
>>> print(english_translation)
cat

Introduction

About the library:

Pydoda is a Python library that wraps the DODa dataset and gives programmatic access to its contents. DODa is a linguistic resource containing several categories of words, phrases, and sentences in Darija (Moroccan Arabic).

Pydoda is aimed at researchers, developers, and language enthusiasts who want to work with DODa from Python. It provides an interface for accessing the categories in the dataset and for retrieving spellings and translations.

With Pydoda in your Python workflow, you can list the semantic and syntactic categories, retrieve translations in both directions, and compare the spelling variations of a word.

Supported Versions:

The Pydoda library is currently at version 1.0.0.

Getting Started

Installation

Pydoda can be easily installed using pip, the Python package manager:

    $ pip3 install pydoda

Get source code

The source code for Pydoda is available on GitHub. To clone it locally, use the following command:

    $ git clone https://github.com/saad-out/pydoda.git --recurse-submodules

Usage

class Pydoda

The Pydoda class provides methods to list what is available in the DODa dataset.
It lets you retrieve the names of the semantic categories, the syntactic categories, the extra categories, and the ongoing categories.
It also provides a method to retrieve all available categories at once.

Methods & Attributes:

  • Pydoda.get_semantic_categories() -> list[str]

    Returns a list of available semantic categories.

  • Pydoda.get_syntactic_categories() -> list[str]

    Returns a list of available syntactic categories.

  • Pydoda.get_xtra() -> list[str]

    Returns a list of available extra categories.

  • Pydoda.get_ongoing() -> list[str]

    Returns a list of available ongoing categories.

  • Pydoda.all() -> dict[str, list[str]]

    Returns a dictionary containing all available categories.

  • Pydoda.classes() -> dict[str, list[str]]

    Returns a dictionary containing all available classes.
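
As a small illustration of working with the dictionary that Pydoda.all() is documented to return, the sketch below finds which type a given category name belongs to. The helper find_category_type and the sample dictionary are hypothetical: the keys and the 'verbs' entry are assumptions made for the example, not the real category listing, and only 'animals' under 'semantic' comes from the example at the top of this page.

```python
from typing import Mapping, Optional, Sequence


def find_category_type(catalog: Mapping[str, Sequence[str]], name: str) -> Optional[str]:
    """Return the key of `catalog` whose list contains `name`, or None."""
    for category_type, names in catalog.items():
        if name in names:
            return category_type
    return None


# Illustrative stand-in for the dictionary returned by Pydoda.all().
# The keys and the 'verbs' entry are assumptions, not the real listing.
sample_catalog = {
    "semantic": ["animals"],
    "syntactic": ["verbs"],
}

print(find_category_type(sample_catalog, "animals"))  # semantic
print(find_category_type(sample_catalog, "unknown"))  # None
```

With the library installed, the dictionary returned by Pydoda.all() could be passed in place of sample_catalog, and the resulting type and name could then be given to Category.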

class Category(type: str, category: str)

The Category class represents a specific category in the DODa dataset.
It provides methods to get the number of entries, list the available spellings, translate words between English and Darija, and retrieve the Darija variations of a word.

Methods & Attributes:

  • Category.type -> str

    Returns the type of the category.

  • Category.category -> str

    Returns the name of the category.

  • Category.entries() -> dict[str, int]

    Returns the number of entries in the category.

  • Category.get_spellings() -> list[str]

    Returns a list of available spellings in the category.

  • Category.get_english_words() -> list

    Returns a list of available English words in the category.

  • Category.get_darija_words(spelling: str = 'n1') -> list

    Returns a list of available Darija words in the category.

    Args:

    • spelling: The spelling to retrieve. Defaults to 'n1'.

  • Category.get_darija_translation(word: str, spelling: str = 'n1') -> str

    Returns the Darija translation of the specified word.

    Args:

    • word: The English word to translate.
    • spelling: The Darija spelling to return. Defaults to 'n1'.

  • Category.get_english_translation(word: str, spelling: str = 'n1') -> str

    Returns the English translation of the specified word.

    Args:

    • word: The Darija word to translate.
    • spelling: The spelling to use for the lookup. Defaults to 'n1'.

  • Category.get_darija_variations(word: str) -> dict[str, str]

    Returns a dictionary containing the Darija variations of the specified word.

    Args:

    • word: The word to retrieve.
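
The methods above can be combined to build a small English-to-Darija glossary for a category. The sketch below is a hedged illustration: build_glossary is a hypothetical helper, and StubCategory is a stand-in that mimics two of the documented methods so the example runs without the dataset. Its word pairs are the ones from the example at the top of this page.

```python
class StubCategory:
    """Tiny stand-in exposing two of the documented Category methods."""

    # Word pairs taken from the example at the top of this page.
    _pairs = {"dog": "klb", "cat": "mch"}

    def get_english_words(self):
        return list(self._pairs)

    def get_darija_translation(self, word, spelling="n1"):
        # The stub ignores `spelling`; it only mirrors the documented signature.
        return self._pairs[word]


def build_glossary(category, spelling="n1"):
    """Map every English word in `category` to its Darija translation."""
    return {
        word: category.get_darija_translation(word, spelling)
        for word in category.get_english_words()
    }


glossary = build_glossary(StubCategory())
print(glossary)  # {'dog': 'klb', 'cat': 'mch'}
```

With the library installed, an object such as Category('semantic', 'animals') could be passed to build_glossary instead of the stub.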

class CustomCategory(type: str, category: str)

The CustomCategory class represents a custom category in the DODa dataset.
It provides methods to get the number of entries, list the available columns, retrieve a specific column or row, and read a single value from a row.

Methods & Attributes:

  • CustomCategory.type -> str

    Returns the type of the category.

  • CustomCategory.category -> str

    Returns the name of the category.

  • CustomCategory.entries() -> dict[str, int]

    Returns the number of entries in the category.

  • CustomCategory.get_all_columns() -> list[str]

    Returns a list of available columns in the category.

  • CustomCategory.get_column(column: str) -> list

    Returns a list of values in the specified column.

    Args:

    • column: The name of the column to retrieve.

  • CustomCategory.get_row(column: str, value: str) -> dict[str, str]

    Returns a dictionary containing the values of the specified row.

    Args:

    • column: The name of the column to search in.
    • value: The value to search for.

  • CustomCategory.get_column_for_row(column: str, value: str, return_column: str) -> str

    Returns the value of the specified column for the specified row.

    Args:

    • column: The name of the column to search in.
    • value: The value to search for.
    • return_column: The name of the column to return the value from.
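
To make the relationship between the column and row lookups concrete, here is a minimal in-memory model of the behaviour described above. It is a sketch under stated assumptions, not Pydoda's implementation: TableSketch and the column names 'english' and 'darija' are hypothetical, and raising KeyError for a missing value is a choice made for the example, since the behaviour in that case is not documented here.

```python
class TableSketch:
    """In-memory model of the documented CustomCategory lookups."""

    def __init__(self, rows):
        self._rows = list(rows)  # each row is a dict: column name -> value

    def get_all_columns(self):
        return list(self._rows[0]) if self._rows else []

    def get_column(self, column):
        return [row[column] for row in self._rows]

    def get_row(self, column, value):
        # Return the first row whose `column` equals `value`.
        for row in self._rows:
            if row[column] == value:
                return dict(row)
        # Assumption: the real behaviour for a missing value is not documented.
        raise KeyError(f"no row with {column}={value!r}")

    def get_column_for_row(self, column, value, return_column):
        return self.get_row(column, value)[return_column]


# Hypothetical column names; the word pairs come from the example at the top.
table = TableSketch([
    {"english": "dog", "darija": "klb"},
    {"english": "cat", "darija": "mch"},
])

print(table.get_all_columns())                               # ['english', 'darija']
print(table.get_column("darija"))                            # ['klb', 'mch']
print(table.get_row("english", "cat"))                       # {'english': 'cat', 'darija': 'mch'}
print(table.get_column_for_row("english", "dog", "darija"))  # klb
```

In this model, get_column_for_row is simply get_row followed by a key lookup, which matches how the two methods are described.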

class Sentence

The Sentence class represents the sentences in the DODa dataset.
It provides methods to get the number of entries, list the English, Darija, and Arabic-script sentences, translate a sentence in either direction, and search the translations by substring.

Methods & Attributes:

  • Sentence.type -> str

    Returns the type of the category.

  • Sentence.category -> str

    Returns the name of the category.

  • Sentence.entries() -> dict[str, int]

    Returns the number of entries in the category.

  • Sentence.get_english_sentences() -> list[str]

    Returns a list of all English sentences in the category.

  • Sentence.get_darija_sentences() -> list[str]

    Returns a list of all Darija sentences in the category.

  • Sentence.get_arabic_sentences() -> list[str]

    Returns a list of all Darija sentences written in Arabic script in the category.

  • Sentence.get_darija_translation(sentence: str, language: str = 'darija') -> str

    Returns the Darija translation of the given English sentence.

    Args:

    • sentence: The English sentence to translate.
    • language: The language of the returned translation. Defaults to 'darija'.

  • Sentence.get_english_translation(sentence: str, language: str = 'darija') -> str

    Returns the English translation of the given Darija sentence.

    Args:

    • sentence: The Darija sentence to translate.
    • language: The language of the sentence. Defaults to 'darija'.

  • Sentence.get_translation_by_substring(substring: str, language: str) -> list[dict[str, str]]

    Returns a list of translations that contain the given substring.

    Args:

    • substring: The substring to search for.
    • language: The language of the substring.
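
One possible use of get_translation_by_substring is to search for the same substring in several languages and merge the hits. The sketch below assumes that 'english' and 'darija' are accepted language values and that each returned dictionary is keyed by language name; neither is confirmed by the documentation above. StubSentence and search are hypothetical, and the stub's rows are one-word placeholders reused from the example at the top of this page, not real sentences from the dataset.

```python
class StubSentence:
    """Stand-in exposing only get_translation_by_substring."""

    # One-word placeholders, not real rows of the sentence dataset.
    # The dictionary keys are assumptions.
    _rows = [
        {"english": "dog", "darija": "klb"},
        {"english": "cat", "darija": "mch"},
    ]

    def get_translation_by_substring(self, substring, language):
        return [dict(row) for row in self._rows if substring in row[language]]


def search(sentences, substring, languages=("english", "darija")):
    """Search `substring` in each language and merge hits without duplicates."""
    seen = set()
    merged = []
    for language in languages:
        for match in sentences.get_translation_by_substring(substring, language):
            key = tuple(sorted(match.items()))  # dicts are unhashable, so use a tuple
            if key not in seen:
                seen.add(key)
                merged.append(match)
    return merged


print(search(StubSentence(), "c"))  # [{'english': 'cat', 'darija': 'mch'}]
```

The substring 'c' matches the same row through both 'cat' and 'mch', so the row appears only once in the merged result.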

More Info

For more information, contributions, and bug reports, please visit the GitHub repository at https://github.com/saad-out/pydoda.

For any inquiries, please contact the developer at Gmail.