Data Engineer


Job details
  • Oh
  • 6 days ago

About Us

 

We're building the future of uncensored AI infrastructure & products. Our technology powers hyper-immersive experiences and enables the ownership of personalized, interoperable AI characters, unlocking vast monetization opportunities across our ecosystem and beyond.

 

We are initially focused on the Creator and Social-Fi landscapes, building interoperable 'superModel' characters powered by our advanced proprietary multi-modal, uncensored AI models. These superModels can be first experienced on our platform, OhChat, with additional platform integrations in the works.


OhChat has gained 70,000 users across 174 countries in a matter of weeks. The site lets users enjoy hyper-immersive experiences with digital AI characters, with real-time interactions and uncensored exchanges. Users can talk to original characters as well as ‘digital twins’ based on celebrities and real-world creators, launched in partnership with them.


Website: https://chat.oh.xyz/


Job Overview


As a Data Engineer at Oh, you will play a crucial role in building and optimizing our data pipelines and infrastructure. You’ll be responsible for data collection, particularly large-scale image scraping, and for managing structured and unstructured datasets used to train generative AI models. You will work closely with machine learning engineers and developers to ensure data quality, availability, and scalability.


Key Responsibilities


  • Data Pipeline Development: Design, build, and maintain data pipelines to support the collection, ingestion, and processing of large-scale image, video, and audio datasets.
  • Data Scraping and Collection: Develop and optimize web scraping scripts to collect high-quality multimedia datasets.
  • Data Storage and Management: Implement efficient storage solutions for large volumes of structured and unstructured data, ensuring data accessibility and scalability.
  • ETL Processes: Develop and manage ETL processes to transform raw data into formats suitable for model training.
  • Data Quality Assurance: Ensure data quality and consistency across different sources. Implement monitoring tools and workflows to maintain data accuracy and relevance.
  • Documentation: Maintain clear documentation of data sources, scraping processes, and pipeline workflows for team reference and reproducibility.


Required Skills & Qualifications


  • Programming Languages: Proficiency in either Python or JavaScript for data scraping, ETL, and pipeline development.
  • Web Scraping: Experience with web scraping tools and libraries (e.g., BeautifulSoup, Scrapy).
  • Data Storage and Processing: Experience with SQL and NoSQL databases (e.g., PostgreSQL, MongoDB) and with cloud storage and warehousing (e.g., AWS S3, Amazon Redshift).
  • Data Pipeline and Workflow Orchestration: Familiarity with data pipeline tools such as Apache Airflow, Prefect, or Luigi.
  • Data Transformation: Strong knowledge of data transformation and processing techniques (e.g., Pandas, Dask for Python).
  • Data Quality Control: Experience with data quality monitoring tools (e.g., dbt, Great Expectations).
  • Version Control: Proficiency in Git for version control, as well as data versioning tools (e.g., DVC).
  • Pipeline Monitoring: Strong experience implementing and owning pipeline monitoring stacks (e.g., Sentry, Grafana, AWS CloudWatch).
  • Testing and Code Quality: Extensive experience with common frameworks for unit, behavioural, integration, and end-to-end testing (e.g., Pytest, Behave, Postman) and with general code quality tools and principles (e.g., Ruff, MyPy, Bandit, Black).


Preferred Qualifications


  • Experience in Generative AI Data Collection: Understanding of the types of data needed for training generative AI models (e.g., GANs, LLMs, diffusion models).
  • Knowledge of ML/DL Basics: Familiarity with machine learning concepts, particularly around data needs for training and evaluation in the context of generative models.
  • Familiarity with Blockchain: Though not mandatory, a keen interest in the blockchain ecosystem and data sources is an advantage.
  • Data Governance: Understanding of legal and ethical implications of data collection, including copyright and privacy concerns.
  • Experience with Image and Video Processing: Familiarity with libraries for image processing (e.g., OpenCV, PIL) and video data handling is a plus.
  • Big Data Experience: Familiarity with big data tools and frameworks (e.g., Spark, Hadoop) is a plus.
  • DevOps: Some experience with common DevOps tools (e.g., CI/CD pipelines, Terraform/CDK, Docker) and best practices is a bonus.


As part of our team, you’ll enjoy:


  • The hustle of a startup with the impact of a global business
  • Tremendous opportunity to join a business pioneering the future of AI
  • Working with an extraordinary team of smart, creative, fun and highly motivated people
  • Flexible working hours, including remote working
  • Modern, uplifting work environment
  • Pension scheme
  • Generous starting salary

 
