Research Software Developer II

Microsoft
United States, Washington, Redmond
Dec 28, 2024
Overview

The Microsoft Translation team is on a mission to enable communication without language barriers. We offer state-of-the-art machine translation for more than 130 languages. Recently the team has introduced new features such as Document Translation, online and offline containers, and custom neural dictionaries.

Training world-class models requires world-class data. Even the best model architectures are useless without the data to train them. As we work to expand our language coverage and quality, our needs for data quantity and quality are growing.

We are looking for a talented Research Software Developer II to join our small data team and help us identify and collect high-quality data at large scale. The ideal candidate will have a passion for analyzing and experimenting with large-scale data and for writing quality code, and a knack for developing systems that are testable, redundant and scalable. You will be working in the fields of data science, data mining, machine learning, deep neural networks and natural language processing. You will directly collaborate with experienced Machine Learning, NLP and Machine Translation scientists.

This position will require work in both research and engineering domains. The ideal candidate must be comfortable both with exploring new ideas and algorithms and with implementing them in a robust and scalable manner. This is a fantastic opportunity to make a real difference in the quality of our system. If that excites you, we would love to hear from you.

We do not just value differences or different perspectives. We seek them out and invite them in so we can tap into the collective power of everyone in the company. As a result, our customers are better served.

Microsoft's mission is to empower every person and every organization on the planet to achieve more.
As employees we come together with a growth mindset, innovate to empower others, and collaborate to realize our shared goals. Each day we build on our values of respect, integrity, and accountability to create a culture of inclusion where everyone can thrive at work and beyond.

Responsibilities

- Discover new data sources and evaluate their quality to enrich our data coverage
- Work with researchers to create and evaluate prototypes for new algorithms
- Explore and evaluate new data processing tools and algorithms (for example text extraction, sentence extraction, parallel data alignment, normalization, duplicate identification) to improve the existing data processing pipelines
- Productize research prototypes into end-to-end pipelines
- Maintain the existing pipelines for:
  - Automated discovery and identification of language data from multiple domains at web scale
  - Text data processing (text extraction, sentence extraction, parallel data alignment, normalization, duplicate identification)
  - Large scale text data storage infrastructure (import, export, query)
  - Data cleaning and filtering
- Embody our culture and values